Use your Nvidia GPU's VRAM as swap space on Linux

(github.com)

168 points | by tanelpoder 4 hours ago

21 comments

  • yjftsjthsd-h 3 hours ago
    > Built for laptops with soldered memory and no upgrade path. If you have an RTX card sitting there with 8GB of VRAM and you're getting swapped to SSD, this puts that VRAM to work.

    Well, that does at least answer my immediate question about why I would ever swap from expensive RAM to really expensive RAM:) Feels niche, but when you want it it's a good idea.

    • Wowfunhappy 2 hours ago
      Another possible reason that occurred to me: what if you have VRAM but you're not using it all the time? For example, let's say you bought a GPU because you like to play video games. When you're not actively gaming, you probably don't need 16 GB of VRAM just to render the desktop. Might as well use it for something else, right?

      Edit: Although, this is predicated on the system being able to release VRAM that is acting as swap when it's time to start a game. Can it do that?

      • Saris 1 hour ago
        It's easy enough to 'offline' swap space on Linux normally so I suspect that would work fine, as long as you didn't instantly run out of RAM when doing so.
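
        A rough sketch of that "don't instantly run out of RAM" check, assuming the VRAM-backed swap shows up as an ordinary entry in /proc/swaps. The device name `/dev/nbd0` and the 512 MiB headroom are illustrative assumptions, not taken from the project:

```python
# Sketch: decide whether a swap device can be turned off without exhausting RAM.
# Inputs are the text of /proc/meminfo and /proc/swaps (both report kB).

def parse_meminfo(text):
    """Return {field: kB} from /proc/meminfo-style text."""
    out = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            out[name.strip()] = int(parts[0])
    return out


def parse_swaps(text):
    """Return {device: used_kB} from /proc/swaps-style text."""
    used = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        cols = line.split()
        if len(cols) >= 4:
            # Columns: Filename, Type, Size, Used, Priority
            used[cols[0]] = int(cols[3])
    return used


def safe_to_swapoff(meminfo_text, swaps_text, device, headroom_kb=512 * 1024):
    """True if MemAvailable can absorb the device's used pages plus headroom.

    headroom_kb is an arbitrary safety margin chosen for this sketch.
    """
    mem = parse_meminfo(meminfo_text)
    used = parse_swaps(swaps_text).get(device, 0)
    return mem.get("MemAvailable", 0) - used >= headroom_kb
```

        On a real machine you would feed it the contents of /proc/meminfo and /proc/swaps and only run `swapoff` on the device if it returns True. Pages could also migrate to another swap device instead of RAM, which this sketch ignores.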
  • RachelF 2 hours ago
    Nice idea, but something has gone very wrong here:

    >Sequential throughput: ~1.3 GB/s

    [on an RTX 3070 Laptop]

    This RTX 3070 chip is on PCIe 4.0 x16 which should give 64GB/s. The 8GB of GDDR6 is 448GB/s.

    Swapping to an NVMe drive would be twice as fast, but with higher latency.

    • Teknoman117 57 minutes ago
      Gen 4.0 x16 is 32 GB/s in each direction, but the way this is implemented is not the way you'd go about this if you wanted high performance.
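
      For reference, the per-direction figure falls out of the raw link rate and the line encoding. A back-of-the-envelope sketch that ignores packet (TLP) and protocol overhead, so real throughput is somewhat lower:

```python
# Theoretical PCIe bandwidth per direction, before protocol overhead.
# Gen 3, 4 and 5 all use 128b/130b line encoding.

GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}  # giga-transfers per second, per lane
ENCODING = 128 / 130                    # payload bits per transferred bit


def pcie_gb_per_s(gen, lanes):
    """Usable GB/s in one direction for a given generation and lane count."""
    bits_per_lane = GT_PER_S[gen] * ENCODING  # Gbit/s of payload per lane
    return bits_per_lane / 8 * lanes          # bytes, times number of lanes


if __name__ == "__main__":
    one_way = pcie_gb_per_s(4, 16)
    print(f"Gen4 x16: {one_way:.1f} GB/s each way, {2 * one_way:.1f} GB/s both ways combined")
```

      The 64 GB/s number only appears when both directions are added together, which a one-way swap-out or swap-in never sees.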

      Edit: Their benchmarks are also run using ZRAM, which compresses pages before writing to swap. Not sure what the performance overhead of that is, but it's probably quite a bit.

      First of all, it's a userspace program hooking the nbd driver, which is known for being slow. It also uses a bounce buffer in userspace before transferring to the GPU. So when the kernel needs to swap a page, it has to first copy it into a userspace-facing buffer. The userspace program then has to wake back up and issue the CUDA operation to copy the page into device memory.

      nbd also doesn't really do a good job of supporting high queue depth or merging adjacent accesses. So if the kernel is issuing a bunch of 4K page swaps without any coalescing, you're going to end up with at least a million kernel/userspace context switches per second just to handle 4 GB/s (4 GB / 4K page), let alone 64 GB/s. And that's just the NBD portion, forget the mess that is the NVIDIA driver. PCIe can move a lot of data, but in order to get anything even resembling the full bandwidth, you have to use DMA engines with long page lists. Having to set up a transfer for every 4K page over PCIe will not reach full saturation of the bus.
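
      Spelling out that arithmetic as a small sketch, under the worst-case assumption of exactly one request per 4 KiB page with no merging:

```python
# Requests per second needed to sustain a given throughput when every
# 4 KiB page is its own request (no coalescing, queue depth of one).

PAGE_SIZE = 4096  # bytes


def page_ops_per_second(bytes_per_second, page_size=PAGE_SIZE):
    """Number of single-page requests per second at the given throughput."""
    return bytes_per_second / page_size


def budget_us_per_op(bytes_per_second, page_size=PAGE_SIZE):
    """Time available per request, in microseconds, if handled serially."""
    return 1e6 / page_ops_per_second(bytes_per_second, page_size)


if __name__ == "__main__":
    for label, rate in [("1.3 GB/s", 1.3e9), ("4 GB/s", 4e9), ("32 GB/s", 32e9)]:
        ops = page_ops_per_second(rate)
        print(f"{label}: {ops:,.0f} pages/s, {budget_us_per_op(rate):.2f} us per page")
```

      At 4 GB/s that leaves roughly one microsecond per page for the kernel-to-userspace round trip plus the CUDA copy, which is why batching pages into larger transfers matters so much.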

      Swapping to NVMe is a very optimized path -> the swapper can submit lists of pages directly to the NVMe driver and the controller can DMA them directly out of RAM, no copies or context switches CPU side at all.

      This could probably be improved by migrating to the ublk driver as it might let you avoid the userspace bounce buffer. It'd also be able to have multiple write queues to at least set up CUDA copies in parallel.

  • xfalcox 3 hours ago
    Given my dev machine has 32GB of RAM and 32GB of VRAM that sits mostly idle when I'm not running AI models, this is not that bad of an idea.
    • mathisfun123 18 minutes ago
      this is the pcmasterrace equivalent of being all upper body and with scrawny legs lol
      • tempoponet 1 minute ago
        It's fine for dense models where you need them in VRAM, less so for MoE where you're offloading layers to ram. But 32/32 is pretty good for both in the popular ~30b range right now.
  • drdaeman 2 hours ago
    What about backpressure, how does it handle requirements for VRAM allocation when VRAM is used for swap space?

    With X11 it's not that bad (buffers are pre-allocated), but with Wayland allocations are a lot more dynamic, so running low on VRAM can easily crash the whole desktop. I just had a few such crashes with Hyprland+llama-server+KVM switching between computers without freeing VRAM.

  • dragontamer 3 hours ago
    Remember how 16GBs used to be an enterprise level database mainframe?

    Well, GPUs also have stupid amounts of compute on them. I have to imagine that there is some kind of database format that's useful with GPU compute attached.

    Since the data is already in VRAM, the GPU can sort, join, or otherwise manipulate data as needed.

    • tmostak 2 hours ago
      GPU-accelerated databases have a long history. I founded HeavyAI (previously MapD/OmniSci) in 2013, but there are or have been many other startups in this space, such as Voltron Data, Kinetica, Sqream, etc. And now you have major players like IBM, Starburst, and Microsoft (which just announced Fabric SQL on GPU today) working on their own GPU-accelerated systems. GPUs have a huge advantage in terms of compute, memory, and interconnect bandwidth over CPU, as long as you can keep them fed with data.

      I believe within 2-3 years databases and data warehouses on GPU will be common. The widespread use of agents to query data will be a part of this, as there will be a need to run far more queries at lower latency than needed for the ETL and BI workloads of the past.

    • einichi 2 hours ago
      oh god please don't create more demand for GPUs
    • giancarlostoro 2 hours ago
      Can we somehow make them work with 1 TB PCIes so we can churn through way more data?
    • Nate75Sanders 2 hours ago
      Possibly LSM compaction.
  • willis936 3 hours ago
    I'm more interested in the opposite. Nvidia linux drivers crash when you try to address more VRAM than you have. It'd be nice if they didn't.
    • SV_BubbleTime 2 hours ago
      They already do that on windows and it kinda sucks. If you are targeting something like LMStudio or ComfyUI, both of those have superior methods to do exactly this.
  • mmastrac 1 hour ago
    I seriously looked at this as a way to improve the RAM situation in a QNAP 2U unit that I was having trouble sourcing RAM for. It's somewhat annoying that legit memory-over-PCIe is gated on PCIe5 and chipset support.

    In the end I just had to bite the bullet and take a gamble on finding ECC DDR4 RAM that would work with the ancient AMD chipset...

    This particular implementation seems to be running over too many layers to be particularly performant. Why not a custom block driver instead?

    • Teknoman117 1 hour ago
      Memory on an expansion card isn't gated on PCIe 5, it's gated on CXL support. CXL and PCIe use the same electrical/physical layer but the protocol is very different.

      The problem with putting (system) RAM on a PCIe card is that PCIe is not a cache-coherent interconnect. If you have a cache line that resides on your GPU sitting inside your processor's cache, a remote modification to that memory by either the GPU, another CPU core or some other PCIe device will NOT invalidate the CPU cache line. You also have the fun situation that if it's modified on both ends simultaneously the resulting state will be non-deterministic.

      Device drivers have to be very careful about synchronization when accessing memory-like areas on PCIe. CXL adds a cache coherency protocol among other things, so that invalidations and snoops can be exchanged over the interconnect.

      • wang_li 13 minutes ago
        It’s deterministic. But as the user you don’t know enough to know what was determined.
  • dlt713705 2 hours ago
    Does anyone these days really use swap for anything other than S4 suspend?
    • kccqzy 2 hours ago
      https://news.ycombinator.com/item?id=40697318

      This HN comment and the linked post brought up a lot of good points. The main takeaway is that swap should primarily be considered a mechanism for equality of reclamation, not for emergency extra memory, where equality of reclamation means file-backed pages and anonymous pages are subject to similar criteria for being evicted from physical memory.

      I used to have zero swap on my Linux desktop and this convinced me to add at least a small swap partition.

      • sidewndr46 1 hour ago
        I just set swappiness to zero years ago and never looked back.
        • kccqzy 36 minutes ago
          That’s like the complete opposite advice. Chris said the lowest recommended swappiness is 1. I have it set to 100.
    • Saris 1 hour ago
      It's useful on lower RAM systems as the least frequently used memory can be moved to swap, freeing up more RAM for stuff that needs it. Even when using zram it works out pretty well on my laptop with 8GB of RAM, it'll often have 4GB+ in zram swap space compressed down to only 1GB or so of physical RAM usage.
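
      To check that kind of ratio on your own machine, a minimal sketch that parses the zram `mm_stat` line. It assumes the device is `zram0` (so the file is /sys/block/zram0/mm_stat) and relies only on the first three fields, which are orig_data_size, compr_data_size and mem_used_total in bytes:

```python
# Compute the effective zram compression ratio from an mm_stat line.

def zram_ratio(mm_stat_text):
    """Return original size, RAM actually used, and their ratio."""
    fields = [int(x) for x in mm_stat_text.split()]
    orig, compr, mem_used = fields[0], fields[1], fields[2]
    return {
        "orig_bytes": orig,          # uncompressed data stored in zram
        "compr_bytes": compr,        # size after compression
        "mem_used_bytes": mem_used,  # RAM consumed, including allocator overhead
        "ratio": orig / mem_used if mem_used else 0.0,
    }
```

      Passing it the contents of the mm_stat file for a device holding 4 GiB of pages in 1 GiB of RAM would report a ratio of 4.0.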
    • yjftsjthsd-h 2 hours ago
      It really depends on what you run and how much RAM you have to do it in. I run some machines into swap just by running a couple browsers and some containers in the background on a 16GB laptop. I've also run a single light browser and essentially nothing else on 4GB and been fine:)
  • sgjohnson 1 hour ago
    >Sequential throughput: ~1.3 GB/s

    sounds VERY low, also, wouldn't random read/write speed be MUCH more relevant here?

  • jcmfernandes 3 hours ago
    Q: Why? A: Why not?
  • hardwaresofton 3 hours ago
    You want to waste VRAM, in this economy?
  • UnfitFootprint 2 hours ago
    No software benchmarks? BAR for RAM is cool but I want to see how much it _actually_ beats pcie nvme
  • LouisvilleGeek 2 hours ago
    Finally a use for the expensive ram when it's not needed in workloads!

    Now if it could be dynamically used and vacated on other GPU workloads?

  • bobsmooth 2 hours ago
    RAM disks have always fascinated me. In a different timeline every PC has 100GB of RAM and 50TB HDDs are the norm.
    • pixl97 2 hours ago
      Back when HDDs were all there was, ramdisks were interesting, but SSDs pretty much killed most of that as they have massively increased IOPS over disks.

      Hard drives that huge scare me as it would take days to back up all the data off them.

      • bobsmooth 1 hour ago
        In my fantasy RAM was the predominant technology over flash.
  • lowbloodsugar 1 hour ago
    This is why I read HN.
  • nialv7 1 hour ago
    I mean, you prompted something useful out of an AI, good job. But then use that to ask for donations? Feels weird, man.
  • effnorwood 2 hours ago
    use your car for an anchor on a big boat!
    • SV_BubbleTime 2 hours ago
      I mean, if you aren’t using the car while using the boat and it won’t really damage the car… yes?
  • usxr1515 2 hours ago
    Nice
  • simonask 3 hours ago
    I mean, cool, but I’d rather not?
    • margalabargala 3 hours ago
      So don't. Not everything is for you.
      • TurkTurkleton 1 hour ago
        Didn't you hear? The author of this daemon is going around and forcibly installing it on anyone's computer that has soldered memory and an Nvidia GPU. I heard he even brings a Ludovico-technique chair with him and straps you in and pins your eyelids open like A Clockwork Orange so you have to watch.
    • gchamonlive 3 hours ago
      Wouldn't it be faster to swap to VRAM if you are sitting there with 8 gigs of it unused than swapping to SSD and burning its write cycles, assuming you absolutely need swap?
    • dspillett 3 hours ago
      So, erm, don't?
  • Sohcahtoa82 3 hours ago
    [dead]