PostgreSQL and the OOM Killer: Why We Use Strict Memory Overcommit

(ubicloud.com)

70 points | by furkansahin 2 hours ago

9 comments

ozgune 1 hour ago
(Ozgun from Ubicloud)
I agree with the blog post's technical contents, but I feel we came across too strongly in the title. For Ubicloud as a managed Postgres provider, we use strict memory overcommit. Our experience with operating Postgres at scale taught us that it's better to enable this than to go with the defaults.
However, I can see many other scenarios where using strict memory overcommit would have unanticipated side effects. That's why Linux doesn't use strict memory overcommit as its default.
- furkansahin 1 hour ago
  (Furkan, submitter) Hmm, I haven’t thought about that. I updated the title to better reflect Ubicloud Postgres' position.
leononame 1 hour ago
This has bitten me multiple times. The problem I have is that at work we deploy the application (written in Go) and PostgreSQL on the same machine. The backend app allocates a lot of virtual memory, and initially we had overcommit set to 0 (heuristic). This caused crashes on big queries in PostgreSQL, so we set it to 2. The whole system then became a bit unstable because the backend would still allocate a lot of virtual memory, and at some point we ran into errors when allocating.
For now, we have overcommit_ratio set to a value that is stable from experience, but there really seems to be no silver bullet. Go is very happy to allocate a lot of virtual memory, but so are most managed languages. The best solution would probably be to host the backend and the database on separate servers.
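For context on how mode 2 and overcommit_ratio interact: the kernel's overcommit documentation gives the commit limit as (total RAM minus huge pages) times overcommit_ratio / 100, plus swap. A minimal Python sketch of that arithmetic follows; the function name and the example figures are illustrative assumptions, not values from this thread.

```python
def commit_limit_kib(mem_total_kib, swap_total_kib, overcommit_ratio, hugetlb_kib=0):
    """Approximate CommitLimit (KiB) under vm.overcommit_memory=2.

    Follows the formula in the kernel's overcommit-accounting docs:
    (RAM - huge pages) * overcommit_ratio / 100 + swap.
    Ignores vm.overcommit_kbytes, which replaces the ratio when set.
    """
    return (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_total_kib


# Hypothetical 16 GiB host with no swap and the default ratio of 50:
GIB = 1024 * 1024  # KiB per GiB
limit = commit_limit_kib(16 * GIB, 0, 50)
print(limit // GIB, "GiB")  # 8 GiB
```

With no swap and the default ratio, only about half of RAM can be committed, which is one reason a process that reserves a lot of virtual memory can hit allocation errors quickly in mode 2.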
- xyzzy_plugh 56 minutes ago
  I'm not sure if you are aware but there are relatively recent environment variables you can set to help contain Go memory to a fixed size.
  GOMEMLIMIT works very well if you set it to around 90% of available memory as a rough heuristic. You should definitely profile your application to fine-tune this number (e.g. if you link with C libraries that hold large memory pools, Go doesn't account for that) but also to identify sources of spiky/leaky allocations. For example, encoding/json is notorious for its inner sync.Pool hanging on to outsized buffers. There's usually a lot of low-hanging fruit.
  In my experience Go can be extremely stable in terms of memory footprint at both small (~O(1MiB)) and large (~O(256GiB)) scales, and it takes only a small amount of effort.
  As far as GC languages go, it is by far the easiest to work with.
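To make the 90% heuristic above concrete, here is a small Python sketch that derives a GOMEMLIMIT value from a memory budget. The helper name and the 4 GiB figure are assumptions for illustration; GOMEMLIMIT itself accepts a byte count with an optional suffix such as MiB or GiB, and it is a soft limit that covers only memory managed by the Go runtime.

```python
def gomemlimit_value(available_bytes, fraction=0.9):
    """Return a GOMEMLIMIT string (in MiB) for a fraction of available memory."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    mib = int(available_bytes * fraction) // (1024 * 1024)
    return f"{mib}MiB"


# Hypothetical budget of 4 GiB available to the Go process:
print("GOMEMLIMIT=" + gomemlimit_value(4 * 1024**3))  # GOMEMLIMIT=3686MiB
```

The fraction is only a starting point; as the comment notes, memory held by linked C libraries is not counted, so profiling may call for a lower value.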
- hilariously 1 hour ago
  Yes, it would. Basically every serious database tries to allocate everything and more. Back in the day we'd just put the database in a VM on the machine, even with the overhead, because knowing it could not leave its constraints and would work within them was worth the cost.
  - guenthert 1 hour ago
    There are many reasons to use a dedicated host (or VM) for a DB server, but if only the accessible memory needs to be limited, a container is the simpler, more efficient tool. That said, I would expect to be able to configure how much memory a DB process is allowed to allocate. I distinctly remember that PostgreSQL allows this. But of course both can be configured simultaneously, a belt-and-suspenders approach if you will.
    Whether failed transactions are actually so much more desirable than an OOM-killed process isn't quite obvious, but it might be easier to troubleshoot.
Bender 2 hours ago
They allude to this in the article, but I would emphasize caution when using mode 2, especially if one has already adjusted overcommit ratios, as this can prevent forks. Test this in a QA/Perf environment first, including the restart of all applications. Load test and do full QA tests before deploying to Production, and even then I would just change the setting dynamically via app deployment scripts until confidence is high, instead of putting it in the sysctl config files.
I've gone through this exercise in the past on much older kernels, which they cover as well. Personally, I ran into fewer issues by leaving overcommit at 0, dropping the overcommit ratio to 0, and setting the oom_score_adj for programs as high as 1000 if I wanted vmscan to leave them alone, and of course using the Red Hat formulas for setting vm.min_free_kbytes, vm.admin_reserve_kbytes, and vm.user_reserve_kbytes. And of course be vigilant in disallowing app owners from using every last bit of memory.
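One rough way to gauge the fork risk before switching a running host to mode 2 is to compare Committed_AS with CommitLimit in /proc/meminfo. The sketch below is only an illustration: the function name and the sample figures are hypothetical, and the limit is reported in mode 0 even though it is enforced only in mode 2.

```python
def commit_headroom_kib(meminfo_text):
    """Return CommitLimit - Committed_AS in KiB from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            # The first token after the colon is the numeric value.
            fields[name.strip()] = int(rest.split()[0])
    return fields["CommitLimit"] - fields["Committed_AS"]


# Hypothetical excerpt; on a real host, read /proc/meminfo instead.
sample = """MemTotal:       16384000 kB
CommitLimit:     8192000 kB
Committed_AS:    9500000 kB
"""
print(commit_headroom_kib(sample))  # -1308000
```

Negative headroom, as in this made-up sample, suggests that new allocations and forks would likely start failing as soon as strict accounting is turned on.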
- Bender 13 minutes ago
  Correcting a rather significant typo: setting the oom_score_adj for programs as high as 1000 should be -1000 to be left alone. 1000 would make it a prime candidate for an OOM kill. Positive integers should be used on sacrificial, superfluous programs. [1] As an example, OpenSSH sets sshd to -1000 by default.
  [1] - https://man7.org/linux/man-pages/man5/proc_pid_oom_score_adj...
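A minimal Python sketch of the corrected advice: oom_score_adj accepts values from -1000 (exempt from the OOM killer) to 1000 (prime candidate) and is set by writing to /proc/<pid>/oom_score_adj. The function name and the proc_root parameter are hypothetical, added so the logic can be tried against a scratch directory.

```python
import os


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an oom_score_adj value for pid; the valid range is -1000..1000."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return path


# Example (not executed here): mark the current process as sacrificial.
# set_oom_score_adj(os.getpid(), 500)
# Lowering the value, e.g. to -1000, normally requires elevated privileges.
```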
chiply314 28 minutes ago
Nothing worse than memory management on Hyperscaler VMs which do not use Swap :|
Took k8s ages to get Swap support.
We lost something when we accepted that Hyperscalers just tell you to use more memory. It was shitty 5 years ago, and it is even more so today after the RAM price increases.
adamors 1 hour ago
I read this article about 3 weeks ago when this bit me. Really great write-up, some tricky details.
otterley 1 hour ago
I think this is also a good lesson on why it's best to isolate mission-critical services like databases on their own compute nodes.
szmarczak 2 hours ago
I have disabled overcommit both on Windows and on Linux. I hate having random programs being killed.
Unfortunately, many programs commit twice as much memory as they actually use. Often I see ~32GB committed and ~16GB resident.
- sterwill 1 hour ago
  Does this result in programs more frequently erroring/crashing because they can't allocate? I don't know how well many of the programs I frequently use on my desktop (Firefox, GNOME desktop, JVM + IntelliJ, Slack, etc.) handle allocation failures. I'm not sure they would do much better than crash, but I know the default OOM killer settings work well for me. About once a year a real runaway process (usually a throwaway program I'm working on) gets OOM-killed, and that's fine with me.
- nok22kon 23 minutes ago
  How exactly did you disable it on Windows?
  I don't think it has an option for that.