The OOM killer: why the kernel kills your process
Why Linux runs out of memory and kills a process, how it picks which one (oom_score), how to trigger an OOM kill safely, and how to read it in the kernel log.
What we're doing
We look at the memory and overcommit settings, read and nudge a process's oom_score, then trigger a real OOM kill inside a memory-limited box (so it stays contained) and read the evidence in dmesg. This VM has ~3.8 GiB of RAM, no swap, and a staged process, cache-warmer, holding ~600 MB.
Watch the video first, then run these as we read. Reading commands need no sudo; we use sudo to write a root process's oom_score_adj, to create the memory-limited scope, and to read the kernel log.
What the OOM killer is
Linux hands out more memory than it physically has (overcommit), betting that most of it is never actually used. Usually that bet pays off. When real use fills RAM, there is no swap, and nothing is left to reclaim, the kernel cannot keep its promise, so the OOM (Out Of Memory) killer picks one process and kills it with SIGKILL to save the rest. It picks by oom_score, which mostly reflects how much memory the process uses (resident pages, swap, and page tables), and kills the highest score first.
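To see which process would be first in line right now, here is a minimal sketch that reads oom_score for every process under /proc and sorts the highest first. The function name top_oom_candidates is our own, not a standard tool, and the result is only a snapshot: the kernel recomputes scores at the moment of the OOM, and a cgroup OOM (as in Step 3) only considers processes inside that cgroup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the processes with the highest oom_score,
# i.e. the ones a whole-system OOM would consider first right now.
top_oom_candidates() {
  local n=${1:-5} dir pid score adj name
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    # Processes can exit mid-loop, so skip any we can no longer read.
    score=$(cat "$dir/oom_score" 2>/dev/null) || continue
    adj=$(cat "$dir/oom_score_adj" 2>/dev/null) || continue
    name=$(cat "$dir/comm" 2>/dev/null) || continue
    printf '%s\t%s\t%s\t%s\n' "$score" "$adj" "$pid" "$name"
  done | sort -rn | head -n "$n"   # highest score first
}

printf 'SCORE\tADJ\tPID\tCOMM\n'
top_oom_candidates 5
```

On the demo VM we would expect cache-warmer near the top, since it holds the most memory.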
Step 1: the memory situation
free -h # memory overview (human-readable)
cat /proc/sys/vm/overcommit_memory # overcommit mode (0 = heuristic default)
Mem: 3.8Gi 1.4Gi 1.9Gi ...
Swap: 0B 0B 0B
0
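Swap: 0B means there is no overflow space, so once RAM is full and the page cache has been reclaimed, the OOM killer fires. Mode 0 is heuristic overcommit: the kernel refuses only obviously excessive allocations, so it can promise more memory than RAM + swap (which is why the OOM killer has to exist).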
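The overcommit bet is visible in /proc/meminfo. Committed_AS is how much memory has been promised to processes, which in mode 0 may exceed RAM + swap; CommitLimit is a ceiling the kernel only enforces in strict mode 2. A small sketch that prints them side by side (the helper name meminfo_kb is ours; /proc/meminfo reports values in kB):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one /proc/meminfo value (in kB) by its key.
meminfo_kb() {
  awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

mode=$(cat /proc/sys/vm/overcommit_memory)
mem=$(meminfo_kb MemTotal)
swap=$(meminfo_kb SwapTotal)
committed=$(meminfo_kb Committed_AS)
limit=$(meminfo_kb CommitLimit)
total=$(( mem + swap ))   # unset values count as 0 inside $(( ))

echo "overcommit_memory mode: $mode"
echo "RAM + swap:    $(( total / 1024 )) MiB"
echo "Committed_AS:  $(( committed / 1024 )) MiB (promised to processes)"
echo "CommitLimit:   $(( limit / 1024 )) MiB (only enforced in mode 2)"
if (( committed > total )); then
  echo "Overcommitted: more memory promised than RAM + swap"
fi
```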
Step 2: read and nudge oom_score
cat /proc/$(pgrep cache-warmer)/oom_score # current kill-score (higher = killed sooner)
cat /proc/$(pgrep cache-warmer)/oom_score_adj # the knob: -1000..1000, default 0
echo 1000 | sudo tee /proc/$(pgrep cache-warmer)/oom_score_adj # volunteer it as first victim
cat /proc/$(pgrep cache-warmer)/oom_score # jumps to ~1000
142
0
1000
1000
oom_score is driven mostly by memory use: cache-warmer holds ~600 MB, so it outranks small processes. oom_score_adj nudges the score: +1000 = pick me first, -1000 = never pick me (the OOM killer skips the process entirely; used to protect sshd and databases).
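Step 2 uses sudo because cache-warmer belongs to root. A sketch of the same experiment on a throwaway process we own (a background sleep, chosen just for illustration) shows that raising oom_score_adj needs no privileges; lowering it, for example to -1000, generally needs root (CAP_SYS_RESOURCE).

```shell
#!/usr/bin/env bash
# Volunteer a throwaway process we own; raising oom_score_adj needs no sudo.
sleep 300 &
pid=$!

before=$(cat /proc/$pid/oom_score)
echo 500 > /proc/$pid/oom_score_adj       # nudge it toward "pick me"
after=$(cat /proc/$pid/oom_score)
adj=$(cat /proc/$pid/oom_score_adj)

echo "oom_score_adj=$adj  oom_score: $before -> $after"

# Clean up the throwaway process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

The score should go up after the write, just as cache-warmer's did, without touching a real service.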
Step 3: trigger an OOM kill, safely
# run the allocator in a 200 MB cgroup so the kill stays contained (does not take down the VM)
sudo systemd-run --scope -p MemoryMax=200M stress-ng --vm 1 --vm-bytes 400M --vm-keep --timeout 30s
Running scope as unit: run-r9f3c.scope
stress-ng: error: [6041] vm instance 0 was killed by signal 9 (SIGKILL)
400 MB will not fit in a 200 MB cap with no swap, so the kernel OOM-kills the worker inside the scope. It dies by SIGKILL (uncatchable, no cleanup), which is why OOM-killed processes just vanish.
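From the parent's side, a SIGKILLed child shows up as exit status 137 (128 + signal 9). A minimal sketch, where we send SIGKILL ourselves with kill instead of waiting for a real OOM, so it runs anywhere without a memory cap:

```shell
#!/usr/bin/env bash
# How a SIGKILLed child looks to its parent shell: exit status 128 + 9.
# We send SIGKILL by hand here; an OOM kill delivers the same signal.
sleep 300 &
pid=$!
kill -KILL "$pid"

status=0
wait "$pid" || status=$?
echo "exit status: $status"   # 137 = 128 + 9 (SIGKILL)
if (( status == 137 )); then
  echo "killed by SIGKILL: check dmesg for an OOM kill"
fi
```

A 137 only tells us that something sent SIGKILL; the kernel log line in Step 4 is what confirms it was the OOM killer.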
Step 4: read the evidence
sudo dmesg -T | grep -iE "out of memory|killed process" | tail # the kernel's record of the kill
... Memory cgroup out of memory: Killed process 6042 (stress-ng-vm) total-vm:412900kB, anon-rss:198400kB, oom_score_adj:0
The kernel logs the victim, its memory (anon-rss ~200 MB), and its oom_score_adj. The line starts with "Memory cgroup out of memory" because we contained the kill in a capped scope; a whole-system OOM logs "Out of memory: Killed process ..." instead. This line is what we look for when a process disappears for no clear reason.
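On a busy machine the log is long, so a small filter that pulls out the fields that matter can help. A sketch, assuming the kernel line contains "Killed process PID (name)", an anon-rss:...kB field, and oom_score_adj: as in the line above; the exact field list varies between kernel versions, so the regex relies only on those four. parse_oom_line is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract victim PID, name, anon-rss and oom_score_adj
# from kernel "Killed process" lines; other lines are ignored.
parse_oom_line() {
  sed -nE 's/.*Killed process ([0-9]+) \(([^)]*)\).*anon-rss:([0-9]+)kB.*oom_score_adj:(-?[0-9]+).*/pid=\1 comm=\2 anon_rss_kb=\3 oom_score_adj=\4/p'
}

# The line from Step 4, fed in as text so no sudo is needed.
line='Memory cgroup out of memory: Killed process 6042 (stress-ng-vm) total-vm:412900kB, anon-rss:198400kB, oom_score_adj:0'
echo "$line" | parse_oom_line
# prints: pid=6042 comm=stress-ng-vm anon_rss_kb=198400 oom_score_adj=0

# On a live system:
# sudo dmesg -T | parse_oom_line
```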
Cheat sheet
free -h # is there swap? how full is RAM?
cat /proc/sys/vm/overcommit_memory # overcommit mode (0 default)
cat /proc/PID/oom_score # kill-score (higher = killed first)
cat /proc/PID/oom_score_adj # the knob (-1000..1000)
echo -1000 | sudo tee /proc/PID/oom_score_adj # exempt a process from the OOM killer
sudo systemd-run --scope -p MemoryMax=200M CMD # run CMD in a memory-capped box
sudo dmesg -T | grep -iE "out of memory|killed process" # read the kill
In short: Linux overcommits memory, so when real use fills RAM with no swap, the OOM killer SIGKILLs the process with the highest oom_score (usually the biggest memory user) to save the system. Read the kill in dmesg and protect critical processes with a negative oom_score_adj, but the real fix is memory limits, swap, and fixing the leak.