(About programs malloc'ing memory, then not using it)
How is that ever "reasonable"? If you'd said "rarely", ok, that's why swapping out pages makes sense. But not using it at all? Why would you ever reserve memory you'll never write to? I'd call that poor program design...
You are right, it's not very common. But it happens. Example: I create a vector that is supposed to hold up to 1 million entries (because that's a reasonable upper limit for how many things my program has to deal with). This run, there are only 10K entries. Maybe I'm using a data structure that's deliberately sparse for great insert/remove performance, and I just don't care that I'm wasting a few dozen megabytes, because memory is cheap. It particularly happens with server code that does internal caching with good cache management: If the workload the server has to handle doesn't need much caching (great locality), some memory will go unused. That's fine, the malloc() calls were nearly free, and don't use many resources. That's exactly the idea behind overcommit.
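A minimal C sketch of that pattern (the entry size, the 1 million cap and the 10K count are just illustrative numbers): reserve the worst case up front, touch only what this run needs, and the untouched pages never get backed by physical RAM.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Reserve room for 1 million entries up front, but only ever populate
     * a small fraction of them.  With overcommit, the pages that are never
     * written stay unbacked by physical RAM, so the "wasted" reservation
     * costs almost nothing. */
    #define MAX_ENTRIES (1000 * 1000)

    struct entry { char payload[64]; };

    int main(void) {
        struct entry *table = malloc(MAX_ENTRIES * sizeof *table); /* ~61 MB virtual */
        if (!table) {
            perror("malloc");
            return 1;
        }

        /* This run only needs 10K entries; only these pages get faulted in. */
        for (size_t i = 0; i < 10 * 1000; i++)
            memset(&table[i], 0, sizeof table[i]);

        printf("reserved %zu MB of virtual memory, touched only about %zu KB\n",
               MAX_ENTRIES * sizeof *table / (1024 * 1024),
               10 * 1000 * sizeof(struct entry) / 1024);

        free(table);
        return 0;
    }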
Setting aside all the countless reasons your program could crash (you can eliminate intrinsic reasons in theory, but not environmental ones): running out of memory is a condition that at least allows a "graceful" exit, provided your program learns about it the moment it tries to reserve the memory.
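Something like the classic check-and-bail pattern (sketch only; xmalloc is just an illustrative wrapper name, and it only helps if malloc() really returns NULL instead of the kernel overcommitting and killing the process later):

    #include <stdio.h>
    #include <stdlib.h>

    /* Check every allocation and shut down cleanly on failure. */
    static void *xmalloc(size_t n)
    {
        void *p = malloc(n);
        if (p == NULL) {
            fprintf(stderr, "out of memory requesting %zu bytes, exiting cleanly\n", n);
            /* ... flush logs, close files, notify peers ... */
            exit(EXIT_FAILURE);
        }
        return p;
    }

    int main(void)
    {
        char *buf = xmalloc(1 << 20);   /* 1 MB of working memory */
        /* ... normal work ... */
        free(buf);
        return 0;
    }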
In practice, recovering from malloc failure (even if such a thing commonly happened) is harder than it seems, for several reasons. One is that the "graceful exit" code probably needs memory allocation too. One technique I've seen used is to send all such emergency exit code through one common routine, and at startup time reserve one memory buffer (maybe 1MB) that is never otherwise used; the first thing the emergency exit code does is free that buffer, so the exit code can function with a few mallocs.

But even that fails today. The reason is that most big modern programs are multi-threaded (and have to be, to take advantage of multi-core CPUs and to overlap network and IO latencies). So one thread runs out of memory and longjmp's to the common exit routine. But that routine cannot synchronously stop all the other threads from malloc'ing (any synchronous locking mechanism would be too slow), and even if it frees an emergency reserve pool, that pool will be immediately consumed by the other threads. I've tried writing such "out of memory recovery" code, and after weeks of messing with it, gave up.

You're better off looking at the problem you're trying to solve, estimating how much memory is available (you know the machine, you know what other software is running), and planning accordingly. And if (god forbid) someone runs a giant memory hog (like emacs'ing a 1GB log file) on the machine, it's game over. Doctor, it hurts when I do that. Well, then stop doing it.
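For reference, the reserve-buffer trick looks roughly like this in single-threaded form (names and the 1MB size are made up for illustration; as said above, it falls apart once other threads keep allocating):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Emergency reserve, allocated (and touched) at startup, never used
     * for anything else. */
    #define RESERVE_SIZE (1024 * 1024)   /* ~1MB, size is arbitrary */
    static void *emergency_reserve;

    static void emergency_exit(const char *why)
    {
        /* Release the reserve so the cleanup path below can do a few small
         * allocations of its own.  In a multi-threaded program the other
         * threads immediately eat this freed memory, which is exactly why
         * the trick stops working there. */
        free(emergency_reserve);
        emergency_reserve = NULL;

        fprintf(stderr, "fatal: %s\n", why);
        /* ... write crash report, flush buffers, etc. ... */
        exit(EXIT_FAILURE);
    }

    int main(void)
    {
        emergency_reserve = malloc(RESERVE_SIZE);
        if (emergency_reserve == NULL)
            return EXIT_FAILURE;                    /* can't even start up */
        memset(emergency_reserve, 0, RESERVE_SIZE); /* touch the pages so overcommit
                                                       actually backs them with RAM */

        /* ... normal program; on any malloc() failure, call
           emergency_exit("out of memory") ... */
        return EXIT_SUCCESS;
    }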
The bad thing about that practice is: Once the system learns it can't map all the currently needed pages to physical RAM any more, the only resort is the OOM killer, randomly killing some large process (so, any process in the system can be affected).
Or your program catches a segfault. Just as painful and unpleasant.
A broken program just reserving insane amounts of memory will be able to bring down other processes on the same machine. That's something virtual memory was originally designed to avoid.
There are many problems that traditional operating systems were designed to avoid. Lack of isolation between users, for example ... and we've given up on that; we instead move competing users into VMs, containers, or jails. I mean, we even do things like running a simple and harmless piece of software (the DNS server) in a jail, just "because". I think what you're saying is that OS design has not reached its goal. I agree, but that's the world we live in. Writing reliable and performant software in an imperfect world can require gritting your teeth and accepting reality.
About stack space, I don't really see a problem there. As long as you use no VLAs, nothing like alloca(), and no recursion, you can guarantee an upper bound on the stack usage of your program (and any algorithm can be implemented without these).
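For example (C sketch, with MAX_ITEMS as an assumed bound purely for illustration), the difference is between a frame whose size depends on the input and one whose size is a compile-time constant:

    #include <stddef.h>
    #include <string.h>

    #define MAX_ITEMS 4096   /* assumed upper bound, chosen for illustration */

    /* Unbounded: the frame size depends on a runtime value (VLA), so a
     * large or hostile n can overflow the stack with no way to recover. */
    void process_unbounded(const int *items, size_t n)
    {
        int scratch[n];                  /* VLA: size unknown at compile time */
        memcpy(scratch, items, n * sizeof *scratch);
        /* ... work on scratch ... */
    }

    /* Bounded: a fixed-size buffer keeps the worst-case frame a known
     * constant; oversized inputs are rejected (or sent to the heap). */
    int process_bounded(const int *items, size_t n)
    {
        int scratch[MAX_ITEMS];          /* compile-time-constant frame size */
        if (n > MAX_ITEMS)
            return -1;                   /* or fall back to malloc() */
        memcpy(scratch, items, n * sizeof *scratch);
        /* ... work on scratch ... */
        return 0;
    }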
Agree. With good coding practices, running out of stack space should be rare. You just have to make sure all programmers on the project understand that.
(About shkhln's comment: "if some process consumes an amount of memory you failed to predict, this is already a problem.")
This comment right here trumps the whole thread.
Sadly true. If you want to build reliable production systems, look at all the processes on the machine.