another man's ramblings on code and tech

Why Memory Consumption is a Sham and you know nothing, Jon Snow

OK, quick question. How much memory is your computer using right now? After being asked this, most people simply open up their performance monitor of choice and rattle off the biggest number they can find on the screen. As a performance developer I’m asked that question constantly, and was once forced (by my last boss, I might add) to define it. I answered "data loaded into memory". I was then asked how I would determine it in practical terms. I mentioned a quick check of the “memory used” stat of whatever tool I had at hand. However, as I learned, this answer would not be correct, because it never defined what was meant by “used”, nor qualified what exactly was being “used”. Memory is incredibly complex beyond the grandiose abstractions most performance monitors provide for it. Answering what “used memory” even means is a challenging task, even more so when one is asked to measure it on their own.

Before diving into answering what "used memory" could mean, let’s do a quick refresher. Most operating systems these days run on a memory model involving five major groupings:

  • Cached memory

  • Virtual memory (or swap space)

  • Shared memory

  • Used memory

  • Free memory

Everything relating to memory management of your OS and processes falls under working with one or more of these five groupings. Of course, these are made up of much more complex building blocks, and I’m waving my hands over a ton of additional functionality which I will completely ignore here, but for now let’s consider just these categories.

Cached Memory

We’ll start with cached memory, which represents data copied from hard drive space into physical RAM. Essentially, when a program is first run from disk, it must be loaded into memory with a read operation. This is relatively expensive, so the data is left in memory if there’s enough free space to warrant it. This allows the next read operation to skip the disk and get what it needs directly from physical RAM. The amount of cached memory is equal to the amount of this data left sitting in RAM. When physical memory starts to become constrained, the OS will start dropping the lowest priority sections of this cached memory in favour of higher priority processes’ needs. "Priority", here, again has a wibbly wobbly definition, which includes how frequently the cached memory is used, among various other factors. A question immediately pops its head up: should cached memory contribute to used memory? It is in physical RAM, but it doesn’t strictly need to be there. Many choose to exclude it from this count, but it is an important consideration.
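To make this concrete, here’s a rough Python sketch of how a tool might read the cached figure, in the style of parsing /proc/meminfo on Linux. The field names are real, but the values below are made up so the example is self-contained rather than reading the live file.

```python
# Sample /proc/meminfo-style text (field names are real; values are invented).
sample_meminfo = """\
MemTotal:       16384000 kB
MemFree:         2048000 kB
Buffers:          512000 kB
Cached:          4096000 kB
SwapTotal:       8192000 kB
SwapFree:        6144000 kB
"""

def meminfo_kb(text):
    """Parse meminfo-style lines into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, rest = line.split(":", 1)
        fields[name] = int(rest.split()[0])  # first token after the colon is the value
    return fields

info = meminfo_kb(sample_meminfo)
cached_kb = info["Cached"]  # data kept in RAM purely to avoid re-reading the disk
print(f"Cached: {cached_kb} kB")
```

On a real system you would read the same fields from /proc/meminfo itself; the parsing is identical.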

Virtual Memory

Virtual memory (or swap space in Linux) is a section of hard drive space specifically allocated for use by the memory subsystem. Many beginners are totally confused by this. What’s the point of having RAM chips if you’re just going to use disk space anyhow? Well, as physical RAM fills up, the OS must find a way to create space. To do this it moves the lowest priority sections of memory (again, a wibbly wobbly decision) out of physical memory to disk storage. In other words, it swaps the memory out, hence the name swap space. This allows the OS to balance many high and low priority processes simultaneously without toppling. Of course, virtual memory makes our definition of "used memory" more complex. Should it include swap? Should it ignore swap? If a program has objectively used 2 GB of memory, but 1.5 GB is sitting in swap (and might be needed any time now), how do we count it then? The principle usually followed is to count only memory on physical RAM towards “used”, no matter what this ratio is.
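The swap figure itself is simple arithmetic over two of the same meminfo-style fields. A minimal sketch, again with invented values:

```python
# Invented values in kB, in the style of /proc/meminfo's SwapTotal and SwapFree.
sample = {"SwapTotal": 8192000, "SwapFree": 6144000}

# How much has actually been swapped out to disk:
swap_used_kb = sample["SwapTotal"] - sample["SwapFree"]
print(f"Swap used: {swap_used_kb} kB")

# By the convention described above, none of this counts toward "used" memory,
# even though the data may be paged back into physical RAM at any moment.
```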

Shared Memory

Shared memory is memory provided by a single source and made accessible to many other authorised processes. An example of this is shared libraries, where a single copy is loaded into memory and made accessible to the various programs requesting it. This type of memory adds another large factor to the complexity of defining "used memory". Do we include only the amount being used on physical RAM? Or the entirety of these shared libraries across virtual and physical memory? How do we factor in the number of competing programs requesting this memory? I usually just count shared memory as one giant block, but understanding how it’s being used and accessed is important in understanding performance.
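The “one block, many mappings” idea is easy to demonstrate with Python’s standard multiprocessing.shared_memory module: one handle creates the block, and any other process (here, a second handle in the same script, for brevity) can attach to the same physical pages by name.

```python
from multiprocessing import shared_memory

# One process creates the block...
block = shared_memory.SharedMemory(create=True, size=32)
block.buf[:5] = b"hello"

# ...and any other authorised process can attach to it by name.
other = shared_memory.SharedMemory(name=block.name)
seen = bytes(other.buf[:5])  # same physical pages, a second mapping
print(seen)

other.close()
block.close()
block.unlink()  # release the block once nobody needs it
```

Both handles see the same bytes because the OS maps the same pages into each, which is exactly why naively summing per-process memory counts the block twice.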

Used Memory

So, after looking at these three locations, what does "used" memory even mean then? Well, Linux, like a sage old master, gets around the above craziness by answering our question with three different values in the “top” command for each process:

  • VIRT, or "virtual", which represents the total virtual memory of the process, spanning both physical RAM and swap

  • RES, or "resident size", is meant to represent the used memory only on physical RAM

  • SHR, or "share", which is meant to be the amount of shared memory this process is using

With these three stats, one can get an idea of what the process is doing in terms of memory. However, remember that VIRT will always include the entirety of memory mapped, regardless of whether it’s actually being used, and regardless of whether it currently sits on physical RAM or in swap. For example, a whole library may be loaded with only a few of its functions actually resident. In this case VIRT would show the full size of the library, while RES would only show the size of the functions loaded. Another complexity to consider: what happens when a process forks? The child initially shares its pages with the parent (copy-on-write), yet "top" will show a new process carrying the parent’s RES, so two processes appear with 1 GB used each when really only 1 GB is on physical RAM. Additionally, two separate processes using the same functions in shared memory will have those functions counted towards both of their RES values. So then, how would you define “used memory”? I choose to define it as “the amount of memory used on physical RAM only, disregarding cache and virtual memory, and counting shared memory on RAM only once, or uniquely”. The complexity of determining this can be exhausting; nowhere near as exhausting as determining “free” memory, though.
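To see where these numbers come from, here’s a sketch that derives top-style figures from the per-process fields in /proc/&lt;pid&gt;/status. The field names (VmSize, VmRSS, RssFile, RssShmem) are real; the values are invented, and treating SHR as RssFile + RssShmem and “unique” memory as RES − SHR are approximations, not gospel.

```python
# Sample /proc/<pid>/status-style fields (names real, values invented, all kB).
sample_status = """\
VmSize:  2097152 kB
VmRSS:   1048576 kB
RssFile:  262144 kB
RssShmem:  65536 kB
"""

fields = {}
for line in sample_status.splitlines():
    name, rest = line.split(":", 1)
    fields[name] = int(rest.split()[0])

virt_kb = fields["VmSize"]                       # top's VIRT: all mapped memory
res_kb = fields["VmRSS"]                         # top's RES: resident on physical RAM
shr_kb = fields["RssFile"] + fields["RssShmem"]  # roughly top's SHR

# One common heuristic for a process's "own" physical footprint, avoiding the
# double counting of shared pages described above:
unique_kb = res_kb - shr_kb
print(virt_kb, res_kb, shr_kb, unique_kb)
```

Summing unique_kb across processes gets you closer to the “count shared memory only once” definition above, though a truly fair split (like the kernel’s PSS figure in /proc/&lt;pid&gt;/smaps) divides each shared page among its users instead.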

Free Memory

For this, the definition hinges on whether one includes or excludes accessible virtual memory and cache. In Linux, again, lower level tools usually display two values for free memory: the amount free on physical RAM, and the amount free on physical RAM once cache is discounted. Virtual memory is usually ignored, because maintaining it is the operating system’s prerogative; from the application’s perspective, it has no awareness of what is or isn’t in swap. All the application cares about is how much memory it can request right now, which in most cases is free physical RAM plus cache.
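The two figures differ by simple arithmetic, sketched below with invented numbers. (Note that modern kernels also export a MemAvailable estimate in /proc/meminfo that accounts for cache the OS would rather not drop; the naive sum here is the traditional approximation.)

```python
# Invented values in kB, in the style of /proc/meminfo.
mem_free = 2048000   # untouched physical RAM
buffers = 512000     # block-device cache
cached = 4096000     # page cache

strictly_free_kb = mem_free
# Cache can be dropped on demand, so from an application's point of view it is
# effectively reclaimable; the "free without cache discounted" figure adds it back.
available_ish_kb = mem_free + buffers + cached
print(strictly_free_kb, available_ish_kb)
```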

So, what’s all this mean?

As you can see, the original question "how much memory is your computer using right now?" seems rather naive to ask. However, it’s a common metric across the world for millions of users. It’s used to tune performance in professional settings and at home, whether attempting to make code less resource intensive or to stop a game of Overwatch from lagging. So, I believe it’s important to understand that the term “used memory” is largely arbitrary. It changes even between tools within the same OS, let alone across different systems and architectures. Used memory is a calculated value, potentially the addition, subtraction, multiplication, and/or division of various lower level metrics that the OS keeps track of. This makes it incredibly variable and near impossible to talk about in practical terms without many predicates and disclaimers. This is what totally astounds me about it, and what drove me to write this article after learning in depth about how it works. I would say it’s one of the most important and least meaningful stats popularly discussed. It’s also a stat that most would (incorrectly) claim to understand at first, due to how simply it’s displayed in most tools. They make it seem like a simple fraction that you want to keep below 1/2, when really it’s a complex, multifaceted, totally calculated stat that has little meaning on its own.

Additional reading with greater detail on memory management can be found over on Linuxaria or via your search engine of choice.

Date: Apr 04 2017