Wednesday, February 5, 2014

Memory Hog Debugging: Check system-level memory accounting in Linux


In this post, I will provide the commands that can help determine whether a memory hog is caused by a shortage of low memory, a shortage of high memory, or fragmentation.

To get a brief summary of used and free memory in units of MB, I use the "free -m" command:

babu@babu-VirtualBox:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          1002        664        337          0         71        245
-/+ buffers/cache:        346        655
Swap:          510          0        510
babu@babu-VirtualBox:~$


Brief explanation of the fields above:
  • Mem Line:
    • total: Total RAM available in the system minus the small amount of memory the kernel uses for its startup. total = used + free
    • used: Total RAM currently in use.
    • free: Total RAM not in use.
      • Interpretation: This is not the field to look at to understand whether the system is in a memory hog situation, because it ignores memory that buffers/cache can give back.
    • shared/buffers/cached: Total RAM used for shared/buffers/cached purposes. These are already counted in the "used" field above.
  • -/+ buffers/cache line: This is the line to focus on while troubleshooting memory hogs.
    • used: "used" field in the Mem line - (buffers + cached)
      • Interpretation: The total amount of RAM in use, not counting buffers/cache.
    • free: "free" field in the Mem line + (buffers + cached)
      • Interpretation: The total amount of RAM that would be free if buffers/cache were dropped. This is the field to look at while troubleshooting memory hogs because buffers/cache will be freed by the OS when memory is scarce. So, the current allocation for buffers/cache is irrelevant in determining whether a memory hog situation exists.
  • Swap Line: shows swap total/used/free values.
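
The arithmetic behind the -/+ buffers/cache line can be sketched in a few lines of Python. This is only an illustration: the function name buffers_cache_line is made up for this post, and the inputs are the rounded MB values from the Mem line of the sample output, so the result can be off by a couple of MB from what free prints (presumably because free does the subtraction before rounding to MB).

```python
def buffers_cache_line(used, free, buffers, cached):
    """Return (used, free) as shown on the '-/+ buffers/cache' line.

    Hypothetical helper for illustration; all arguments share one unit (MB here).
    """
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# Rounded MB values taken from the Mem line of the sample "free -m" output.
real_used, real_free = buffers_cache_line(used=664, free=337, buffers=71, cached=245)
print("-/+ buffers/cache: used=%d MB free=%d MB" % (real_used, real_free))
```

With the sample values this gives 348 MB used and 653 MB free, which is within rounding distance of the 346/655 that free reports.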

For more comprehensive details, I use /proc/meminfo to get an overview of system-level memory consumption.

Here is the output of /proc/meminfo and the key fields I interpret to check for an out-of-memory situation, and hence the possibility of a growing memory hog:

babu@babu-VirtualBox:~$ cat /proc/meminfo
MemTotal:        1026276 kB
MemFree:          415128 kB
Buffers:           69532 kB
Cached:           215464 kB
SwapCached:            0 kB
Active:           357420 kB
Inactive:         210468 kB
Active(anon):     283704 kB
Inactive(anon):     2340 kB
Active(file):      73716 kB
Inactive(file):   208128 kB
Unevictable:          32 kB
Mlocked:              32 kB
HighTotal:        135112 kB
HighFree:           1776 kB
LowTotal:         891164 kB
LowFree:          413352 kB
SwapTotal:        523260 kB
SwapFree:         523260 kB
Dirty:              3228 kB
Writeback:             0 kB
AnonPages:        282852 kB
Mapped:            82472 kB
Shmem:              3156 kB
Slab:              24076 kB
SReclaimable:      12172 kB
SUnreclaim:        11904 kB
KernelStack:        2896 kB
PageTables:         6716 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1036396 kB
Committed_AS:    2491396 kB
VmallocTotal:     122880 kB
VmallocUsed:       20808 kB
VmallocChunk:     101260 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       18424 kB
DirectMap2M:      894976 kB
babu@babu-VirtualBox:~$


Brief explanation of the above key fields:
  • How much memory does the system totally have & what is the split?
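    • MemTotal - Total usable RAM in the system (physical RAM minus the kernel binary code). In the output above, MemTotal = LowTotal + HighTotal.
    • LowTotal - Total amount of low memory.
    • HighTotal - Total amount of high memory.
    • SwapTotal - Total usable swap space available in the system.
      • Interpretation: Embedded systems may simply not use swap because of performance concerns. So, this field would be irrelevant on such systems.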
  • How much free/used memory does the system have now & what is the split?
    • MemFree - The total amount of free/unused memory in the system. This is the sum of LowFree & HighFree.
      • Interpretation: MemFree is NOT a real indication of unused memory in the system. Cached, SwapCached & Buffers should be added to MemFree to get the total available memory, because they would be freed when memory becomes scarce in the system.
    • Cached - The amount of RAM used for caching file-based I/O (in pages). The kernel uses otherwise unused RAM to cache file data whenever files are read or written. When a file is read a second time, it is usually fetched directly from RAM rather than from disk, which makes it accessible to the end user faster. Similarly, when the user writes to a file, the changes are written into this cache first and a background flush thread syncs them to disk later, which makes the user experience faster. Cache pages that have been modified but not yet written to disk are marked with the dirty bit.
      • Interpretation: Cached/SwapCached/Buffers would be freed when memory becomes scarce in the system. So, add Cached/SwapCached/Buffers to MemFree to get the real memory available in the system. However, this memory is not available for GFP_ATOMIC allocations made by the kernel.
    • Buffers - The amount of RAM used for disk block based I/O (in blocks).
      • Interpretation: Because most file data is accessed through the filesystem rather than as raw disk blocks, this usage is usually small and hence rarely matters for memory hog troubleshooting. As with Cached, add it to MemFree to get the real memory available in the system; the same GFP_ATOMIC caveat applies.
    • SwapCached - The amount of RAM used for the swap cache. The swap cache holds pages that the OS has brought back from swap on disk into RAM. If the OS needs to swap these pages out again, it simply drops them from RAM rather than writing them to disk again, as there is a copy on the disk already. So, SwapCached helps improve swapping performance. Page cache is the sum of Cached & SwapCached.
      • Interpretation: Add Cached/SwapCached/Buffers to MemFree to get the real memory available in the system. However, this memory is not available for GFP_ATOMIC allocations made by the kernel.
    • LowFree - The kernel resides in low memory. This is the memory the kernel can access directly and where it keeps all its data structures. LowFree indicates the total amount of free/unused memory in low memory, and hence the memory available for the kernel's data structures.
      • Interpretation: A memory hog situation can occur even when MemFree shows significant free memory, if the kernel is running low on LowFree. So, checking this value is important when troubleshooting memory hogs.
    • HighFree - The amount of free/unused memory in high memory. High memory is mostly used by user space applications.
    • SwapFree - The total free/unused swap memory available.
      • Interpretation: Embedded systems may not use swap because of performance concerns. So, this field would be irrelevant on such systems.
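
The checks described in the list above can be sketched as a small Python script. Treat it as a rough sketch under a few assumptions: the helper names (parse_meminfo, available_kb, low_memory_pressure) are invented for this post, the 10% threshold for low memory is an arbitrary value picked for illustration, and the sample text only copies the key fields from the output shown earlier.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}.

    Values are in kB for most fields; HugePages_* fields are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_kb(fields):
    """MemFree plus Buffers, Cached and SwapCached, as described in the post."""
    keys = ("MemFree", "Buffers", "Cached", "SwapCached")
    return sum(fields.get(key, 0) for key in keys)


def low_memory_pressure(fields, threshold=0.10):
    """True if LowFree is below `threshold` of LowTotal (threshold is an assumption).

    Returns None when LowTotal/LowFree are absent or zero, since not every
    kernel reports these fields.
    """
    if not fields.get("LowTotal") or "LowFree" not in fields:
        return None
    return fields["LowFree"] < threshold * fields["LowTotal"]


# Key fields copied from the sample /proc/meminfo output in this post.
SAMPLE = """\
MemTotal:        1026276 kB
MemFree:          415128 kB
Buffers:           69532 kB
Cached:           215464 kB
SwapCached:            0 kB
HighTotal:        135112 kB
HighFree:           1776 kB
LowTotal:         891164 kB
LowFree:          413352 kB
"""

info = parse_meminfo(SAMPLE)
print("available: %d kB" % available_kb(info))
print("low memory pressure:", low_memory_pressure(info))
```

On a live system, pass open("/proc/meminfo").read() instead of SAMPLE. For the sample values, the available figure works out to 700124 kB and the low memory check reports False.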

Page allocator Debug Info:

Even when there is memory available in the system, an OOM can result from severe external fragmentation. /proc/buddyinfo is a useful file for diagnosing these kinds of OOMs. It can provide clues as to how big an area the system can safely allocate right now and why a previous allocation failed.


root@babu-VirtualBox:~# cat /proc/buddyinfo
Node 0, zone      DMA      1      0      0      1      2      1      1      0      1      1      3
Node 0, zone   Normal    424    102     63     22     19     13      5      2      3      2     73
Node 0, zone  HighMem     47     16      9      4      3      6      1      0      0      0      0
root@babu-VirtualBox:~#


Each column represents the number of free blocks of a certain order, starting with order 0 on the left. In this case, there are 424 blocks of 2^0*PAGE_SIZE available in ZONE_NORMAL, 102 blocks of 2^1*PAGE_SIZE in ZONE_NORMAL, 63 blocks of 2^2*PAGE_SIZE in ZONE_NORMAL, etc.

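When the counts for the higher-order blocks drop to 0, fragmentation is becoming a problem: an allocation that needs a large contiguous area can fail even though plenty of smaller blocks are free.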
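
To make the per-order counts easier to read, here is a small Python sketch that totals the free pages in each zone and reports the largest order that still has a free block. The helper names (parse_buddyinfo, free_pages, largest_free_order) are made up for this post, and the sample text is the buddyinfo output from above.

```python
def parse_buddyinfo(text):
    """Parse /proc/buddyinfo-style text into {(node, zone): [count per order]}."""
    zones = {}
    for line in text.splitlines():
        if not line.startswith("Node"):
            continue
        head, _, tail = line.partition("zone")
        node = int(head.replace("Node", "").replace(",", "").strip())
        parts = tail.split()
        zones[(node, parts[0])] = [int(n) for n in parts[1:]]
    return zones


def free_pages(counts):
    """Total free pages in a zone: a block of order n holds 2**n pages."""
    return sum(count << order for order, count in enumerate(counts))


def largest_free_order(counts):
    """Highest order with at least one free block, or -1 if there is none."""
    return max((order for order, count in enumerate(counts) if count), default=-1)


# Sample /proc/buddyinfo output from this post.
SAMPLE = """\
Node 0, zone      DMA      1      0      0      1      2      1      1      0      1      1      3
Node 0, zone   Normal    424    102     63     22     19     13      5      2      3      2     73
Node 0, zone  HighMem     47     16      9      4      3      6      1      0      0      0      0
"""

for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
    print("node %d %-8s free pages=%6d largest free order=%d"
          % (node, zone, free_pages(counts), largest_free_order(counts)))
```

For the sample, ZONE_NORMAL still has order-10 blocks free, while HighMem has nothing above order 6. Assuming a 4 kB page size, that means no contiguous area larger than 256 kB can currently be carved out of HighMem.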


More information relevant to external fragmentation can be found in /proc/pagetypeinfo. I haven't yet debugged OOMs caused by fragmentation, so I will skip pagetypeinfo for now, as it is a more detailed tool than buddyinfo.
