Tuesday, March 4, 2014

slub_debug enhancements: My wish list


In my earlier post, I explained the capabilities of slub_debug. After some experimentation, I think slub_debug can be improved in the following respects:

  • If a buffer overrun extends past the alloc/free metadata, slub_debug's alloc/free trace is corrupted too. As a result, the source of the memory corruption can't be identified. To make triage easier, it seemed like a good idea to store the alloc/free stack trace at the beginning of the memory block rather than at the end. However, buffer underruns (negative array indices) are another source of memory corruption, and an underrun would corrupt the alloc/free info if it were stored at the beginning. So, I think slub_debug should have an option to store the alloc/free stack trace at the beginning, at the end, or at both places.
  • slub_debug does its job in alloc/free context. As a result, some errors, such as "use after free" and buffer overruns extending across more than one block of memory, are not detected until the next alloc/free is done on those memory blocks. I think we should have a more proactive way of detecting these errors:
    • When redzone errors are detected, slub_debug should also check the adjacent blocks, continuing until it reaches a block with no redzone errors. This way, slub_debug can catch all the blocks corrupted by a buffer overrun at once.
    • A facility to run the slabinfo tool in the background every n seconds (configurable).
  • Sometimes a memory owner wildly accesses nearby memory blocks and corrupts them. The current slub_debug report/log doesn't provide information about the owners of adjacent memory blocks. So, slub_debug should provide a configuration option that enables this, along with a setting for the number of adjacent owners to include in the slub_debug log/output.
  • When a memory block is overrun, knowing the owners of the adjacent memory blocks helps. However, sometimes it's the previous owners of those adjacent blocks who corrupted the memory. So, it would be a good idea to have an option in slub_debug to store a history of a few previous owners along with their allocation/free timestamps.
  • In my experiments, slub_debug didn't detect buffer overruns that wrote fewer than 30-odd bytes past the allocated memory. I have yet to investigate why such overruns go undetected.
  • Today, slub_debug just logs kernel memory errors and lets the system continue running, and the system keeps running until the memory corruption becomes contagious enough to induce a panic! However, to analyze the source of the corruption, it is better to either drop the system to a kdb/kgdb prompt or generate a core dump for offline analysis the first time a memory corruption (such as redzone/padding corruption) is detected. If not the default, this should at least be a configurable option.
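
To make the "check adjacent blocks" wish a bit more concrete, here is a minimal userspace sketch in Python. It is only a toy model, not kernel code and not how slub_debug is implemented: the slab layout, the sizes, the marker byte and the names make_slab, overrun, redzone_ok and scan_from are all assumptions made up for illustration.

```python
# Toy model of a slab: each block is OBJ_SIZE bytes of object data
# followed by a REDZONE_LEN-byte redzone. All values are hypothetical.
REDZONE = 0xBB        # marker byte chosen for this sketch
REDZONE_LEN = 8
OBJ_SIZE = 32
STRIDE = OBJ_SIZE + REDZONE_LEN


def make_slab(n_objects):
    """Build a slab of n_objects blocks with intact redzones."""
    slab = bytearray()
    for _ in range(n_objects):
        slab += bytes(OBJ_SIZE) + bytes([REDZONE]) * REDZONE_LEN
    return slab


def overrun(slab, index, length, value=0x41):
    """Simulate a buggy owner of block `index` writing `length` bytes."""
    start = index * STRIDE
    end = min(start + length, len(slab))
    slab[start:end] = bytes([value]) * (end - start)


def redzone_ok(slab, index):
    """Return True if the redzone of block `index` is untouched."""
    start = index * STRIDE + OBJ_SIZE
    return all(b == REDZONE for b in slab[start:start + REDZONE_LEN])


def scan_from(slab, index):
    """Walk forward from block `index`, collecting blocks whose redzone
    is damaged, and stop at the first block with an intact redzone."""
    n_objects = len(slab) // STRIDE
    corrupted = []
    i = index
    while i < n_objects and not redzone_ok(slab, i):
        corrupted.append(i)
        i += 1
    return corrupted


if __name__ == "__main__":
    slab = make_slab(6)
    overrun(slab, 1, 100)        # block 1 writes 100 bytes into a 32-byte object
    print(scan_from(slab, 1))    # prints [1, 2]
```

In this run the overrun also scribbles on the first 20 bytes of block 3, but block 3's redzone is intact, so a redzone-only scan stops there. Catching that last, partially overwritten block would need an additional check (for example, poison values in free objects).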
