Tuesday, July 28, 2015

Store and Print Stack Trace in Linux Kernel



The Linux kernel has facilities to dump the stack (show_stack) from any function in the kernel or in a kernel module. However, I haven't come across a facility that stores a stack trace for later retrieval and printing. In the past, I adapted Linux kernel code and wrote my own functions to achieve this. This post provides sample code for these facilities.

Here is the sample code:

#include "asm_powerpc_stk.h"

#define MAX_STK_DEPTH 30
#define STK_TRACE_SKIP_CNT 2

#ifdef CONFIG_IRQSTACKS
/* Return 1 if sp lies within this CPU's hard or soft IRQ stack, else 0. */
static __always_inline int my_valid_irq_stack(unsigned long sp, struct task_struct *p,
                                              unsigned long nbytes)
{
    unsigned long stack_page;
    unsigned long cpu = task_cpu(p);

    /*
     * Avoid crashing if the stack has overflowed and corrupted
     * task_cpu(p), which is in the thread_info struct.
     */
    if (cpu < NR_CPUS && cpu_possible(cpu)) {
        stack_page = (unsigned long) hardirq_ctx[cpu];
        if (sp >= stack_page + sizeof(struct thread_struct)
            && sp <= stack_page + THREAD_SIZE - nbytes)
            return 1;

        stack_page = (unsigned long) softirq_ctx[cpu];
        if (sp >= stack_page + sizeof(struct thread_struct)
            && sp <= stack_page + THREAD_SIZE - nbytes)
            return 1;
    }
    return 0;
}
#else
#define my_valid_irq_stack(sp, p, nb) 0
#endif /* CONFIG_IRQSTACKS */

/* Return 1 if sp points into the task's kernel stack or an IRQ stack. */
static __always_inline int my_validate_sp(unsigned long sp, struct task_struct *p,
                                          unsigned long nbytes)
{
    unsigned long stack_page = (unsigned long)task_stack_page(p);

    if (sp >= (unsigned long)end_of_stack(p)
        && sp <= stack_page + THREAD_SIZE - nbytes)
        return 1;

    return my_valid_irq_stack(sp, p, nbytes);
}

/*
 * Store up to MAX_STK_DEPTH return addresses of the current call chain
 * in stk_trace. Returns the number of entries stored.
 */
static __always_inline unsigned int my_get_stack_trace(unsigned long *stk_trace)
{
    unsigned long sp;
    unsigned int i;

    sp = my_get_current_stack_pointer(sp); /* Platform-specific func */

    /* Skip the innermost frame(s) by following the back chain. */
    for (i = 1; i < STK_TRACE_SKIP_CNT; i++) {
        if (!my_validate_sp(sp, current, STACK_FRAME_OVERHEAD))
            return 0;
        sp = ((unsigned long *)sp)[0];
    }

    /* Record the saved LR of each frame until the chain leaves the stack. */
    for (i = 0; i < MAX_STK_DEPTH; i++) {
        if (!my_validate_sp(sp, current, STACK_FRAME_OVERHEAD))
            return i;
        stk_trace[i] = ((unsigned long *)sp)[STACK_FRAME_LR_SAVE];
        sp = ((unsigned long *)sp)[0];
    }
    return MAX_STK_DEPTH;
}

/* Capture the current stack trace and print it, one symbol per line. */
static void my_get_and_print_stack_trace(void)
{
    unsigned long stk_trace[MAX_STK_DEPTH];
    unsigned int stk_trace_len;
    unsigned int k;

    stk_trace_len = my_get_stack_trace(stk_trace);
    for (k = 0; k < stk_trace_len; k++)
        printk(" %pS\n", (void *)stk_trace[k]);
}
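
The same "store now, print later" pattern can be tried out in user space before touching kernel code. The sketch below is not the kernel code above: it swaps in glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>, and the struct saved_trace type and both function names are made up for illustration.

```c
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): glibc-specific */
#include <unistd.h>     /* STDERR_FILENO */

#define MAX_STK_DEPTH 30

/* Hypothetical record: a stack trace captured now, to be printed later. */
struct saved_trace {
    void *addrs[MAX_STK_DEPTH];
    int len;
};

/* Capture the caller's stack into st; returns the number of frames stored. */
static int save_stack_trace_user(struct saved_trace *st)
{
    st->len = backtrace(st->addrs, MAX_STK_DEPTH);
    return st->len;
}

/* Print a previously saved trace to stderr, one frame per line. */
static void print_saved_trace(const struct saved_trace *st)
{
    backtrace_symbols_fd(st->addrs, st->len, STDERR_FILENO);
}
```

A caller would typically embed a struct saved_trace in the object being tracked, fill it at allocation time, and print it only when something goes wrong.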
 

The complete code is in my GitHub repository: https://github.com/babuneelam/linux_kernel_stack_trace.

Hope this helps.



Monday, June 22, 2015

Boot Args - Basics, Usage and How to add a custom boot arg?



In the past, I have added quite a few custom boot args to make Linux kernel code easier to debug. In this post, I will go over the basics of Linux boot args, their usage in an Ubuntu context, and how to add a custom boot arg.


What are Boot Args?

Boot args are also referred to as boot params or boot options.

Just as a Linux program's options can be supplied on the command line, the Linux kernel can be supplied with options that define its behavior.

For example, boot args are useful for the following:
  • Enable debug features (say slub_debug)
  • Override default configuration values (say panic)
  • Supply information about the hardware that the kernel might not be able to determine on its own
 
Most of the boot arguments have the form:
           name[=value_1][,value_2]...[,value_10]

where 'name' is a unique keyword that is used to identify what part of the kernel the associated values (if any) are to be given to.
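
To make the name[=value_1][,value_2]... form concrete, here is a minimal user-space sketch that splits one such argument into its name and values. It is not how the kernel itself parses arguments; struct boot_arg, MAX_VALUES and parse_boot_arg are hypothetical names chosen for this example.

```c
#include <string.h>

#define MAX_VALUES 10

/* Hypothetical holder for one parsed boot arg; pointers refer into the caller's buffer. */
struct boot_arg {
    char *name;
    char *values[MAX_VALUES];
    int nvalues;
};

/*
 * Split "name=v1,v2,..." in place: '=' and ',' are overwritten with '\0'.
 * Returns the number of values found (0 for a bare flag such as "quiet").
 */
static int parse_boot_arg(char *arg, struct boot_arg *out)
{
    char *eq = strchr(arg, '=');
    char *val;

    out->name = arg;
    out->nvalues = 0;
    if (!eq)
        return 0;
    *eq = '\0';
    val = eq + 1;
    while (val && *val && out->nvalues < MAX_VALUES) {
        char *comma = strchr(val, ',');

        if (comma)
            *comma = '\0';
        out->values[out->nvalues++] = val;
        val = comma ? comma + 1 : NULL;
    }
    return out->nvalues;
}
```

For instance, passing a writable copy of "kgdboc=ttyS0,115200" gives the name "kgdboc" and the two values "ttyS0" and "115200".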


Some well-known boot args:
  • root: 
This argument tells the kernel what device is to be used as the root file system while booting. The default of this setting is determined at compile time, and usually is the value of the root device of the system that the kernel was built on. To override this value, and select the second floppy drive as the root device, one would use  
                     'root=/dev/fd1' 
The root device can be specified symbolically or numerically. A symbolic specification has the form /dev/XXYN, where XX designates the device type (e.g., 'hd' for ST-506 compatible hard disk, with Y in 'a'-'d'; 'sd' for SCSI compatible disk, with Y in 'a'-'e'), Y the drive letter or number, and N the number (in decimal) of the partition on this device.
Note that this has nothing to do with the designation of these devices on your filesystem. The '/dev/' part is purely conventional.
The more awkward and less portable numeric specification of the above possible root devices in major/minor format is also accepted. (For example, /dev/sda3 is major 8, minor 3, so you could use 'root=0x803' as an alternative.)
Example: root=/dev/sda1
  • rootfstype:
The 'rootfstype' option tells the kernel to mount the root file system as if it were of the type specified. This can be useful (for example) to mount an ext3 filesystem as ext2 and then remove the journal in the root filesystem, in effect reverting its format from ext3 to ext2 without the need to boot the box from alternate media.
Example: rootfstype=ext4
  • initrd:
Specify the location of the initial ramdisk.
Example: initrd=\initramfs-linux.img
  • init:
This sets the initial command to be executed by the kernel. If this is not set, or cannot be found, the kernel will try /sbin/init, then /etc/init, then /bin/init, then /bin/sh, and panic if all of these fail.
Example: init=/bin/sh 
  • debug:
This boot argument raises the console log level so that all kernel messages, including KERN_DEBUG ones, are printed, which causes a lot of debug messages on the console. The console loglevel can also be set on a booted system via the /proc/sys/kernel/printk file.
 Example: debug 
  • quiet:
This is pretty much the opposite of the 'debug' argument. It sets the default kernel log level to KERN_WARNING, which suppresses all console messages during boot except extremely serious ones. Normal messages about hardware detection at boot are suppressed.
Example: quiet  
  • slub_debug: Usage - slub_debug[=options[,slabs]]
Enabling slub_debug allows one to determine the culprit if slab objects become corrupted. It can create guard zones around objects and may poison objects when they are not in use. It also tracks the last alloc / free. For more information see Documentation/vm/slub.txt.
Example: slub_debug  
  • kmemleak:
Boot-time kmemleak enable/disable
Valid arguments: on, off
Default: on
Example: kmemleak=off 
  • kgdboc:
kgdb over consoles. Requires a tty driver that supports console polling, or a supported polling keyboard driver (non-usb). 
Serial only format: <serial_device>[,baud]
keyboard only format: kbd
keyboard and serial format: kbd,<serial_device>[,baud]
Optional Kernel mode setting:
 kms, kbd format: kms,kbd
 kms, kbd and serial format: kms,kbd,<ser_dev>[,baud]
Example: kgdboc=ttyS0,115200  
  • panic: Usage - panic=N
By default, the kernel will not reboot after a panic, but this option will cause a kernel reboot after N seconds from the panic event (if N is greater than zero). This panic timeout can also be set by
                  echo N > /proc/sys/kernel/panic
 Example: panic=15 
  • ....
  • ....


The descriptions of the above boot args are copied from one of the references listed at the end of this post.

Sample bootarg command lines:
  • bootargs=root=/dev/sda1 rootfstype=ext4 quiet
  • bootargs=root=/dev/sda2 rootfstype=ext4 quiet slub_debug panic=10 init=/bin/sh


How to edit boot options in Ubuntu?

The procedure for editing boot args differs between boot loaders. The Ubuntu VM I created uses the GRUB boot loader. The GRUB how-to for editing boot args is in my other post: How to edit boot options during linux booting?

What is the maximum allowed size of bootargs and the maximum number of bootargs?
The number of kernel parameters is not limited, but the length of the complete command line (parameters including spaces etc.) is limited to a fixed number of characters. This limit depends on the architecture and is between 256 and 4096 characters. It is defined in the file ./include/asm/setup.h as COMMAND_LINE_SIZE. For the x86 architecture, it is set to 2K (2048 characters): https://lxr.missinglinkelectronics.com/linux/arch/x86/include/asm/setup.h#L6




After the system boots, how can we find the boot args it was booted with?

The bootloader passes the boot parameters to the Linux kernel in a memory buffer called the kernel command line. A copy of the kernel command line is in the file /proc/cmdline.
root@babu-VirtualBox:/# cat /proc/cmdline
root=/dev/sda1 rootfstype=ext4 quiet
root@babu-VirtualBox:/# 
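
A program can do the same lookup itself. The sketch below reads /proc/cmdline and searches it for one boot arg; find_boot_arg and read_proc_cmdline are hypothetical helper names, and the parsing is deliberately simplified (quoted values containing spaces are not handled, and val_len is assumed to be at least 1).

```c
#include <stdio.h>
#include <string.h>

/*
 * Look for 'name' in a whitespace-separated command line. Returns 1 if it
 * is present and copies its value (the text after '=', or "" for a bare
 * flag) into 'val'; returns 0 if it is absent.
 */
static int find_boot_arg(const char *cmdline, const char *name,
                         char *val, size_t val_len)
{
    size_t nlen = strlen(name);
    const char *p = cmdline;

    while (*p) {
        size_t tok_len;

        p += strspn(p, " \t\n");         /* skip separators */
        tok_len = strcspn(p, " \t\n");   /* length of this token */
        if (tok_len >= nlen && strncmp(p, name, nlen) == 0 &&
            (tok_len == nlen || p[nlen] == '=')) {
            int has_val = (tok_len != nlen);
            size_t vlen = has_val ? tok_len - nlen - 1 : 0;

            if (vlen >= val_len)
                vlen = val_len - 1;      /* truncate to fit */
            memcpy(val, p + nlen + (has_val ? 1 : 0), vlen);
            val[vlen] = '\0';
            return 1;
        }
        p += tok_len;
    }
    return 0;
}

/* Read /proc/cmdline into buf; returns 0 on success, -1 on failure. */
static int read_proc_cmdline(char *buf, size_t len)
{
    FILE *f = fopen("/proc/cmdline", "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

With the command line shown above, looking up "rootfstype" yields "ext4", and looking up "quiet" succeeds with an empty value.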
How to create a new/custom boot arg?

Anywhere in the built-in Linux kernel code, register a setup function with the __setup macro, as done for slub_debug: https://lxr.missinglinkelectronics.com/linux/mm/slub.c#L1167

Then define the setup function itself, as done for slub_debug: https://lxr.missinglinkelectronics.com/linux/mm/slub.c#L1099
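
As a rough illustration of those two steps, a custom boot arg could look like the sketch below. The argument name "my_debug=", the variable my_debug_level and the function setup_my_debug are all hypothetical; only __setup, __init and kstrtoint are existing kernel facilities. This fragment only builds as part of the kernel, not as a stand-alone program.

```c
#include <linux/init.h>
#include <linux/kernel.h>

/* Hypothetical knob controlled by the hypothetical "my_debug=" boot arg. */
static int my_debug_level;

/*
 * Called during boot with the text that follows "my_debug=" on the
 * command line. Returning 1 tells the kernel the argument was handled.
 */
static int __init setup_my_debug(char *str)
{
    int val;

    if (!str || kstrtoint(str, 0, &val))
        return 0;   /* malformed value: report as not handled */
    my_debug_level = val;
    return 1;
}
__setup("my_debug=", setup_my_debug);
```

Booting with my_debug=3 on the command line would then leave my_debug_level set to 3 for the rest of the kernel to consult.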


Where are boot args initialized in the kernel initialization sequence?
I went through the kernel init code to understand the stage at which boot args are initialized (https://lxr.missinglinkelectronics.com/linux/init/main.c#L535). It looks like this happens pretty early, even before the memory management code is initialized. Only around 10 init functions run before the boot args are initialized.

Can a boot arg setup function be part of a kernel module?

No.
Boot args are processed first; modules are loaded much later. The __setup handlers are collected into a table in the built-in kernel image, and that table is walked early in boot. When code is built as a loadable module, the __setup macro expands to nothing, so a handler defined inside a module is never registered and never called for that boot param. The correct method is therefore to keep all __setup functions in core (built-in) kernel code.
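
The following user-space sketch models why this is so. It is a simplified stand-in for the kernel's mechanism, not the kernel's code: the table here is a plain array fixed at build time, and all names ending in _demo, plus struct setup_entry and dispatch_boot_param, are invented for the example. Anything that is not in the table when boot args are processed simply never gets a chance to match.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for one registered boot-arg handler. */
struct setup_entry {
    const char *str;               /* e.g. "panic=" */
    int (*setup_func)(char *arg);  /* receives the text after 'str' */
};

static int panic_timeout_demo;     /* hypothetical state set by the handlers */
static int quiet_demo;

static int setup_panic_demo(char *arg)
{
    panic_timeout_demo = atoi(arg);
    return 1;
}

static int setup_quiet_demo(char *arg)
{
    (void)arg;
    quiet_demo = 1;
    return 1;
}

/* In the kernel the table is assembled at link time; here it is a fixed array. */
static const struct setup_entry setup_table[] = {
    { "panic=", setup_panic_demo },
    { "quiet",  setup_quiet_demo },
};

/* Returns 1 if a table entry claimed 'param', 0 if nothing matched. */
static int dispatch_boot_param(char *param)
{
    size_t i;

    for (i = 0; i < sizeof(setup_table) / sizeof(setup_table[0]); i++) {
        size_t n = strlen(setup_table[i].str);

        /* Simplification: a plain prefix match selects the handler. */
        if (strncmp(param, setup_table[i].str, n) == 0)
            return setup_table[i].setup_func(param + n);
    }
    return 0;
}
```

Dispatching "panic=15" sets panic_timeout_demo to 15, while an argument whose handler lives outside the table (say "my_module_arg=1") is reported as unmatched.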



References:


