Wednesday, April 2, 2014

Linux log messaging architeture


In this post, I will cover high level flow of linux log messages & the configuration controls that can be helpful to inspect the system logs.


Linux Log Messaging - High Level Flow of log messages







Linux log message infrastructure provides separate logging path for user space programs & for kernel space programs.
  • For user space programs, syslog is the logging mechanism. syslog provides a shared library that can be linked with daemons that would like to emit log messages to a special log file. syslog then processes this file & dispatches to remote server or console or log files depending on configuration.
  • For kernel space, printk is logging mechanism. As prink messages can't be directly sent to user space daemon syslog, Linux stores printk messages in a buffer which can be viewed using a few user space tools such as /dev/kmsg or dmesg or klogd or syslog.

Overview of Linux log messaging modules


Let me also provide more details about each module in the above architecture & useful configuration controls of each module:

  • printk
      Usage:

  printk(KERN_DEBUG "%s:%i: Test Debug message\n", __FILE__, __LINE__);

The first argument KERN_DEBUG indicates the log level for this debug message. The allowed log levels & when each level should be used are:

  KERN_EMERG   -  system is unusable. set to 0
  KERN_ALERT   -  action must be taken immediately. set to 1
  KERN_CRIT    -  critical conditions. set to 2
  KERN_ERR     -  error conditions. set to 3
  KERN_WARNING -  warning conditions. set to 4
  KERN_NOTICE  -  normal but significant condition. set to 5
  KERN_INFO    -  informational. set to 6
  KERN_DEBUG   -  debug-level messages. set to 7
  KERN_DEFAULT -  the default kernel loglevel. set to CONFIG_DEFAULT_MESSAGE_LOGLEVEL

If no argument is provided to printk, then the default value would be applied. The default value is determined by compile time macro - CONFIG_DEFAULT_MESSAGE_LOGLEVEL. This is typically would be KERNEL_WARNING.

Even at run time, we can view/set default message log level as well as other printk related configuration controls using profs or sysctl:
      • using sysctl: 
root@babu-VirtualBox:~# sysctl -a | grep printk 
kernel.printk = 4     4     1     7 
kernel.printk_delay = 0 
kernel.printk_ratelimit = 5 
kernel.printk_ratelimit_burst = 10 
root@babu-VirtualBox:~#  

kernel.printk: These values are fetched from the array at http://lxr.linux.no/linux+v3.14.2/kernel/printk/printk.c#L64. Description of these fields ( some cut-pasted from https://www.kernel.org/doc/Documentation/sysctl/kernel.txt): 
§  console_loglevel: messages with a higher priority (meaning a lower numerical value) than this will be printed to the console. The default value is determined by compile time macro - DEFAULT_CONSOLE_LOGLEVEL.
§  default_message_loglevel: messages without an explicit priority will be printed with this priority. The default value is determined by compile time macro - CONFIG_DEFAULT_MESSAGE_LOGLEVEL.
§  minimum_console_loglevel: minimum (highest) value to which   console_loglevel can be set. The default value is determined by compile time macro - MINIMUM_CONSOLE_LOGLEVEL.
§  default_console_loglevel: default value for console_loglevel. The default value is determined by compile time macro - DEFAULT_CONSOLE_LOGLEVEL.

kernel.printk_delay:  Delay each printk message in printk_delay milliseconds. Value from 0 - 10000 is allowed.

kernel.printk_ratelimit:  Some warning messages are rate limited. printk_ratelimit specifies the minimum length of time between these messages (in jiffies), by default we allow one every 5 seconds.  A value of 0 will disable rate limiting.

kernel.printk_ratelimit_burst:  While long term we enforce one message per printk_ratelimit seconds, we do allow a burst of messages to pass through. printk_ratelimit_burst specifies the number of messages we can send before ratelimiting kicks in.


      • using procfs:  
root@babu-VirtualBox:~# cat /proc/sys/kernel/printk
4    4    1    7 
root@babu-VirtualBox:~#

The description for these values is same as description for kernel.printk described above in sysctl command. 

The second argument specifies the message to be logged. The maximum message size of printk is defined by .. In linux3.14.2, it is defaulted to 1K in ..


  • Ring bufferAs prink messages can't be directly sent to user space daemon syslog, Linux stores printk messages in a ring buffer which can be viewed using a few user space tools such as /dev/kmsg or dmesg or klogd or syslog. 

The ring buffer is a character array (as defined at http://lxr.linux.no/linux+v3.14.2/kernel/printk/printk.c#L250). In this array, records of printk log messages are stored sequentially in the order printk is invoked by kernel or kernel modules. The format of each printk record is defined at http://lxr.linux.no/linux+v3.14.2/kernel/printk/printk.c#L198:

timestamp in nanoseconds, length of entire record, length of text buffer, length of dictionary buffer, syslog facility, internal record flags, syslog level.

The size of this buffer is user-defined and can be configured in two methods:

log_buf_len=n[KMG]        Sets the size of the printk ring buffer, in bytes. N must be a power of two. The default size is set in the kernel config file.


printk.c exposes two separate interfaces - /dev/kmsg & "do_syslog + /proc/kmsg" - to read from/write to this ring buffer from user space. I will cover more details about these interfaces in next sections. Different interfaces over same ring buffer are achieved by maintaining separate buffer indices as shown below:


  • /dev/kmsg interface: This is one of the interfaces exposed by printk to read/write to ring buffer.  

This is a special file exposed by a memory driver at linux/drivers/char/mem.c.


root@babu-VirtualBox:~# ls -l /dev/kmsg 
crw-r--r-- 1 root root 1, 11 Apr 29 20:12 /dev/kmsg
root@babu-VirtualBox:~# 

Basically, this is a character device driver which can be read from/written to. Any user space program can open this char driver, read/write to this.

The callback handlers for this char driver are implemented in printk.c. If you read though the open /read callback handlers - devkmsg_open & devkmsg_read functions, this driver allows readers of /dev/kmsg to read ring buffer messages from log_first_idx index to log_next_idx. So, essentially, this fetches entire ring buffer messages. Also, note that messages wouldn't be deleted from ring buffer, they are just read though.


Today, this interface (/dev/kmsg) is used by dmesg utility. We can also directly cat this file. Here is a sample output of last two lines from /dev/kmsg:



6,577,88143690836,-;hid-generic 0003:80EE:0021.0004: input,hidraw0: USB HID v1.10 Mouse [VirtualBox USB Tablet] on usb-0000:00:06.0-1/input0
 SUBSYSTEM=hid
 DEVICE=+hid:0003:80EE:0021.0004
6,578,88149428705,-;e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX


  • do_syslog system call + /proc/kmsg interfaceThis is one of the interfaces exposed by printk to read/write to ring buffer. 
    This system call internally involves do_syslog API exposed by printk module.

    Here are the services provide by this API:
    • Read n bytes of unread ring buffer: Refer code path - do_syslog API--> syslog_print. It allows to read ring buffer messages from syslog_idx to log_next_idx. So, essentially, this fetches so far unread ring buffer messages. The messages once read through this interfaces can't be read again on this interface. However, through the ring buffer's /dev/kmsg interface, it is still possible to read all current logged messages in ring buffer. So, its not that messages are deleted from ring buffer, the read/unread position is just maintained using a separate index syslog_idx as shown in the diagram above.
    • Read n bytes of ring buffer since last clear: Refer code path - do_syslog API--> syslog_print_all.
    • Read n bytes of ring buffer since last clear & then clear the ring buffer: Refer code path - do_syslog API--> syslog_print_all. Clearing doesn't remove messages from ring buffer. The last position of clearing is just remembered using clear_idx as shown in the diagram above. So, even after clearing though this system call, messages can still be read from 
    • Clear the ring buffer: Refer code path - do_syslog API--> syslog_print_all. Again, clearing doesn't really remove messages from ring buffer.
    • Return total size of ring buffer
    • Get total number of unread characters in ring buffer
    • Disable logging to console: This actually doesn't disable. Instead, it sets the console loglevel to the minimum_console_loglevel (which can be read from third integer in /proc/sys/kernel/printk's output.
    • Enable logging to console: This actually doesn't enable. Instead, it sets the console loglevel to the console_loglevel value before the last disabling of console logging. After enabling, we can verify what value this is set to by reading third integer in /proc/sys/kernel/printk's output.
    • Set console_loglevel: messages with a higher priority than this will be printed to the console.

    Its not sufficient to expose these API alone. We need a mechanism to inform interested user space utilities whenever new messages are available in ring buffer. So, a read-only proc file /proc/kmsg is exposed by linux kernel so that interested user space utilities can poll this file:

    root@babu-VirtualBox:~# ls -l /proc/kmsg 
    -r-------- 1 root root 0 Apr 29 20:12 /proc/kmsg
    root@babu-VirtualBox:~# 

    As you see above, this is a read-only proc file. The open/poll/read callback handlers for this proc file are implemented in linux/fs/proc/kmsg.c.

    Today, this interface (do_syslog + /proc/kmsg) is used by 
    • klogd (or syslog variants if klogd is packaged as part of syslog package installation).
    • dmesg utility

    • dmesg: This tool is a linux user space utility that operates on top of /dev/kmsg interface & "dos_syslog system call + /proc/kmsg" interface. Here is the help text of dmesg command in Ubuntu13.10 distribution:

    root@babu-VirtualBox:~# dmesg -h
    Usage:
     dmesg [options]

    Options:
     -C, --clear                 clear the kernel ring buffer
     -c, --read-clear            read and clear all messages
     -D, --console-off           disable printing messages to console
     -d, --show-delta            show time delta between printed messages
     -E, --console-on            enable printing messages to console
     -f, --facility <list>       restrict output to defined facilities
     -h, --help                  display this help and exit
     -k, --kernel                display kernel messages
     -l, --level <list>          restrict output to defined levels
     -n, --console-level <level> set level of messages printed to console
     -r, --raw                   print the raw message buffer
     -s, --buffer-size <size>    buffer size to query the kernel ring buffer
     -T, --ctime                 show human readable timestamp (could be 
                                 inaccurate if you have used SUSPEND/RESUME)
     -t, --notime                don't print messages timestamp
     -u, --userspace             display userspace messages
     -V, --version               output version information and exit
     -x, --decode                decode facility and level to readable string

    Supported log facilities:
        kern - kernel messages
        user - random user-level messages
        mail - mail system
        daemon - system daemons
        auth - security/authorization messages
        syslog - messages generated internally by syslogd
        lpr - line printer subsystem
        news - network news subsystem

    Supported log levels (priorities):
        emerg - system is unusable
        alert - action must be taken immediately
        crit - critical conditions
        err - error conditions
        warn - warning conditions
        notice - normal but significant condition
        info - informational
        debug - debug-level messages
    root@babu-VirtualBox:~# 

    dmesg prints the ring buffer messages in a more human readable format than print raw printk records. Here s a sample output:


    root@babu-VirtualBox:~# dmesg | tail -2
    [88143.690836] hid-generic 0003:80EE:0021.0004: input,hidraw0: USB HID v1.10 Mouse [VirtualBox USB Tablet] on usb-0000:00:06.0-1/input0
    [88149.428705] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
    root@babu-VirtualBox:~#

    The source code of this utility is available at https://github.com/babuneelam/util-linux/blob/master/sys-utils/dmesg.c. As per this source code, read command provides an option to read either from /dev/kmsg interface or through do_syslog interface. But in latest versions of Ubuntu, I didn't see those configuration flags in "dmesg -h" output. So, may be, they are no more supported.

    • klogd: I am yet to explore klogd completely. As of now, from what I understand, klogd comes as user space package in linux distributions. It seems that when syslog-ng or rsyslog are used, klogd is installed as part of those packages. And may be syslog config files then override the klogd config files. The popular /var/log/messages is created by klogd. One interesting addition in this file is that even timestamp of the log message is output as well. 


    root@babu-VirtualBox:~# cat /var/log/messages | tail -2
    Apr 29 19:58:41 babu-VirtualBox mtp-probe: bus: 1, device: 3 was not an MTP device 
    Apr 29 20:06:24 babu-VirtualBox exiting on signal 15
    root@babu-VirtualBox:~# 

    I am yet to completely understand how this module evolved, usage & configuration controls

    • Syslog (and syslog variations rsyslog and syslog-ng): I am yet to explore syslog infrastructure completely. As of now, from what I understand, syslog has a few evolutionary variations - syslog, rsyslog, syslog-ng. They come as user space packages in linux distributions. syslog infrastructure provides a user space library that can be linked with user space daemons that would to emit log messages. The library emits these messages to /dev/log. Syslog infrastructure provides another user space daemon - syslog - that reads the messages from /dev/log & then consults it's config file /etc/syslog.config to dispatch the messages to remote server or console or log files. It seems that when syslog-ng or rsyslog are used, klogd is installed as part of those packages. I am yet to completely understand how these modules evolved, usage & configuration controls
      • How is special file /dev/log implemented? Are the messages stored in kernel? If so, does syslog poll this file to read messages ?

    TBD:
    • printk debugging during boot up: early_printk, CONFIG_EARLY_PRINTK
    • printk debugging when system hangs.. delays etc
    • do we lose unread ring buffer messages in case of panic?
    • quiet_mode boot arg

    References:

    http://lxr.linux.no/
    https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
    https://www.kernel.org/doc/Documentation/kernel-parameters.txt

    No comments:

    UA-48797665-1