In this post, I will cover high level flow of linux log messages & the configuration controls that can be helpful to inspect the system logs.
Linux Log Messaging - High Level Flow of log messages
- For user space programs, syslog is the logging mechanism. syslog provides a shared library that can be linked with daemons that would like to emit log messages to a special log file. syslog then processes this file & dispatches to remote server or console or log files depending on configuration.
- For kernel space, printk is logging mechanism. As prink messages can't be directly sent to user space daemon syslog, Linux stores printk messages in a buffer which can be viewed using a few user space tools such as /dev/kmsg or dmesg or klogd or syslog.
Overview of Linux log messaging modules
Let me also provide more details about each module in the above architecture & useful configuration controls of each module:
- printk:
printk(KERN_DEBUG "%s:%i: Test Debug
message\n", __FILE__, __LINE__);
The first argument
KERN_DEBUG indicates the log level for this debug message. The allowed log
levels & when each level should be used are:
KERN_EMERG - system is
unusable. set to 0
KERN_ALERT - action must be
taken immediately. set to 1
KERN_CRIT - critical
conditions. set to 2
KERN_ERR -
error conditions. set to 3
KERN_WARNING - warning conditions. set
to 4
KERN_NOTICE - normal but
significant condition. set to 5
KERN_INFO -
informational. set to 6
KERN_DEBUG - debug-level
messages. set to 7
KERN_DEFAULT - the default kernel
loglevel. set to CONFIG_DEFAULT_MESSAGE_LOGLEVEL
If no
argument is provided to printk, then the default value would be applied. The
default value is determined by compile time macro - CONFIG_DEFAULT_MESSAGE_LOGLEVEL. This is typically would
be KERNEL_WARNING.
Even at
run time, we can view/set default message log level as well as other
printk related configuration controls using profs or sysctl:
- using sysctl:
root@babu-VirtualBox:~#
sysctl -a | grep printk
kernel.printk = 4 4 1 7
kernel.printk = 4 4 1 7
kernel.printk_delay
= 0
kernel.printk_ratelimit
= 5
kernel.printk_ratelimit_burst
= 10
root@babu-VirtualBox:~#
kernel.printk: These values are fetched from the array at http://lxr.linux.no/linux+v3.14.2/kernel/printk/printk.c#L64. Description of these fields ( some cut-pasted from https://www.kernel.org/doc/Documentation/sysctl/kernel.txt):
§ console_loglevel: messages with
a higher priority (meaning a lower numerical value) than this will be printed to the
console. The default value is determined by compile time macro - DEFAULT_CONSOLE_LOGLEVEL.
§ default_message_loglevel:
messages without an explicit priority will be printed with this priority. The
default value is determined by compile time macro - CONFIG_DEFAULT_MESSAGE_LOGLEVEL.
§ minimum_console_loglevel:
minimum (highest) value to which console_loglevel can be set. The
default value is determined by compile time macro - MINIMUM_CONSOLE_LOGLEVEL.
§ default_console_loglevel:
default value for console_loglevel. The default value is determined by compile
time macro - DEFAULT_CONSOLE_LOGLEVEL.
kernel.printk_delay:
Delay each printk message in printk_delay milliseconds. Value from 0 - 10000 is
allowed.
kernel.printk_ratelimit:
Some warning messages are rate limited. printk_ratelimit specifies the minimum
length of time between these messages (in jiffies), by default we allow one
every 5 seconds. A value of 0 will disable rate limiting.
kernel.printk_ratelimit_burst:
While long term we enforce one message per printk_ratelimit seconds, we do
allow a burst of messages to pass through. printk_ratelimit_burst specifies the
number of messages we can send before ratelimiting kicks in.
- using procfs:
root@babu-VirtualBox:~# cat /proc/sys/kernel/printk
4 4 1
7
root@babu-VirtualBox:~#
The description for these values is same as
description for kernel.printk described above in sysctl command.
The second
argument specifies the message to be logged. The maximum message size of
printk is defined by .. In linux3.14.2, it is defaulted to 1K in ..
- Ring buffer: As prink messages can't be directly sent to user space daemon syslog, Linux stores printk messages in a ring buffer which can be viewed using a few user space tools such as /dev/kmsg or dmesg or klogd or syslog.
The ring buffer is a character array (as defined at http://lxr.linux.no/linux+v3.14.2/kernel/printk/printk.c#L250). In this array, records of printk log messages are stored sequentially in the order printk is invoked by kernel or kernel modules. The format of each printk record is defined at http://lxr.linux.no/linux+v3.14.2/kernel/printk/printk.c#L198:
timestamp in nanoseconds, length of entire record, length of text buffer, length of dictionary buffer, syslog facility, internal record flags, syslog level.
The size of this buffer is user-defined and can be configured in two methods:
- the compile macro - CONFIG_LOG_BUF_SHIFT (2 power this value is ring buffer size). In linux 3.14.2, it is default ring buffer size configured to be 128K in kernel config file. In earlier versions, this was 16KB as well.
- the boot argument - log_buf_len. cut-paste from https://www.kernel.org/doc/Documentation/kernel-parameters.txt:
log_buf_len=n[KMG] Sets the size of the printk
ring buffer, in bytes. N must be a power of two. The default size is set in the
kernel config file.
printk.c exposes two separate interfaces - /dev/kmsg & "do_syslog + /proc/kmsg" - to read from/write to this ring buffer from user space. I will cover more details about these interfaces in next sections. Different interfaces over same ring buffer are achieved by maintaining separate buffer indices as shown below:
- /dev/kmsg interface: This is one of the interfaces exposed by printk to read/write to ring buffer.
This is a special file exposed by a memory driver at linux/drivers/char/mem.c.
root@babu-VirtualBox:~# ls -l /dev/kmsg
crw-r--r-- 1 root root 1, 11 Apr 29 20:12
/dev/kmsg
root@babu-VirtualBox:~#
The callback handlers for this char driver are implemented in printk.c. If you read though the open /read callback handlers - devkmsg_open & devkmsg_read functions, this driver allows readers of /dev/kmsg to read ring buffer messages from log_first_idx index to log_next_idx. So, essentially, this fetches entire ring buffer messages. Also, note that messages wouldn't be deleted from ring buffer, they are just read though.
Today, this interface (/dev/kmsg) is used by dmesg utility. We can also directly cat this file. Here is a sample output of last two lines from /dev/kmsg:
6,577,88143690836,-;hid-generic
0003:80EE:0021.0004: input,hidraw0: USB HID v1.10 Mouse [VirtualBox USB Tablet]
on usb-0000:00:06.0-1/input0
SUBSYSTEM=hid
DEVICE=+hid:0003:80EE:0021.0004
6,578,88149428705,-;e1000:
eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
- do_syslog system call + /proc/kmsg interface: This is one of the interfaces exposed by printk to read/write to ring buffer.
- Read n bytes of unread ring buffer: Refer code path - do_syslog API--> syslog_print. It allows to read ring buffer messages from syslog_idx to log_next_idx. So, essentially, this fetches so far unread ring buffer messages. The messages once read through this interfaces can't be read again on this interface. However, through the ring buffer's /dev/kmsg interface, it is still possible to read all current logged messages in ring buffer. So, its not that messages are deleted from ring buffer, the read/unread position is just maintained using a separate index syslog_idx as shown in the diagram above.
- Read n bytes of ring buffer since last clear: Refer code path - do_syslog API--> syslog_print_all.
- Read n bytes of ring buffer since last clear & then clear the ring buffer: Refer code path - do_syslog API--> syslog_print_all. Clearing doesn't remove messages from ring buffer. The last position of clearing is just remembered using clear_idx as shown in the diagram above. So, even after clearing though this system call, messages can still be read from
- Clear the ring buffer: Refer code path - do_syslog API--> syslog_print_all. Again, clearing doesn't really remove messages from ring buffer.
- Return total size of ring buffer
- Get total number of unread characters in ring buffer
- Disable logging to console: This actually doesn't disable. Instead, it sets the console loglevel to the minimum_console_loglevel (which can be read from third integer in /proc/sys/kernel/printk's output.
- Enable logging to console: This actually doesn't enable. Instead, it sets the console loglevel to the console_loglevel value before the last disabling of console logging. After enabling, we can verify what value this is set to by reading third integer in /proc/sys/kernel/printk's output.
- Set console_loglevel: messages with a higher priority than this will be printed to the console.
- Refer kmsg_poll. kmsg_poll waits on this wait_queue exposed by printk.c. printk.c wakes up this wait_queue as and when messages are written to ring buffer.
- The read handler simply reads the messages: kmsg_read --> do_syslog API
- klogd (or syslog variants if klogd is packaged as part of syslog package installation).
- dmesg utility
This system call internally involves do_syslog API exposed by printk module.
Here are the services provide by this API:
Its not sufficient to expose these API alone. We need a mechanism to inform interested user space utilities whenever new messages are available in ring buffer. So, a read-only proc file /proc/kmsg is exposed by linux kernel so that interested user space utilities can poll this file:
root@babu-VirtualBox:~# ls -l /proc/kmsg
-r-------- 1 root root 0 Apr 29 20:12 /proc/kmsg
root@babu-VirtualBox:~#
As you see above, this is a read-only proc file. The open/poll/read callback handlers for this proc file are implemented in linux/fs/proc/kmsg.c.
Today, this interface (do_syslog + /proc/kmsg) is used by
- dmesg: This tool is a linux user space utility that operates on top of /dev/kmsg interface & "dos_syslog system call + /proc/kmsg" interface. Here is the help text of dmesg command in Ubuntu13.10 distribution:
root@babu-VirtualBox:~# dmesg -h
Usage:
dmesg [options]
Options:
-C, --clear
clear the kernel ring buffer
-c, --read-clear
read and clear all messages
-D, --console-off
disable printing messages to console
-d, --show-delta
show time delta between printed messages
-E, --console-on
enable printing messages to console
-f, --facility <list>
restrict output to defined facilities
-h, --help
display this help and exit
-k, --kernel
display kernel messages
-l, --level <list>
restrict output to defined levels
-n, --console-level <level> set
level of messages printed to console
-r, --raw
print the raw message buffer
-s, --buffer-size <size>
buffer size to query the kernel ring buffer
-T, --ctime
show human readable timestamp (could be
inaccurate if you have
used SUSPEND/RESUME)
-t, --notime
don't print messages timestamp
-u, --userspace
display userspace messages
-V, --version
output version information and exit
-x, --decode
decode facility and level to readable string
Supported log facilities:
kern - kernel messages
user - random user-level messages
mail - mail system
daemon - system daemons
auth - security/authorization
messages
syslog - messages generated internally by
syslogd
lpr - line printer subsystem
news - network news subsystem
Supported log levels (priorities):
emerg - system is unusable
alert - action must be taken
immediately
crit - critical conditions
err - error conditions
warn - warning conditions
notice - normal but significant condition
info - informational
debug - debug-level messages
root@babu-VirtualBox:~#
dmesg prints the ring buffer messages in a more human readable format than print raw printk records. Here s a sample output:
root@babu-VirtualBox:~# dmesg | tail -2
[88143.690836] hid-generic 0003:80EE:0021.0004:
input,hidraw0: USB HID v1.10 Mouse [VirtualBox USB Tablet] on
usb-0000:00:06.0-1/input0
[88149.428705] e1000: eth0 NIC Link is Up 1000
Mbps Full Duplex, Flow Control: RX
root@babu-VirtualBox:~#
The source code of this utility is available at https://github.com/babuneelam/util-linux/blob/master/sys-utils/dmesg.c. As per this source code, read command provides an option to read either from /dev/kmsg interface or through do_syslog interface. But in latest versions of Ubuntu, I didn't see those configuration flags in "dmesg -h" output. So, may be, they are no more supported.
- klogd: I am yet to explore klogd completely. As of now, from what I understand, klogd comes as user space package in linux distributions. It seems that when syslog-ng or rsyslog are used, klogd is installed as part of those packages. And may be syslog config files then override the klogd config files. The popular /var/log/messages is created by klogd. One interesting addition in this file is that even timestamp of the log message is output as well.
root@babu-VirtualBox:~# cat /var/log/messages
| tail -2
Apr 29 19:58:41 babu-VirtualBox mtp-probe: bus:
1, device: 3 was not an MTP device
Apr 29 20:06:24 babu-VirtualBox exiting on
signal 15
root@babu-VirtualBox:~#
I am yet to completely understand how this module evolved, usage &
configuration controls.
- Syslog (and syslog variations rsyslog and syslog-ng): I am yet to explore syslog infrastructure completely. As of now, from what I understand, syslog has a few evolutionary variations - syslog, rsyslog, syslog-ng. They come as user space packages in linux distributions. syslog infrastructure provides a user space library that can be linked with user space daemons that would to emit log messages. The library emits these messages to /dev/log. Syslog infrastructure provides another user space daemon - syslog - that reads the messages from /dev/log & then consults it's config file /etc/syslog.config to dispatch the messages to remote server or console or log files. It seems that when syslog-ng or rsyslog are used, klogd is installed as part of those packages. I am yet to completely understand how these modules evolved, usage & configuration controls.
- How is special file /dev/log implemented? Are the messages stored in kernel? If so, does syslog poll this file to read messages ?
TBD:
- printk debugging during boot up: early_printk, CONFIG_EARLY_PRINTK
- printk debugging when system hangs.. delays etc
- do we lose unread ring buffer messages in case of panic?
- quiet_mode boot arg
References:
http://lxr.linux.no/
https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
https://www.kernel.org/doc/Documentation/kernel-parameters.txt
No comments:
Post a Comment