debugging
printk can be called from almost anywhere: from interrupt or process context, and even while a lock is held. The one exception is early boot, before the console has been initialized.
We can set a default loglevel so that the messages we care about are shown, which makes debugging easier.
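As a rough illustration of how the log levels interact, the sketch below parses the four values exposed in /proc/sys/kernel/printk and checks whether a message of a given level would reach the console. The sample contents ("4 4 1 7") and the helper names are assumptions made for the example, not something taken from these notes.

```python
# Names of the four fields in /proc/sys/kernel/printk, in order.
FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def parse_printk_levels(text):
    """Map the four whitespace-separated integers to their field names."""
    values = [int(v) for v in text.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected 4 log level values, got %d" % len(values))
    return dict(zip(FIELDS, values))


def reaches_console(message_level, levels):
    """A message is shown on the console only if its level is numerically
    lower (more severe) than console_loglevel."""
    return message_level < levels["console_loglevel"]


if __name__ == "__main__":
    # On a live system: text = open("/proc/sys/kernel/printk").read()
    sample = "4\t4\t1\t7"  # assumed sample contents
    levels = parse_printk_levels(sample)
    print(levels)
    print(reaches_console(7, levels))  # level 7 is KERN_DEBUG
```

With these assumed values, debug-level messages stay in the log buffer and are not printed on the console until console_loglevel is raised.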
klogd, the kernel log daemon, reads kernel log messages from one of two sources: the /proc file system or the syscall (sys_syslog) interface.
After reading the logs, it sends them to syslogd, which saves them to a file. The default file is /var/log/messages.
Condition variables can be used while debugging drivers to trigger specific code paths or to print logs only under certain conditions.
If an issue first appears in a new version, git bisect can help: it performs a binary search over the tree history to find the change that introduced the bug.
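The binary search that git bisect performs can be sketched as follows. This is only a model of the idea, not git's implementation: the commit list and the is_bad predicate are hypothetical, and the sketch assumes a linear history in which the first entry is known good, the last is known bad, and every commit after the first bad one is also bad.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that once a
    commit is bad all later commits are bad too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid  # bug already present: look earlier
        else:
            lo = mid  # bug not yet present: look later
    return hi, tested


if __name__ == "__main__":
    history = ["c%d" % i for i in range(100)]  # hypothetical commits
    index, tested = first_bad(history, lambda c: int(c[1:]) >= 73)
    print(history[index], "found after testing", tested, "commits")
```

The number of builds to test grows with the logarithm of the history length, which is why bisecting stays practical even across thousands of commits.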
If an Oops occurs in the kernel, the kernel dumps the contents of the CPU registers and the call stack.
The call stack consists of hexadecimal addresses. To convert those to symbols, we need a symbol table. System.map provides one, and it is specific to each kernel version and build configuration.
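A minimal sketch of that address-to-symbol lookup, assuming the usual three-column System.map layout (address, type, name). The addresses and the choice of symbols in the sample are made up for illustration.

```python
import bisect


def load_symbols(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoffset' for the nearest symbol at or below address."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[i]
    return "%s+0x%x" % (name, address - base)


if __name__ == "__main__":
    # Hypothetical excerpt; a real table comes from open("System.map").
    sample = [
        "ffffffff81000000 T _text",
        "ffffffff81001000 T do_one_initcall",
        "ffffffff81002000 T vfs_open",
    ]
    table = load_symbols(sample)
    print(resolve(table, 0xFFFFFFFF8100102A))
```

The lookup only gives meaningful results when the table matches the running kernel build exactly, and when the addresses have not been shifted by KASLR.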
kallsyms is a feature that builds the symbol table into the kernel at build time, so that debugging can be done without the System.map file. The following config options are handy to generate debug information when building the kernel :
Once the symbol names and addresses are known, we can use gdb to debug the object file of the faulty module.
For a sample Oops error :
IP denotes the instruction pointer at the time of the fault.
PGD, PUD, and PMD denote the Page Global Directory, Page Upper Directory, and Page Middle Directory. They are levels of the kernel's page tables, which translate virtual addresses to physical addresses. The values show the state of the paging structures at the time of the crash.
Oops: 002 [#1]: here 002 is the error code in hexadecimal. Its bits are read as follows:
bit 0: 0 means no page found, 1 means a protection fault
bit 1: 0 means read, 1 means write
bit 2: 0 means kernel mode, 1 means user mode
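The three bits above can be decoded mechanically. The sketch below covers only those three bits; the function name is an illustrative choice, and any higher bits of the error code are ignored.

```python
def decode_oops_code(code):
    """Decode bits 0-2 of the page-fault error code shown in an Oops."""
    return {
        "cause": "protection fault" if code & 0x1 else "no page found",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


if __name__ == "__main__":
    # "002" is the hex string taken from the line "Oops: 002 [#1]".
    print(decode_oops_code(int("002", 16)))
```

For the code 002, this reads as a write in kernel mode to an address with no page mapped.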
[#1]: this value is the number of times an Oops has occurred. Multiple Oopses can be triggered as a cascading effect of the first one.
last sysfs file shows the last sysfs file accessed, which can sometimes help in identifying what the system was doing when the error occurred.
Pid: 2248, comm: insmod Tainted: P 2.6.33.3-85.fc13.x86_64: here Pid is the pid of the faulting process and comm is the command that was invoked to create that process.
Tainted shows if the kernel was tainted at the time of the crash, meaning it was in a state that is unsupported by the kernel developers. Possible values are :
P — Proprietary module has been loaded.
F — Module has been forcibly loaded.
S — SMP with a CPU not designed for SMP.
R — User forced a module unload.
M — System experienced a machine check exception.
B — System has hit bad_page.
U — Userspace-defined naughtiness.
A — ACPI table overridden.
W — Taint on warning.
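To tie the header line and the taint letters together, here is a small sketch that pulls the pid, command, and taint flags out of a line shaped like the example above and expands each flag using the list in these notes. The regular expression is an assumption based on that one sample line and may need adjusting for other kernel versions.

```python
import re

# Flag meanings as listed in the notes above.
TAINT_FLAGS = {
    "P": "proprietary module has been loaded",
    "F": "module has been forcibly loaded",
    "S": "SMP with a CPU not designed for SMP",
    "R": "user forced a module unload",
    "M": "system experienced a machine check exception",
    "B": "system has hit bad_page",
    "U": "userspace-defined naughtiness",
    "A": "ACPI table overridden",
    "W": "taint on warning",
}

HEADER = re.compile(
    r"Pid: (?P<pid>\d+), comm: (?P<comm>\S+) "
    r"Tainted: (?P<flags>[A-Z ]+?)\s+(?P<release>\S+)\s*$"
)


def parse_oops_header(line):
    """Return pid, comm, release, and decoded taint flags, or None."""
    match = HEADER.search(line)
    if match is None:
        return None
    flags = [c for c in match.group("flags") if c != " "]
    return {
        "pid": int(match.group("pid")),
        "comm": match.group("comm"),
        "release": match.group("release"),
        "taint": [TAINT_FLAGS.get(c, "unknown flag " + c) for c in flags],
    }


if __name__ == "__main__":
    line = "Pid: 2248, comm: insmod Tainted: P 2.6.33.3-85.fc13.x86_64"
    print(parse_oops_header(line))
```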
The dump further contains the various registers and their values at the time of the fault. It is followed by the stack trace of the invocations leading to the fault.
Once we have symbols corresponding to the stack trace and the instruction pointer, we can use gdb to load the faulty module or further disassemble it using objdump and proceed with the debugging process.
QEMU supports working with gdb via gdb’s remote-connection facility (the “gdbstub”). Basically this means that QEMU can act as a gdbserver, and gdb can connect to it and debug the kernel running inside the QEMU VM.
To launch the kernel with QEMU you require the following components :
The kernel image (bzImage or zImage)
The filesystem image (rootfs)
Command line arguments to be passed to the kernel
Kernel configs to enable during build time for easy gdb debugging :
CONFIG_DEBUG_INFO: This option includes debugging information in the kernel image. This is useful for debugging the kernel using gdb.
CONFIG_DEBUG_INFO_SPLIT: (optional) This significantly reduces the size of the kernel image and kernel modules installed on the device or VM we will be debugging. Note that this option requires gcc version 4.7 or later.
CONFIG_GDB_SCRIPTS: This option includes a set of gdb scripts that can be used to debug the kernel. Leave CONFIG_DEBUG_INFO_REDUCED off to get full debugging information.
CONFIG_FRAME_POINTER: Enable this if your architecture supports it. This greatly improves the reliability of backtraces.
CONFIG_PREEMPT: This option allows the kernel to be preempted.
CONFIG_DEBUG_KERNEL
CONFIG_KALLSYMS: This option includes the kernel symbol table in the kernel image.
CONFIG_DEBUG_SPINLOCK_SLEEP (CONFIG_DEBUG_ATOMIC_SLEEP on newer kernels): This option reports calls that may sleep while a spinlock is held, which is useful for catching locking bugs.
CONFIG_KGDB
CONFIG_DYNAMIC_DEBUG: This option allows you to dynamically enable/disable kernel debug messages.
CONFIG_DEBUG_SLAB, CONFIG_DEBUG_VM, etc. can be enabled based on the specific areas of the kernel you are interested in.
Disable these options to make debugging easier (comment them out or set them "=n"):
CONFIG_CC_OPTIMIZE_FOR_SIZE: Disabling this option might be useful, since the kernel is then no longer compiled with size optimizations, which can make the generated code easier to follow in a debugger.
CONFIG_RANDOMIZE_BASE: Disabling KASLR (Kernel Address Space Layout Randomization) can make debugging simpler by ensuring that kernel symbols are loaded at consistent addresses between boots.
If you want to add kgdb and kdb support use :
Run make scripts_gdb to build the gdb scripts (required on kernels v5.1 and above).
Start QEMU with the following command :
-gdb tcp::<port>: TCP port to run the gdbserver on
-S: Freeze the CPU on startup (useful for debugging early steps in the kernel)
-kernel: Path to the kernel image to debug
-append: Linux kernel command-line parameters. nokaslr is used to disable KASLR (Kernel Address Space Layout Randomization).
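As a hedged illustration of how the options above fit together, the sketch below assembles a QEMU argument list. Everything specific in it is an assumption for the example rather than something taken from these notes: the qemu-system-x86_64 binary name, port 1234, the console=ttyS0 parameter, the file paths, and passing the root filesystem as an initramfs through -initrd.

```python
import shlex


def qemu_gdb_command(kernel, initrd=None, port=1234, cmdline="console=ttyS0"):
    """Build an argument list that boots `kernel` and waits for gdb."""
    args = ["qemu-system-x86_64", "-kernel", kernel]
    if initrd:
        # Assumes the rootfs is packaged as an initramfs image.
        args += ["-initrd", initrd]
    args += [
        "-append", cmdline + " nokaslr",  # keep symbol addresses stable
        "-gdb", "tcp::%d" % port,         # gdbserver listens on this port
        "-S",                             # CPU stays frozen until gdb continues
    ]
    return args


if __name__ == "__main__":
    # Hypothetical paths; substitute your own build outputs.
    cmd = qemu_gdb_command("arch/x86/boot/bzImage", initrd="rootfs.cpio.gz")
    print(shlex.join(cmd))
    # To launch: subprocess.run(cmd, check=True)
```

Once QEMU is waiting, gdb can attach from another terminal with target remote :1234 (matching the assumed port) and resume the guest with continue.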
Start gdb: gdb vmlinux
Note: Some distros may restrict auto-loading of gdb scripts to known safe directories. If gdb refuses to load vmlinux-gdb.py, add add-auto-load-safe-path /path/to/linux-build to ~/.gdbinit.
The Linux kernel provides a set of gdb helpers to make debugging easier. The set of commands and convenience functions may change over time. To list all the available commands, run apropos lx.
$lx_current() can be used to get the current task. For example, to get the pid of the current task, run p $lx_current().pid, and to get its comm, run p $lx_current().comm
Make use of the per-cpu function for the current or a specified CPU:
If you’re setting a breakpoint in vfs_open, but only care about the file named test, you might use something like the following:
If you want to examine or change physical memory, you can set the gdbstub to work with physical memory rather than virtual memory. The memory mode can be checked and changed by sending the following commands:
maintenance packet qqemu.PhyMemMode: This will return either 0 or 1, where 1 indicates you are currently in physical memory mode.
maintenance packet Qqemu.PhyMemMode:1: This will change the memory mode to physical memory.
maintenance packet Qqemu.PhyMemMode:0: This will change it back to normal memory mode.
If the kernel hangs while executing some kernel code, the following config options can be used to debug the issue :
After a few seconds (~30s), the kernel generates an Oops and prints the stack trace of the hung task. You can then debug the issue using the same techniques as described in the Oops debugging section.
Compile the binary with symbols
Run the binary with valgrind
Valgrind generates a report of the memory leaks and the stack trace of the memory allocation
If KGDB is not available, KDB can be used instead. KDB is a kernel debugger that can be used to investigate a kernel panic or a hung kernel.
QEMU can be used to launch a built kernel, and GDB can then attach to the QEMU process to debug that kernel. Remember that you should use gdb when target hardware is not available. If you have target hardware, you should use kgdb instead. See :
Note: If you want to build the kernel for both fuzzing and debugging, include the kernel config options from those docs too, e.g. kcov, kasan, kmemleak, etc. Refer to for more details.
To view more details, checkout
For more information checkout
Complete notes on syzkaller are present in the section.
For more information on KDB usage, refer