SlideShare a Scribd company logo
Troubleshooting Linux Kernel Modules and Device Drivers Mike Anderson Chief Scientist The PTR Group, Inc. [email_address] Source: www.crobike.de
What We’ll Talk About How do errors show up in the kernel? Watching kernel/user-space interaction via  strace Debugging with  printk Using the  /proc  file system Using the  kgdb  debugger Debugging with hardware ala LEDs or a JTAG unit
Challenges of Kernel Debugging There are many features that can make kernel debugging especially difficult Optimizing compilers can rearrange code Instruction pointer seems to jump around The use of the MMU can obfuscate addresses Physical vs. virtual addresses Startup code is particularly difficult to debug because of its closeness to the “metal” No equivalent for the user-space  gdbserver  for drivers Early kernel debugging may require hardware assistance
Device Drivers/Kernel Modules Assuming that your kernel is otherwise working, most of the problems that you’ll encounter are related to device drivers Drivers can either be statically linked or dynamically loaded to the kernel A dynamically loaded driver takes the form of a kernel module Can be dynamically loaded and unloaded at kernel run time Frequently handled by daemons such as udev
Example Module #include <linux/module.h> #include <linux/init.h> #include <linux/kernel.h> MODULE_LICENSE(&quot;GPL and additional rights&quot;); MODULE_AUTHOR( “Driver_Author@someplace.org” ); MODULE_DESCRIPTION( “My first driver!” ); int __init mymodule_init_module(void) { printk(KERN_DEBUG “mymodule_init_module() called, &quot;); return 0; } void __exit mymodule_cleanup_module(void) { printk(KERN_DEBUG “mymodule_cleanup_module() called\n&quot;); } module_init(mymodule_init_module); module_exit(mymodule_cleanup_module);
When Things go Wrong… Problems in device drivers typically manifest themselves in one of three ways Kernel panic Fatal to the system Kernel oops Near fatal to the system Hardware just doesn’t work correctly Could be fatal to you! Source: picasaweb.google.com
Kernel Panic When the Linux kernel determines that a fatal error has occurred, and no recovery is possible, it “panics” Frequently, an exception in an interrupt context Panic outputs a message to the console The output will help you find the source of the bug Typically results in a system reboot on an embedded Linux target Or, blinking keyboard LEDs on some desktop versions of Linux Source: regmedia.co.uk
Sample Panic Output EIP: 0060: [<c02b2516>]  Not tainted VLI EFLAGS: 00010286 (2.6.18) EIP is at nf_queue+0x16/0x240 eax: 00000000 ebx: 00000001 ecx: 00000000 edx: 00000002 esi: dece0920 edi: c03a3aa8 ebp: c0277e70 esp: cf165e78 ds:  007b  es:  007b  ss: 0068 Process sh (pid:5278, threadinfo=cf164000 task=dc5b4ab0) Stack:  cf165f00 dae29000 00000000 c0277e70 00000001 cf165f00 c03a3aa8 c0277e70 c02b1f7c cf165f00 dece0920 00000002 00000001 dae29000 00000000 c0277e70  00000000 dece0920 00000000 da641b40 da54bcde cc318abc c0277c10 00000002 Call Trace: [<c0277e70>] ip_local_deliver_finish+0x0/0x230 [<c0277e70>] ip_local_deliver_finish+0x0/0x230 …  deleted for space … [<c012caa6>] do_softirq+0x26/0x30 [<c010528e>] do_IRQ+0x1e/0x30 [<c0103ada>] common_interrupt+0x1a/0x20 Code: c0 75 ec e9 bd e5 e6 ff 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 57 56 53 83 ec 10 8b 54 24 2c 8b 74 24 28 8b 04 9b c0 42 3a c0 <8b> 38 85 ff 0f 84 96 01 00 00 86 44 24 2c 8b 7c 24 2c c7 44 24 <0>Kernel panic - not syncing: Fatal exception in interrupt
Kernel oops An oops message is displayed when a recoverable error has occurred in kernel space: Access to bad address, e.g., through a NULL pointer Illegal or invalid instruction Etc… The calling user process is killed The system should be considered unstable at this point The oops message displays: The state of the processor at the time of the fault, including registers and address of faulting instruction function call stack traceback The addresses are replaced with symbols if the kallsyms kernel configuration option is selected at kernel compile time
Example Oops Modules linked in: arc4 ecb crypto_blkcipher ieee80211_crypt_wep vfat fat ipt_MASQUERADE iptable_nat nf_nat bridge bnep rfcomm l2cap bluetooth ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi sunrpc ipt_REJECT nf_conntrack_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables cpufreq_ondemand acpi_cpufreq fuse loop dm_multipath ipv6 ppdev parport_pc parport video 8139cp output ac battery firewire_ohci fglrx(P) firewire_core crc_itu_t 8139too mii ipw2200 ieee80211 ieee80211_crypt snd_intel8x0m asus_laptop snd_intel8x0 button snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event joydev snd_seq pcspkr serio_raw snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm usb_storage iTCO_wdt snd_timer snd iTCO_vendor_support soundcore snd_page_alloc sr_mod cdrom sg dm_snapshot dm_zero dm_mirror dm_mod pata_acpi ata_generic ata_piix libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] Pid: 5488, comm: Xorg Tainted: P  (2.6.25.3-18.fc9.i686 #1) EIP: 0060:[<c04d038d>] EFLAGS: 00213246 CPU: 0 EIP is at task_has_capability+0x48/0x76 EAX: 00000030 EBX: f6852030 ECX: f6a8af28 EDX: 00000000 ESI: f36e50e0 EDI: f354fec8 EBP: f354fed4 ESP: f354fe84 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process Xorg (pid: 5488, ti=f354f000 task=f35f0e90 task.ti=f354f000) Stack: c06d48b5 f6852030 f35f0e90 00000003 f35f0e90 f6852030 00000000 00000000  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000  00000000 f6852030 f35f0e90 f66f8000 f354fee4 c04d03da f35f0e90 f8d52580
Example Oops part 2 Call Trace: [<c04d03da>] ? selinux_capable+0x1f/0x23 [<c04cbfe9>] ? security_capable+0xc/0xe [<c042de97>] ? __capable+0xb/0x22 [<f8c2d560>] ? firegl_version+0x0/0x1b0 [fglrx] [<c042debe>] ? capable+0x10/0x12 [<f8c2d427>] ? firegl_ioctl+0xe7/0x220 [fglrx] [<c062aa49>] ? mutex_lock+0x1d/0x2d [<c04d0c45>] ? file_has_perm+0x7c/0x85 [<f8c24685>] ? ip_firegl_ioctl+0xe/0x10 [fglrx] [<c048d2c8>] ? vfs_ioctl+0x50/0x69 [<c048d51a>] ? do_vfs_ioctl+0x239/0x24c [<c04d0ddb>] ? selinux_file_ioctl+0xa8/0xab [<c048d56d>] ? sys_ioctl+0x40/0x5b [<c0405bf2>] ? syscall_call+0x7/0xb ====================== ======================Code: 00 89 d0 f3 ab 8b 4d b8 89 d8 b2 04 c1 f8 05 c6 45 bc 03 89 5d c4 89 4d c0 74 16 48 b2 45 74 11 53 68 b5 48 6d c0 e8 11 f3 15 00 <0f> 0b 59 5b eb fe 8b 46 08 83 e3 1f 0f b7 f2 8d 55 bc 88 d9 52  EIP: [<c04d038d>] task_has_capability+0x48/0x76 SS:ESP 0068:f354fe84 ---[ end trace 9451402cb10d9c54 ]---
Enabling Symbolic Kernel Output
Module Debugging Techniques Examine the interaction with the kernel via  strace The next line of defense is  printk There may be additional output you’re not seeing Next, we can try adding  /proc  file system entries Instrument the driver for debugging Enable source debugging via kgdb Using hardware debuggers and “blinky lights”
Using  strace  to Watch System Calls When debugging what appears to be a kernel-space error, it can be helpful to watch the system calls that are made from user-space See what events lead to the error strace  displays all system calls made by a program Can display timestamp information per system call as well
Using  strace  to Watch System Calls #2 strace  displays each system call’s arguments and return values string arguments are printed – very helpful! errno values displayed symbolically The program being traced runs normally Not under control of a debugger No need to specially compile the user application You can attach to a running program And trace forked applications as well…
Example strace Output / # strace ls /dev/labdev  execve(&quot;/bin/ls&quot;, [&quot;ls&quot;, &quot;/dev/labdev&quot;], [/* 8 vars */]) = 0 fcntl64(0, F_GETFD)  = 0 fcntl64(1, F_GETFD)  = 0 fcntl64(2, F_GETFD)  = 0 geteuid()  = 0 getuid()  = 0 getegid()  = 0 getgid()  = 0 brk(0)  = 0x1028ad68 brk(0x1028bd68)  = 0x1028bd68 brk(0x1028c000)  = 0x1028c000 ioctl(1, TIOCGWINSZ or TIOCGWINSZ, {ws_row=0, ws_col=0, ws_xpixel=0, ws_ypixel=0 ioctl(1, TCGETS or TCGETS, {B9600 opost isig icanon echo ...}) = 0 ioctl(1, TCGETS or TCGETS, {B9600 opost isig icanon echo ...}) = 0 lstat(&quot;/dev/labdev&quot;, {st_mode=S_IFCHR|0644, st_rdev=makedev(254, 0), ...}) = 0 open(&quot;/etc/localtime&quot;, O_RDONLY)  = -1 ENOENT (No such file or directory) lstat(&quot;/dev/labdev&quot;, {st_mode=S_IFCHR|0644, st_rdev=makedev(254, 0), ...}) = 0 fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(4, 64), ...}) = 0 ioctl(1, TCGETS or TCGETS, {B9600 opost isig icanon echo ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x300 write(1, &quot;\33[1;35m/dev/labdev\33[0m\n&quot;, 23/dev/labdev ) = 23 munmap(0x30000000, 4096)  = 0 exit(0)  = ?
Debugging with  printk printk  debugging is the debug method preferred by Linus At least, according to his email traffic… Insert messages to be displayed at points of interest in kernel-space code E.g.,  printk(KERN_INFO “my_x= %d\n”, my_x); Printk  works like  printf  does in user-space except that  printk   can only print integers, strings & addresses  printk  can also be called from within ISRs Messages can have “importance” settings that allow filtering Importance is set by prepending a 3-character string to the output message: “< n >”
Debugging with  printk  #2 The messages are placed in a circular buffer that can be retrieved post mortem if needed The “importance” string that is prepended to the  printk  message can be found in  include/linux/kernel.h : #define KERN_EMERG  &quot;<0>&quot;  /* system is unusable */ #define KERN_ALERT  &quot;<1>&quot;  /* action to be taken immediately */ #define KERN_CRIT  &quot;<2>&quot;  /* critical conditions */ #define KERN_ERR  &quot;<3>&quot;  /* error conditions  */ #define KERN_WARNING &quot;<4>&quot;  /* warning conditions  */ #define KERN_NOTICE  &quot;<5>&quot;  /* normal but significant condition */ #define KERN_INFO  &quot;<6>&quot;  /* informational */ #define KERN_DEBUG  &quot;<7>&quot;  /* debug-level messages */
Debugging with  printk  #3 Some level of control for printk output can be found in  /proc/sys/kernel/printk Let’s look at the following output: # cat /proc/sys/kernel/printk 7  4  1  7 This indicates: The  console_loglevel  is 7, so messages with importance of 0..6 will currently go to the console The default message log level is 4, so messages that do not specify an importance are treated as level 4 The minimum console log level is 1, so  console_loglevel  cannot be set to any value less than 1 The default console log level is 7, so  console_loglevel  starts out set to 7
Debugging with  printk  #4 You may control the  console_loglevel  by writing to  /proc/sys/kernel/printk To enable  all  printk  messages with importance levels 0..7: # echo 8 > /proc/sys/kernel/printk If the kernel command line contains the word “debug”, the console_loglevel starts with a value of 10
Using the  /proc  File System Use   /proc  entries for driver instrumentation It is possible to register “write” functions that allow us to dynamically  modify  values in the kernel (or drivers, or modules)  Write values to the  /proc  file system entry Read  /proc  entries allow us to retrieve information from a running kernel entity Information is provided “live” Look at the driver source, there may already be  /proc  entries that can help you
Techniques for Source Debugging The two primary ways to provide source debugging in the Linux kernel are based on either kgdb or on the use of a hardware JTAG probe Unfortunately, kgdb is not a standard feature of the kernel a/o 2.6.25.8 You’ll have to patch your kernel to enable it Either technique will require the use of a kernel image that is compiled with debugging symbols Unless you like debugging in assembly language
Compiling the Kernel with Debug Info This will increase the size of the debug kernel image by about 30% However, you don’t need to load the debug version of the kernel Load the non-debug version to the target, but use the debug version for the debugger/JTAG probe Save off the vmlinux and System.map file because these are used by the debugger or by you to find key addresses The (b)zImage can be loaded on the target as normal
Enabling Debugging in the Kernel
Kernel gdb (kgdb) If you are using a stock kernel, kgdb is not included Linus doesn’t believe in a source debugger in the kernel Many commercial Linux vendors do include it in their distributions though kgdb can be downloaded from: http://kgdb.linsyssoft.com/downloads.htm or  http://sourceforge.net  and look for kgdb You’ll need to patch the kernel A new kgdb light is in the works for 2.6.26 Kgdb over the system console
Enable kgdb in the Kernel
Kgdb Light in 2.6.26-rc8 Uses system console for I/O
kgdb Lash up kgdb supports debugging via the serial port The gdb debugger is running on a second machine using the vmlinux you compiled with debugging symbols You attach to the system being debugged using gdb’s “target remote” command Host Target Network Ethernet RS-232
Hardware-assisted Debugging There are a number of devices that can help with debugging LEDs, JTAGs, logic analyzers, oscilloscopes, bus analyzers and more These can range from a few cents to implement to several 10s of thousands of dollars You typically get what you pay for
Debugging with LEDs Very simple: Blink on/off in various code sections under debug Blink in sequences Can display multiple-bit codes if multiple LEDs are available Very fast, little impact on run-time performance Adding LED debug code will likely not “make the problem go away” These may be the only option for debugging early x86 code
Debugging with LEDs, Caveats The LED(s) must be free for use Not tied in hardware to a network PHY or to displaying power status, for example LED are not very verbose You must decipher what the blinking means Can be difficult to determine where you are in the code You can also attach an oscilloscope to the GPIO pins found on many processors for more information
Hardware Debuggers In the past, in-circuit emulators (ICE) where the debugger of choice You pulled the CPU, plugged the ICE in and plugged the CPU into the ICE But, these where $80K+ each Logic analyzers are also good to have But, they are $35K+ for an empty mainframe PC-based versions can be had for < $1K IEEE 1149.1 (JTAG) has become the debugger da jour JTAG uses a boundary-scan protocol These range from $70 to $20K depending on model and features At a minimum, a JTAG is really a “must-have” for firmware and board bring up
Debugging with a JTAG Probe Debugging with a JTAG unit is much less involved than using kgdb Compile the kernel with debugging enabled No need to patch the kernel for kgdb Assumes your platform supports a JTAG interface You connect to the JTAG unit using whatever technique your JTAG probe requires Your JTAG GUI is dependent on the vendor For those JTAG units that are gdb-aware, use the appropriate  target remote  commands
Example JTAG Usage Connect the JTAG to the target and the host Start the host application to control the JTAG Reset the target and load the register configuration settings to the JTAG unit Load code and enjoy! Useful for debugging drivers as well as bringing up new firmware and BSPs
Summary “ Real developers” use printk – or at least Linus does   Tools like  strace  allow you see the flow of execution The  /proc  filesystem gives you a window into the kernel/drivers Kgdb uses familiar gdb technology but at the kernel level LEDs are a fast and easy way to get info out of the machine as well Hardware JTAG debug tools may be available These can be invaluable if you can get one

More Related Content

PDF
Uboot startup sequence
PPT
U Boot or Universal Bootloader
PDF
U-Boot - An universal bootloader
PPTX
Linux Memory Management
PDF
Linux kernel architecture
PDF
PDF
Linux kernel debugging
PDF
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
Uboot startup sequence
U Boot or Universal Bootloader
U-Boot - An universal bootloader
Linux Memory Management
Linux kernel architecture
Linux kernel debugging
SFO15-TR9: PSCI, ACPI (and UEFI to boot)

What's hot (20)

PDF
LCU14 500 ARM Trusted Firmware
PDF
Part 01 Linux Kernel Compilation (Ubuntu)
PDF
Embedded linux network device driver development
PPTX
U-Boot Porting on New Hardware
PPTX
Linux Kernel Tour
PDF
LCU13: An Introduction to ARM Trusted Firmware
PDF
Making Linux do Hard Real-time
PDF
LISA2019 Linux Systems Performance
PDF
PDF
Making Linux do Hard Real-time
PDF
Page cache in Linux kernel
PDF
Virtualization Support in ARMv8+
PDF
Understanding a kernel oops and a kernel panic
PPT
U boot porting guide for SoC
PDF
SFO15-200: Linux kernel generic TEE driver
PPT
Process scheduling linux
PPT
Linux monitoring and Troubleshooting for DBA's
PDF
Bootloaders
PDF
BeagleBone Black Bootloaders
PDF
Linux Performance Analysis: New Tools and Old Secrets
LCU14 500 ARM Trusted Firmware
Part 01 Linux Kernel Compilation (Ubuntu)
Embedded linux network device driver development
U-Boot Porting on New Hardware
Linux Kernel Tour
LCU13: An Introduction to ARM Trusted Firmware
Making Linux do Hard Real-time
LISA2019 Linux Systems Performance
Making Linux do Hard Real-time
Page cache in Linux kernel
Virtualization Support in ARMv8+
Understanding a kernel oops and a kernel panic
U boot porting guide for SoC
SFO15-200: Linux kernel generic TEE driver
Process scheduling linux
Linux monitoring and Troubleshooting for DBA's
Bootloaders
BeagleBone Black Bootloaders
Linux Performance Analysis: New Tools and Old Secrets
Ad

Viewers also liked (20)

PPT
Linux Crash Dump Capture and Analysis
PDF
17290770 Setting Buat Router Mikrotik
PPTX
Linux Kernel Module - For NLKB
PDF
Debugging linux kernel tools and techniques
PDF
Linux kernel tracing
PDF
Kernel Recipes 2015: Kernel packet capture technologies
PDF
Linux Kernel Startup Code In Embedded Linux
PDF
Instalasi dan-konfigurasi-mikrotik-sebagai-gateway
DOC
Panduan Setting DialUp Speedy dengan MikroTik
PDF
Instalasi mikrotik-router os
DOC
Tutorial mikrotik-step-by-step
PPT
Systemtap
PPT
Linux introduction
PDF
Perbaikan dan_atau_setting_ulang_koneksi_jaringan
PDF
Menginstalasi Sistem Operasi Berbasis Text
PDF
Panduan Pengguna Linux RedHat
PPTX
Linux Kernel Booting Process (2) - For NLKB
PDF
Kernel Module Programming
PDF
Linux Kernel and Driver Development Training
PPT
Basic socket programming
Linux Crash Dump Capture and Analysis
17290770 Setting Buat Router Mikrotik
Linux Kernel Module - For NLKB
Debugging linux kernel tools and techniques
Linux kernel tracing
Kernel Recipes 2015: Kernel packet capture technologies
Linux Kernel Startup Code In Embedded Linux
Instalasi dan-konfigurasi-mikrotik-sebagai-gateway
Panduan Setting DialUp Speedy dengan MikroTik
Instalasi mikrotik-router os
Tutorial mikrotik-step-by-step
Systemtap
Linux introduction
Perbaikan dan_atau_setting_ulang_koneksi_jaringan
Menginstalasi Sistem Operasi Berbasis Text
Panduan Pengguna Linux RedHat
Linux Kernel Booting Process (2) - For NLKB
Kernel Module Programming
Linux Kernel and Driver Development Training
Basic socket programming
Ad

Similar to Troubleshooting Linux Kernel Modules And Device Drivers (20)

PDF
HKG18-TR14 - Postmortem Debugging with Coresight
PPT
Writing Metasploit Plugins
PDF
The true story_of_hello_world
PPTX
Techno-Fest-15nov16
PPT
Linux Device Driver for Writing a real world driver for embedded Linux
PPTX
Track c-High speed transaction-based hw-sw coverification -eve
PDF
DPDK & Layer 4 Packet Processing
PDF
Shellcoding in linux
PPTX
A New Framework for Detection
PDF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
PDF
Open Source Systems Performance
PPT
Virtual platform
PPTX
Linux Improvements in Memory Corruption Based Protections
PDF
Reverse engineering Swisscom's Centro Grande Modem
PDF
Crash_Report_Mechanism_In_Tizen
PDF
44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root
PPTX
Analisis_avanzado_vmware
PPTX
Advanced Root Cause Analysis
PDF
The Spectre of Meltdowns
HKG18-TR14 - Postmortem Debugging with Coresight
Writing Metasploit Plugins
The true story_of_hello_world
Techno-Fest-15nov16
Linux Device Driver for Writing a real world driver for embedded Linux
Track c-High speed transaction-based hw-sw coverification -eve
DPDK & Layer 4 Packet Processing
Shellcoding in linux
A New Framework for Detection
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
Open Source Systems Performance
Virtual platform
Linux Improvements in Memory Corruption Based Protections
Reverse engineering Swisscom's Centro Grande Modem
Crash_Report_Mechanism_In_Tizen
44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root
Analisis_avanzado_vmware
Advanced Root Cause Analysis
The Spectre of Meltdowns

Recently uploaded (20)

PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Empathic Computing: Creating Shared Understanding
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Encapsulation theory and applications.pdf
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
Spectroscopy.pptx food analysis technology
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
Reach Out and Touch Someone: Haptics and Empathic Computing
Empathic Computing: Creating Shared Understanding
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Encapsulation theory and applications.pdf
Accuracy of neural networks in brain wave diagnosis of schizophrenia
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Group 1 Presentation -Planning and Decision Making .pptx
Digital-Transformation-Roadmap-for-Companies.pptx
“AI and Expert System Decision Support & Business Intelligence Systems”
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Programs and apps: productivity, graphics, security and other tools
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Advanced methodologies resolving dimensionality complications for autism neur...
Encapsulation_ Review paper, used for researhc scholars
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Spectroscopy.pptx food analysis technology
Unlocking AI with Model Context Protocol (MCP)
Per capita expenditure prediction using model stacking based on satellite ima...

Troubleshooting Linux Kernel Modules And Device Drivers

  • 1. Troubleshooting Linux Kernel Modules and Device Drivers Mike Anderson Chief Scientist The PTR Group, Inc. [email_address] Source: www.crobike.de
  • 2. What We’ll Talk About How do errors show up in the kernel? Watching kernel/user-space interaction via strace Debugging with printk Using the /proc file system Using the kgdb debugger Debugging with hardware ala LEDs or a JTAG unit
  • 3. Challenges of Kernel Debugging There are many features that can make kernel debugging especially difficult Optimizing compilers can rearrange code Instruction pointer seems to jump around The use of the MMU can obfuscate addresses Physical vs. virtual addresses Startup code is particularly difficult to debug because of its closeness to the “metal” No equivalent for the user-space gdbserver for drivers Early kernel debugging may require hardware assistance
  • 4. Device Drivers/Kernel Modules Assuming that your kernel is otherwise working, most of the problems that you’ll encounter are related to device drivers Drivers can either be statically linked or dynamically loaded to the kernel A dynamically loaded driver takes the form of a kernel module Can be dynamically loaded and unloaded at kernel run time Frequently handled by daemons such as udev
  • 5. Example Module #include <linux/module.h> #include <linux/init.h> #include <linux/kernel.h> MODULE_LICENSE(&quot;GPL and additional rights&quot;); MODULE_AUTHOR( “[email protected]” ); MODULE_DESCRIPTION( “My first driver!” ); int __init mymodule_init_module(void) { printk(KERN_DEBUG “mymodule_init_module() called, &quot;); return 0; } void __exit mymodule_cleanup_module(void) { printk(KERN_DEBUG “mymodule_cleanup_module() called\n&quot;); } module_init(mymodule_init_module); module_exit(mymodule_cleanup_module);
  • 6. When Things go Wrong… Problems in device drivers typically manifest themselves in one of three ways Kernel panic Fatal to the system Kernel oops Near fatal to the system Hardware just doesn’t work correctly Could be fatal to you! Source: picasaweb.google.com
  • 7. Kernel Panic When the Linux kernel determines that a fatal error has occurred, and no recovery is possible, it “panics” Frequently, an exception in an interrupt context Panic outputs a message to the console The output will help you find the source of the bug Typically results in a system reboot on an embedded Linux target Or, blinking keyboard LEDs on some desktop versions of Linux Source: regmedia.co.uk
  • 8. Sample Panic Output EIP: 0060: [<c02b2516>] Not tainted VLI EFLAGS: 00010286 (2.6.18) EIP is at nf_queue+0x16/0x240 eax: 00000000 ebx: 00000001 ecx: 00000000 edx: 00000002 esi: dece0920 edi: c03a3aa8 ebp: c0277e70 esp: cf165e78 ds: 007b es: 007b ss: 0068 Process sh (pid:5278, threadinfo=cf164000 task=dc5b4ab0) Stack: cf165f00 dae29000 00000000 c0277e70 00000001 cf165f00 c03a3aa8 c0277e70 c02b1f7c cf165f00 dece0920 00000002 00000001 dae29000 00000000 c0277e70 00000000 dece0920 00000000 da641b40 da54bcde cc318abc c0277c10 00000002 Call Trace: [<c0277e70>] ip_local_deliver_finish+0x0/0x230 [<c0277e70>] ip_local_deliver_finish+0x0/0x230 … deleted for space … [<c012caa6>] do_softirq+0x26/0x30 [<c010528e>] do_IRQ+0x1e/0x30 [<c0103ada>] common_interrupt+0x1a/0x20 Code: c0 75 ec e9 bd e5 e6 ff 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 57 56 53 83 ec 10 8b 54 24 2c 8b 74 24 28 8b 04 9b c0 42 3a c0 <8b> 38 85 ff 0f 84 96 01 00 00 86 44 24 2c 8b 7c 24 2c c7 44 24 <0>Kernel panic - not syncing: Fatal exception in interrupt
  • 9. Kernel oops An oops message is displayed when a recoverable error has occurred in kernel space: Access to bad address, e.g., through a NULL pointer Illegal or invalid instruction Etc… The calling user process is killed The system should be considered unstable at this point The oops message displays: The state of the processor at the time of the fault, including registers and address of faulting instruction function call stack traceback The addresses are replaced with symbols if the kallsyms kernel configuration option is selected at kernel compile time
  • 10. Example Oops Modules linked in: arc4 ecb crypto_blkcipher ieee80211_crypt_wep vfat fat ipt_MASQUERADE iptable_nat nf_nat bridge bnep rfcomm l2cap bluetooth ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi sunrpc ipt_REJECT nf_conntrack_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables cpufreq_ondemand acpi_cpufreq fuse loop dm_multipath ipv6 ppdev parport_pc parport video 8139cp output ac battery firewire_ohci fglrx(P) firewire_core crc_itu_t 8139too mii ipw2200 ieee80211 ieee80211_crypt snd_intel8x0m asus_laptop snd_intel8x0 button snd_seq_dummy snd_ac97_codec ac97_bus snd_seq_oss snd_seq_midi_event joydev snd_seq pcspkr serio_raw snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm usb_storage iTCO_wdt snd_timer snd iTCO_vendor_support soundcore snd_page_alloc sr_mod cdrom sg dm_snapshot dm_zero dm_mirror dm_mod pata_acpi ata_generic ata_piix libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] Pid: 5488, comm: Xorg Tainted: P (2.6.25.3-18.fc9.i686 #1) EIP: 0060:[<c04d038d>] EFLAGS: 00213246 CPU: 0 EIP is at task_has_capability+0x48/0x76 EAX: 00000030 EBX: f6852030 ECX: f6a8af28 EDX: 00000000 ESI: f36e50e0 EDI: f354fec8 EBP: f354fed4 ESP: f354fe84 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process Xorg (pid: 5488, ti=f354f000 task=f35f0e90 task.ti=f354f000) Stack: c06d48b5 f6852030 f35f0e90 00000003 f35f0e90 f6852030 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 f6852030 f35f0e90 f66f8000 f354fee4 c04d03da f35f0e90 f8d52580
  • 11. Example Oops part 2 Call Trace: [<c04d03da>] ? selinux_capable+0x1f/0x23 [<c04cbfe9>] ? security_capable+0xc/0xe [<c042de97>] ? __capable+0xb/0x22 [<f8c2d560>] ? firegl_version+0x0/0x1b0 [fglrx] [<c042debe>] ? capable+0x10/0x12 [<f8c2d427>] ? firegl_ioctl+0xe7/0x220 [fglrx] [<c062aa49>] ? mutex_lock+0x1d/0x2d [<c04d0c45>] ? file_has_perm+0x7c/0x85 [<f8c24685>] ? ip_firegl_ioctl+0xe/0x10 [fglrx] [<c048d2c8>] ? vfs_ioctl+0x50/0x69 [<c048d51a>] ? do_vfs_ioctl+0x239/0x24c [<c04d0ddb>] ? selinux_file_ioctl+0xa8/0xab [<c048d56d>] ? sys_ioctl+0x40/0x5b [<c0405bf2>] ? syscall_call+0x7/0xb ====================== ======================Code: 00 89 d0 f3 ab 8b 4d b8 89 d8 b2 04 c1 f8 05 c6 45 bc 03 89 5d c4 89 4d c0 74 16 48 b2 45 74 11 53 68 b5 48 6d c0 e8 11 f3 15 00 <0f> 0b 59 5b eb fe 8b 46 08 83 e3 1f 0f b7 f2 8d 55 bc 88 d9 52 EIP: [<c04d038d>] task_has_capability+0x48/0x76 SS:ESP 0068:f354fe84 ---[ end trace 9451402cb10d9c54 ]---
  • 13. Module Debugging Techniques Examine the interaction with the kernel via strace The next line of defense is printk There may be additional output you’re not seeing Next, we can try adding /proc file system entries Instrument the driver for debugging Enable source debugging via kgdb Using hardware debuggers and “blinky lights”
  • 14. Using strace to Watch System Calls When debugging what appears to be a kernel-space error, it can be helpful to watch the system calls that are made from user-space See what events lead to the error strace displays all system calls made by a program Can display timestamp information per system call as well
  • 15. Using strace to Watch System Calls #2 strace displays each system call’s arguments and return values string arguments are printed – very helpful! errno values displayed symbolically The program being traced runs normally Not under control of a debugger No need to specially compile the user application You can attach to a running program And trace forked applications as well…
  • 16. Example strace Output / # strace ls /dev/labdev execve(&quot;/bin/ls&quot;, [&quot;ls&quot;, &quot;/dev/labdev&quot;], [/* 8 vars */]) = 0 fcntl64(0, F_GETFD) = 0 fcntl64(1, F_GETFD) = 0 fcntl64(2, F_GETFD) = 0 geteuid() = 0 getuid() = 0 getegid() = 0 getgid() = 0 brk(0) = 0x1028ad68 brk(0x1028bd68) = 0x1028bd68 brk(0x1028c000) = 0x1028c000 ioctl(1, TIOCGWINSZ or TIOCGWINSZ, {ws_row=0, ws_col=0, ws_xpixel=0, ws_ypixel=0 ioctl(1, TCGETS or TCGETS, {B9600 opost isig icanon echo ...}) = 0 ioctl(1, TCGETS or TCGETS, {B9600 opost isig icanon echo ...}) = 0 lstat(&quot;/dev/labdev&quot;, {st_mode=S_IFCHR|0644, st_rdev=makedev(254, 0), ...}) = 0 open(&quot;/etc/localtime&quot;, O_RDONLY) = -1 ENOENT (No such file or directory) lstat(&quot;/dev/labdev&quot;, {st_mode=S_IFCHR|0644, st_rdev=makedev(254, 0), ...}) = 0 fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(4, 64), ...}) = 0 ioctl(1, TCGETS or TCGETS, {B9600 opost isig icanon echo ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x300 write(1, &quot;\33[1;35m/dev/labdev\33[0m\n&quot;, 23/dev/labdev ) = 23 munmap(0x30000000, 4096) = 0 exit(0) = ?
  • 17. Debugging with printk printk debugging is the debug method preferred by Linus At least, according to his email traffic… Insert messages to be displayed at points of interest in kernel-space code E.g., printk(KERN_INFO “my_x= %d\n”, my_x); Printk works like printf does in user-space except that printk can only print integers, strings & addresses printk can also be called from within ISRs Messages can have “importance” settings that allow filtering Importance is set by prepending a 3-character string to the output message: “< n >”
  • 18. Debugging with printk #2 The messages are placed in a circular buffer that can be retrieved post mortem if needed The “importance” string that is prepended to the printk message can be found in include/linux/kernel.h : #define KERN_EMERG &quot;<0>&quot; /* system is unusable */ #define KERN_ALERT &quot;<1>&quot; /* action to be taken immediately */ #define KERN_CRIT &quot;<2>&quot; /* critical conditions */ #define KERN_ERR &quot;<3>&quot; /* error conditions */ #define KERN_WARNING &quot;<4>&quot; /* warning conditions */ #define KERN_NOTICE &quot;<5>&quot; /* normal but significant condition */ #define KERN_INFO &quot;<6>&quot; /* informational */ #define KERN_DEBUG &quot;<7>&quot; /* debug-level messages */
  • 19. Debugging with printk #3 Some level of control for printk output can be found in /proc/sys/kernel/printk Let’s look at the following output: # cat /proc/sys/kernel/printk 7 4 1 7 This indicates: The console_loglevel is 7, so messages with importance of 0..6 will currently go to the console The default message log level is 4, so messages that do not specify an importance are treated as level 4 The minimum console log level is 1, so console_loglevel cannot be set to any value less than 1 The default console log level is 7, so console_loglevel starts out set to 7
  • 20. Debugging with printk #4 You may control the console_loglevel by writing to /proc/sys/kernel/printk To enable all printk messages with importance levels 0..7: # echo 8 > /proc/sys/kernel/printk If the kernel command line contains the word “debug”, the console_loglevel starts with a value of 10
  • 21. Using the /proc File System Use /proc entries for driver instrumentation It is possible to register “write” functions that allow us to dynamically modify values in the kernel (or drivers, or modules) Write values to the /proc file system entry Read /proc entries allow us to retrieve information from a running kernel entity Information is provided “live” Look at the driver source, there may already be /proc entries that can help you
  • 22. Techniques for Source Debugging The two primary ways to provide source debugging in the Linux kernel are based on either kgdb or on the use of a hardware JTAG probe Unfortunately, kgdb is not a standard feature of the kernel a/o 2.6.25.8 You’ll have to patch your kernel to enable it Either technique will require the use of a kernel image that is compiled with debugging symbols Unless you like debugging in assembly language
  • 23. Compiling the Kernel with Debug Info This will increase the size of the debug kernel image by about 30% However, you don’t need to load the debug version of the kernel Load the non-debug version to the target, but use the debug version for the debugger/JTAG probe Save off the vmlinux and System.map file because these are used by the debugger or by you to find key addresses The (b)zImage can be loaded on the target as normal
  • 24. Enabling Debugging in the Kernel
  • 25. Kernel gdb (kgdb) If you are using a stock kernel, kgdb is not included Linus doesn’t believe in a source debugger in the kernel Many commercial Linux vendors do include it in their distributions though kgdb can be downloaded from: http://kgdb.linsyssoft.com/downloads.htm or http://sourceforge.net and look for kgdb You’ll need to patch the kernel A new kgdb light is in the works for 2.6.26 Kgdb over the system console
  • 26. Enable kgdb in the Kernel
  • 27. Kgdb Light in 2.6.26-rc8 Uses system console for I/O
  • 28. kgdb Lash up kgdb supports debugging via the serial port The gdb debugger is running on a second machine using the vmlinux you compiled with debugging symbols You attach to the system being debugged using gdb’s “target remote” command Host Target Network Ethernet RS-232
  • 29. Hardware-assisted Debugging There are a number of devices that can help with debugging LEDs, JTAGs, logic analyzers, oscilloscopes, bus analyzers and more These can range from a few cents to implement to several 10s of thousands of dollars You typically get what you pay for
  • 30. Debugging with LEDs Very simple: Blink on/off in various code sections under debug Blink in sequences Can display multiple-bit codes if multiple LEDs are available Very fast, little impact on run-time performance Adding LED debug code will likely not “make the problem go away” These may be the only option for debugging early x86 code
  • 31. Debugging with LEDs, Caveats The LED(s) must be free for use Not tied in hardware to a network PHY or to displaying power status, for example LED are not very verbose You must decipher what the blinking means Can be difficult to determine where you are in the code You can also attach an oscilloscope to the GPIO pins found on many processors for more information
  • 32. Hardware Debuggers In the past, in-circuit emulators (ICE) where the debugger of choice You pulled the CPU, plugged the ICE in and plugged the CPU into the ICE But, these where $80K+ each Logic analyzers are also good to have But, they are $35K+ for an empty mainframe PC-based versions can be had for < $1K IEEE 1149.1 (JTAG) has become the debugger da jour JTAG uses a boundary-scan protocol These range from $70 to $20K depending on model and features At a minimum, a JTAG is really a “must-have” for firmware and board bring up
  • 33. Debugging with a JTAG Probe Debugging with a JTAG unit is much less involved than using kgdb Compile the kernel with debugging enabled No need to patch the kernel for kgdb Assumes your platform supports a JTAG interface You connect to the JTAG unit using whatever technique your JTAG probe requires Your JTAG GUI is dependent on the vendor For those JTAG units that are gdb-aware, use the appropriate target remote commands
  • 34. Example JTAG Usage Connect the JTAG to the target and the host Start the host application to control the JTAG Reset the target and load the register configuration settings to the JTAG unit Load code and enjoy! Useful for debugging drivers as well as bringing up new firmware and BSPs
  • 35. Summary “ Real developers” use printk – or at least Linus does  Tools like strace allow you see the flow of execution The /proc filesystem gives you a window into the kernel/drivers Kgdb uses familiar gdb technology but at the kernel level LEDs are a fast and easy way to get info out of the machine as well Hardware JTAG debug tools may be available These can be invaluable if you can get one