1. How Kdump Works
A kernel crash dump means saving part of the contents of RAM to disk or other storage when the kernel hits a fatal error. When the kernel panics, it relies on the kexec mechanism to quickly boot a new kernel instance inside a pre-reserved memory region; the size of that region is set with the crashkernel kernel boot parameter.
To realize this "dual kernel" layout, Kdump uses kexec to boot into the dump-capture kernel (capture kernel) immediately after the crash, "overlaying" the kernel that was running. The capture kernel can be a separately built Linux kernel image, or, on architectures that support relocatable kernels, the primary kernel image can be reused.
kexec (kernel execution, analogous to the Unix/Linux exec system call) is a Linux kernel mechanism that boots a new kernel from the currently running one. kexec skips the bootloader stage and the hardware initialization performed by the system firmware (BIOS or UEFI); it loads the new kernel into main memory and begins executing it immediately. This avoids the long wait of a full reboot and helps meet high-availability requirements by minimizing downtime.
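Independently of Kdump, kexec can also be driven by hand via the kexec-tools CLI for a fast reboot into a new kernel. A minimal sketch (the kernel/initrd paths are simply the ones used elsewhere in this article; adjust to your system):

# Sketch: stage a kernel with -l, then jump into it with -e (no firmware/bootloader pass)
$ sudo kexec -l /boot/vmlinuz-5.4.0-80-generic \
    --initrd=/boot/initrd.img-5.4.0-80-generic \
    --reuse-cmdline
# WARNING: -e switches kernels immediately, without a clean shutdown of services
$ sudo kexec -e

Kdump itself uses the related kexec -p variant, which preloads the capture kernel so that it is executed only on panic (this shows up later in the kdump-config show output).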
Figure 1-1: Kdump architecture
Kdump is useful for more than crash analysis. When studying the kernel, if we want to examine the kernel's runtime state or the details of its data structures (and do not want to write a kernel module or single-step with gdb), we can also take a dump with Kdump and later analyze it against the source code with the Crash utility.
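For that kind of exploration a real crash is not even required: as a quick sketch (it assumes the debug vmlinux that we install later in this article), crash can also attach to the live kernel:

# Sketch: inspect the running kernel directly, without triggering a dump
$ sudo crash /usr/lib/debug/boot/vmlinux-$(uname -r)
crash> ps      # browse live kernel state; quit with q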
2. Installing Kdump + Crash on Ubuntu 20.04
$ sudo apt install linux-crashdump
$ sudo apt install crash
After the installation completes, reboot the server for it to take effect.
Looking at the relevant files, we can see that the installation has already configured the crashkernel kernel boot parameter.
$ sudo cat /etc/default/grub.d/kdump-tools.cfg
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M"
$ sudo cat /boot/grub/grub.cfg
...
menuentry 'Ubuntu' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-f278a3a6-9739-4b30-b8a1-5de870e7288a' {
...
linux /vmlinuz-5.4.0-80-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M
initrd /initrd.img-5.4.0-80-generic
}
...
The installation added a crashkernel line to /boot/grub/grub.cfg; the amount of RAM to reserve is chosen according to the host's total memory. Each range:size pair means: if the machine's total RAM falls within range, reserve size for the capture kernel.
crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M
After the reboot, the reservation shows up in the kernel dmesg output; this machine reserves 512 MB of RAM for the dump-capture kernel. kdump-config show reports that Kdump is in the Ready state, and service kdump-tools status shows kdump-tools as Active.
$ sudo reboot
# Check the kernel log
$ dmesg -T | grep -i crash
[Sun Aug 1 01:07:01 2021] crashkernel reserved: 0x00000000dfe00000 - 0x00000000ffe00000 (512 MB)
[Sun Aug 1 01:07:01 2021] Kernel command line: BOOT_IMAGE=/vmlinuz-5.4.0-80-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M
$ sudo kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0xdfe00000
/var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.4.0-80-generic
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.4.0-80-generic
current state: ready to kdump # already ready
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/vmlinuz-5.4.0-80-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
# Check the kernel command line
$ sudo cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.4.0-80-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M
# Check the kdump-tools service status (alternatively: service --status-all | grep kdump)
$ sudo service kdump-tools status
● kdump-tools.service - Kernel crash dump capture service
Loaded: loaded (/lib/systemd/system/kdump-tools.service; enabled; vendor preset: enabled)
Active: active (exited) since Sat 2021-07-31 13:31:21 UTC; 12min ago
Process: 937 ExecStart=/etc/init.d/kdump-tools start (code=exited, status=0/SUCCESS)
Main PID: 937 (code=exited, status=0/SUCCESS)
# Check the address range reserved for the crash kernel
$ cat /proc/iomem | grep -i crash
dfe00000-ffdfffff : Crash kernel
# Check the size of the crash kernel reservation (bytes)
$ sudo cat /sys/kernel/kexec_crash_size
536870912
At this point the kdump service is in effect: when the system crashes, a dump file will be generated and saved under /var/crash.
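The dump directory and related options come from the kdump-tools configuration. A minimal excerpt, assuming the default Ubuntu layout (the values shown are the defaults reported by kdump-config show above):

# /etc/default/kdump-tools (excerpt)
USE_KDUMP=1
KDUMP_COREDIR="/var/crash"   # where the dmesg.* and dump.* files are written
# After editing, restart the kdump-tools service and re-check with: kdump-config show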
The Crash utility, developed by Red Hat for analyzing dump files, gives an essentially gdb-like debugging experience on a kernel snapshot.
3. Testing and Verification
The Linux sysrq facility can trigger a kernel panic by hand, which we can use for a quick test:
# Note: `sudo echo c > /proc/sysrq-trigger` would fail, because the redirection runs as the unprivileged shell; use tee instead
$ echo 1 | sudo tee /proc/sys/kernel/sysrq
# WARNING: the next command panics the kernel immediately
$ echo c | sudo tee /proc/sysrq-trigger
Once the command runs, a directory named after the current date appears under /var/crash, containing two files, dmesg.x and dump.x: dmesg.x is the kernel log at the moment of the crash, and dump.x is the dumped kernel snapshot.
$ sudo ls -hl /var/crash/202107311331/
total 86M
-rw------- 1 root root 48K Jul 31 13:31 dmesg.202107311331
-rw------- 1 root root 86M Jul 31 13:31 dump.202107311331
To use the Crash utility, we also need to install a vmlinux image with debug symbols; the installation commands are as follows:
# Set up the ddebs repository
$ echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-proposed main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/ddebs.list
$ sudo apt install ubuntu-dbgsym-keyring
$ sudo apt-get update
$ sudo apt -y install linux-image-$(uname -r)-dbgsym
The following additional packages will be installed:
linux-image-unsigned-5.4.0-80-generic-dbgsym
The following NEW packages will be installed:
linux-image-5.4.0-80-generic-dbgsym linux-image-unsigned-5.4.0-80-generic-dbgsym
0 upgraded, 2 newly installed, 0 to remove and 63 not upgraded.
Need to get 896 MB of archives.
After this operation, 6,225 MB of additional disk space will be used.
Get:1 http://ddebs.ubuntu.com focal-updates/main arm64 linux-image-unsigned-5.4.0-80-generic-dbgsym arm64 5.4.0-80.90 [896 MB]
2% [1 linux-image-unsigned-5.4.0-80-generic-dbgsym 24.8 MB/896 MB 3%]
# After installation, check the file
$ sudo ls -hl /usr/lib/debug/boot/
total 350M
-rw-r--r-- 1 root root 350M Jul 9 15:49 vmlinux-5.4.0-80-generic
After the linux-image-5.4.0-80-generic-dbgsym package is installed, the vmlinux-5.4.0-80-generic file is available in /usr/lib/debug/boot/.
With everything in place, we can now happily debug with the Crash utility:
$ sudo crash /usr/lib/debug/boot/vmlinux-5.4.0-80-generic /var/crash/202107311331/dump.202107311331
...
KERNEL: /usr/lib/debug/boot/vmlinux-5.4.0-80-generic
DUMPFILE: /var/crash/202107311331/dump.202107311331 [PARTIAL DUMP]
CPUS: 8
DATE: Sat Jul 31 13:30:59 2021
UPTIME: 00:05:02
LOAD AVERAGE: 0.74, 0.54, 0.25
TASKS: 393
NODENAME: headfirstbpf
RELEASE: 5.4.0-80-generic
VERSION: #90-Ubuntu SMP Fri Jul 9 17:43:26 UTC 2021
MACHINE: aarch64 (unknown Mhz)
MEMORY: 8 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash"
PID: 8139
COMMAND: "bash"
TASK: ffff0001e3d7bc00 [THREAD_INFO: ffff0001e3d7bc00]
CPU: 6
STATE: TASK_RUNNING (PANIC)
# Use the bt command to view the stack at the time of the crash
crash> bt
PID: 8139 TASK: ffff0001e3d7bc00 CPU: 6 COMMAND: "bash"
#0 [ffff8000140eba00] machine_kexec at ffff8000100aba84
#1 [ffff8000140eba60] __crash_kexec at ffff8000101d4e44
#2 [ffff8000140ebbf0] panic at ffff800010df9c94
#3 [ffff8000140ebcd0] sysrq_handle_crash at ffff80001089a9fc
#4 [ffff8000140ebce0] __handle_sysrq at ffff80001089b3fc
#5 [ffff8000140ebd30] write_sysrq_trigger at ffff80001089babc
#6 [ffff8000140ebd50] proc_reg_write at ffff800010459d74
#7 [ffff8000140ebd90] __vfs_write at ffff8000103974b8
#8 [ffff8000140ebdc0] vfs_write at ffff800010398794
#9 [ffff8000140ebe00] ksys_write at ffff80001039b6a0
#10 [ffff8000140ebe50] __arm64_sys_write at ffff80001039b754
#11 [ffff8000140ebe70] el0_svc_common.constprop.0 at ffff80001009e958
#12 [ffff8000140ebea0] el0_svc_handler at ffff80001009ea9c
#13 [ffff8000140ebff0] el0_svc at ffff80001008464c
PC: 0000ffff80556ed0 LR: 0000ffff8050329c SP: 0000ffffeb06cfd0
X29: 0000ffffeb06cfd0 X28: 0000aaaabf405000 X27: 0000000000000000
X26: 0000aaaabf3cc000 X25: 0000ffff805fe630 X24: 0000000000000002
X23: 0000aaaaeadebaa0 X22: 0000ffff80679710 X21: 0000ffff805fe548
X20: 0000aaaaeadebaa0 X19: 0000000000000001 X18: 0000000000000000
X17: 0000ffff804ffc20 X16: 0000ffff805046a0 X15: 000000007fffffde
X14: 0000000000000000 X13: 0000000000000000 X12: 0000000000000000
X11: 0000ffffeb06cf98 X10: 0000000000000001 X9: 00000000ffffff80
X8: 0000000000000040 X7: 0000000000000063 X6: 0000000000000063
X5: 0000000155510004 X4: 000000000000000a X3: 0000ffff80678f10
X2: 0000000000000002 X1: 0000aaaaeadebaa0 X0: 0000000000000001
ORIG_X0: 0000000000000001 SYSCALLNO: 40 PSTATE: 20001000
Entering the bt command here shows the kernel stack at the moment of the crash.
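Besides bt, a few other subcommands are handy right after opening a dump (output omitted here):

crash> log      # kernel ring buffer from the dump, including the panic message
crash> sys      # system summary, same information as the startup banner
crash> bt -a    # backtrace of the active task on every CPU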
4. Using Crash Subcommands
Crash subcommands behave much like commands in bash: their output can be redirected to files or piped through grep, awk, and other external tools, which makes analysis very convenient. Detailed usage for each subcommand is available via help <subcommand> inside crash.
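For example (a short sketch; stacks.txt is just an arbitrary output file name):

crash> help bt                   # detailed usage for the bt subcommand
crash> ps | grep containerd      # pipe subcommand output through external commands
crash> foreach bt > stacks.txt   # redirect output (here: every task's stack) to a file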
bt
Used to view a process's stack and register state.
crash> bt 4468
PID: 4468 TASK: ffff0001e7fa4b00 CPU: 2 COMMAND: "containerd"
#0 [ffff800014cf3b30] __switch_to at ffff800010089960
#1 [ffff800014cf3b60] __schedule at ffff800010e0fdc0
#2 [ffff800014cf3bf0] schedule at ffff800010e102c4
#3 [ffff800014cf3c10] futex_wait_queue_me at ffff8000101c0e58
#4 [ffff800014cf3c60] futex_wait at ffff8000101c3540
#5 [ffff800014cf3da0] do_futex at ffff8000101c6868
#6 [ffff800014cf3df0] __arm64_sys_futex at ffff8000101c6aa4
#7 [ffff800014cf3e70] el0_svc_common.constprop.0 at ffff80001009e958
#8 [ffff800014cf3ea0] el0_svc_handler at ffff80001009ea9c
#9 [ffff800014cf3ff0] el0_svc at ffff80001008464c
PC: 0000aaaabcfcdce8 LR: 0000aaaabcf9bb60 SP: 0000ffff873207e0
X29: 0000ffff873207d8 X28: 0000004000000900 X27: 0000aaaabf01af80
X26: 0000aaaabe44e130 X25: 0000ffffe6a26498 X24: 0000000000001000
X23: 0000000000000000 X22: 0000ffffe6a2637f X21: 0000004000076380
X20: 0000ffff87320800 X19: 0000aaaabcfa2928 X18: 0000ffff896e6a70
X17: 0000000000000118 X16: 0000ffff87320898 X15: 0000000000000000
X14: 0000000000000002 X13: 0000000000000001 X12: 00000044c17200e9
X11: 003655cd891a685f X10: 0000000000000018 X9: 000000000001eef0
X8: 0000000000000062 X7: 0000000029aab5ba X6: 0000ffff8976f010
X5: 0000000000000000 X4: 0000000000000000 X3: 0000ffff87320818
X2: 0000000000000000 X1: 0000000000000080 X0: 0000aaaabeffde70
ORIG_X0: 0000aaaabeffde70 SYSCALLNO: 62 PSTATE: 80001000
files
files pid
Shows details of the files opened by the specified process.
crash> files 4468
PID: 4468 TASK: ffff0001e7fa4b00 CPU: 2 COMMAND: "containerd"
ROOT: / CWD: /var/snap/var/snap/docker/800
FD FILE DENTRY INODE TYPE PATH
0 ffff0001e644d000 ffff0001f2043a80 ffff0001f166b310 CHR /dev/null
1 ffff0001e40c9e00 ffff000194e75cc0 ffff0001f2e72440 SOCK UNIX
2 ffff0001e40c9e00 ffff000194e75cc0 ffff0001f2e72440 SOCK UNIX
task
Used to display the task_struct structure.
crash> task -x 4468
PID: 4468 TASK: ffff0001e7fa4b00 CPU: 2 COMMAND: "containerd"
struct task_struct {
thread_info = {
flags = 0x0,
addr_limit = 0xffffffffffff,
ttbr0 = 0xdff8000226569000,
{
preempt_count = 0x100000000,
preempt = {
count = 0x0,
need_resched = 0x1
}
}
},
state = 0x1,
To display only particular members, specify them with -R; multiple members can be listed, separated by commas:
crash> task -x -R files,state 4468
PID: 4468 TASK: ffff0001e7fa4b00 CPU: 2 COMMAND: "containerd"
files = 0xffff0001911d5b80,
state = 0x1,
struct
The struct command displays the detailed members of a structure; add the -o option to also show each member's offset.
crash> struct task_struct -o
struct task_struct {
[0] struct thread_info thread_info;
[32] volatile long state;
[40] void *stack;
[48] refcount_t usage;
If you know which data structure a given address corresponds to, struct can also print it directly:
crash> struct task_struct ffff0001e7fa4b00
struct task_struct {
thread_info = {
flags = 0,
addr_limit = 281474976710655,
ttbr0 = 16138649273915314176,
{
preempt_count = 4294967296,
preempt = {
count = 0,
need_resched = 1
}
}
},
ps
The ps command lists all processes in the system. The ST column is the task state: RU = running, IN = interruptible sleep, UN = uninterruptible sleep, ID = idle. The TASK column is the address of the task_struct.
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 0 0 0 ffff800011b82e40 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffff0001f10ce900 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffff0001f10cda00 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffff0001f10cad00 RU 0.0 0 0 [swapper/3]
crash> ps|grep RU # show only tasks in the RU state
vm
vm shows the virtual memory of the specified process.
crash> vm 4468
PID: 4468 TASK: ffff0001e7fa4b00 CPU: 2 COMMAND: "containerd"
MM PGD RSS TOTAL_VM
ffff0001e7b82940 ffff0001e6569000 44256k 1180404k
VMA START END FLAGS FILE
ffff0001e54636c0 4000000000 4000800000 40100073
ffff0001e369edd0 4000800000 4004000000 100073
ffff0001edf4b450 aaaabc72c000 aaaabdec2000 875 /snap/snap/docker/800/bin/containerd
ffff0001edf4b6c0 aaaabded1000 aaaabef81000 100871 /snap/snap/docker/800/bin/containerd
ffff0001e5463ee0 aaaabef81000 aaaabeff3000 100873 /snap/snap/docker/800/bin/containerd
irq
The irq command displays interrupt information.
crash> irq
IRQ IRQ_DESC/_DATA IRQACTION NAME
0 (unused) (unused)
1 ffff0001fc4f3000 (unused)
2 ffff0001fc4f0c00 ffff0001fc5f3980 "arch_timer"
3 ffff0001fc4f0400 (unused)
4 ffff0001f13d3e00 ffff0001e4fab100 "uart-pl011"
kmem
kmem shows system memory information.
crash> kmem -i
PAGES TOTAL PERCENTAGE
TOTAL MEM 1901306 7.3 GB ----
FREE 948095 3.6 GB 49% of TOTAL MEM
USED 953211 3.6 GB 50% of TOTAL MEM
SHARED 323270 1.2 GB 17% of TOTAL MEM
BUFFERS 53277 208.1 MB 2% of TOTAL MEM
CACHED 620919 2.4 GB 32% of TOTAL MEM
SLAB 68520 267.7 MB 3% of TOTAL MEM
TOTAL HUGE 0 0 ----
HUGE FREE 0 0 0% of TOTAL HUGE
TOTAL SWAP 1048575 4 GB ----
SWAP USED 0 0 0% of TOTAL SWAP
SWAP FREE 1048575 4 GB 100% of TOTAL SWAP
COMMIT LIMIT 1999228 7.6 GB ----
COMMITTED 671869 2.6 GB 33% of TOTAL LIMIT
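Beyond the -i summary, kmem has other modes worth knowing; a sketch using the task address from the earlier examples (output omitted):

crash> kmem -s                   # per-cache slab allocator statistics
crash> kmem ffff0001e7fa4b00     # locate an arbitrary address (page, slab cache, etc.)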