Tracing Changes in Process Capabilities Using eBPF
This article is available at: https://www.ebpf.top/post/bpf_capabilities_debug
Author: kira skyler
Introduction
In Linux, "capabilities" are a permission mechanism that splits the privileges traditionally held by root into multiple independent permission bits. This way, a user or process can be granted only the permissions needed for a specific task instead of all of them.
Capabilities are organized into several sets: the Inheritable, Permitted, Effective, Bounding, and Ambient sets. Each set controls the permissions of a process or thread in a different scenario. A process's capabilities can change at several points: after switching users, the new user may well hold a different set of capabilities, and each set follows its own rules when a child process is created or a new program is executed.
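As a rough illustration of "each set follows its own rules", the sketch below applies the execve() transformation documented in capabilities(7) to plain 64-bit masks. The struct and function names are hypothetical, and the sketch deliberately ignores the special handling of root (UID 0) and set-user-ID-root programs.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per capability; bit numbers follow <linux/capability.h>. */
#define CAP_BIT(n) (UINT64_C(1) << (n))

/* Capability sets of a thread (hypothetical struct for this sketch). */
struct proc_caps {
    uint64_t inheritable, permitted, effective, bounding, ambient;
};

/* Capability metadata attached to an executable (hypothetical struct). */
struct file_caps {
    uint64_t inheritable, permitted;
    bool effective;   /* the file "effective" flag is a single bit */
    bool privileged;  /* file has capabilities or a set-user/group-ID bit */
};

/* Apply the execve() rules from capabilities(7); root special cases omitted. */
struct proc_caps caps_after_execve(struct proc_caps p, struct file_caps f)
{
    struct proc_caps n;

    n.ambient     = f.privileged ? 0 : p.ambient;
    n.permitted   = (p.inheritable & f.inheritable) |
                    (f.permitted & p.bounding) | n.ambient;
    n.effective   = f.effective ? n.permitted : n.ambient;
    n.inheritable = p.inheritable;  /* unchanged across execve */
    n.bounding    = p.bounding;     /* unchanged across execve */
    return n;
}
```

Note how the bounding set acts as a ceiling: a capability missing from it cannot be gained from the file's permitted set, whatever the file requests.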
Example: granting the cap_chown capability allows changing the owner of a file. Only a process that holds this capability can freely reassign a file to another user or group.
I once ran into a capabilities issue while troubleshooting a company's custom operating system. The operations team reported that root could not use tcpdump, which failed with the error "tcpdump: Couldn't change ownership of savefile".
Running tcpdump from the command line did indeed reproduce the error.
First, let's use strace to see where tcpdump fails. It turns out that the chown system call returns a permission-denied error when trying to change the file's owner to 72, which is both the uid and the gid of the tcpdump user on my system. Evidently, tcpdump changes the owner of the output file when one is specified.
When a system call returns an error, I often use ftrace to trace the call path inside the kernel. Following that path to find where the kernel produces the EPERM error is a topic for another article. This was the first time I had faced a capabilities-related problem.
A web search confirmed that the current terminal lacked the cap_chown capability. Further searching revealed that on this system root had been customized to run without cap_chown.
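One way to check for a missing capability without extra tools is to read the CapEff line of /proc/self/status and test the relevant bit. The following is a minimal sketch: the function names are hypothetical, and the mask in the usage note is an illustrative value, not output captured from the system described here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAP_CHOWN_BIT 0  /* value of CAP_CHOWN in <linux/capability.h> */

/* Parse a line such as "CapEff:\t000001fffffffffe" from /proc/<pid>/status.
 * Returns true and stores the mask if the line is the requested field. */
bool parse_cap_line(const char *line, const char *field, uint64_t *mask)
{
    size_t n = strlen(field);
    const char *start;
    char *end;

    if (strncmp(line, field, n) != 0 || line[n] != ':')
        return false;
    start = line + n + 1;
    *mask = strtoull(start, &end, 16);  /* skips the leading tab */
    return end != start;                /* false if no hex digits were read */
}

/* Test a single capability bit in a mask. */
bool cap_is_set(uint64_t mask, int cap)
{
    return (mask >> cap) & 1;
}

/* Read the effective set of the calling process; false on failure. */
bool read_own_cap_eff(uint64_t *mask)
{
    char line[256];
    bool found = false;
    FILE *fp = fopen("/proc/self/status", "r");

    if (!fp)
        return false;
    while (!found && fgets(line, sizeof line, fp))
        found = parse_cap_line(line, "CapEff", mask);
    fclose(fp);
    return found;
}
```

For example, a mask of 000001fffffffffe has every bit set except bit 0, so cap_is_set(mask, CAP_CHOWN_BIT) returns false even though the process is otherwise fully privileged.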
Unleash the Power of eBPF to Track Capability Changes
When you want to trace how a process's capabilities change and how they are passed on, traditional tools are of little use. A change in capabilities must be triggered by a system call, whether that is executing another program with execve or switching users with setuid, because almost all interaction between user space and the kernel in Linux goes through system calls.
Building a strace-like tool on top of ptrace to trace the system calls of every process would severely hurt machine performance: ptrace is very slow and can slow programs down by tens to hundreds of times. Such a tool would also have to read /proc frequently to obtain process capabilities. In addition, some process attributes, such as securebits, can be obtained neither from /proc/pid/status nor through ptrace on my 5.10 kernel.
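For context, a process can read its own securebits with prctl(PR_GET_SECUREBITS), but that call only covers the calling thread, so it does not help an external tracer. The sketch below reads and decodes that value; the helper names are hypothetical, and the flag values are copied from <linux/securebits.h> so the example does not depend on kernel headers.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Flag values mirror <linux/securebits.h>. */
struct secbit_name {
    unsigned int bit;
    const char *name;
};

static const struct secbit_name secbit_names[] = {
    { 1u << 0, "NOROOT" },
    { 1u << 1, "NOROOT_LOCKED" },
    { 1u << 2, "NO_SETUID_FIXUP" },
    { 1u << 3, "NO_SETUID_FIXUP_LOCKED" },
    { 1u << 4, "KEEP_CAPS" },
    { 1u << 5, "KEEP_CAPS_LOCKED" },
    { 1u << 6, "NO_CAP_AMBIENT_RAISE" },
    { 1u << 7, "NO_CAP_AMBIENT_RAISE_LOCKED" },
};

/* Write a space-separated list of the set flags into buf; returns buf. */
char *securebits_describe(unsigned int bits, char *buf, size_t len)
{
    size_t i, used;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (i = 0; i < sizeof secbit_names / sizeof secbit_names[0]; i++) {
        if (!(bits & secbit_names[i].bit))
            continue;
        used = strlen(buf);
        snprintf(buf + used, len - used, "%s%s",
                 used ? " " : "", secbit_names[i].name);
    }
    return buf;
}

/* Securebits of the calling thread, or -1 if unavailable. */
int read_own_securebits(void)
{
#ifdef __linux__
    return prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);
#else
    return -1;
#endif
}
```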
eBPF has significant advantages:
- It can trace all system calls by attaching to the tracepoint/raw_syscalls/sys_enter and tracepoint/raw_syscalls/sys_exit tracepoints, without listing each system call by hand, so the tool does not need to change when the set of system calls differs between kernel versions. Its performance overhead is also much lower than that of ptrace.
- An eBPF program can access the current process's task_struct, which contains almost all information about the process. It is almost like holding a master key: real-time capability information and securebits can be read directly.
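The two hooks cooperate in a simple way: take a snapshot at sys_enter, compare at sys_exit. The block below is a plain user-space C model of that pairing logic, not eBPF code and not the article's implementation. In a real eBPF program the table would typically be a BPF hash map keyed by thread id, and the capability value would be read from the task_struct's credentials; all names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_INFLIGHT 64  /* arbitrary table size for this sketch */

/* Snapshot taken at system call entry, one slot per thread id. */
struct inflight {
    bool     used;
    int      tid;
    long     syscall_nr;
    uint64_t cap_effective;
};

/* Reported when a system call changed the effective set. */
struct cap_change {
    int      tid;
    long     syscall_nr;
    uint64_t before;
    uint64_t after;
};

static struct inflight table[MAX_INFLIGHT];

/* sys_enter hook: remember the capability state before the call.
 * Colliding thread ids overwrite each other, like a lossy map. */
void on_sys_enter(int tid, long syscall_nr, uint64_t cap_effective)
{
    struct inflight *s = &table[(unsigned int)tid % MAX_INFLIGHT];

    s->used = true;
    s->tid = tid;
    s->syscall_nr = syscall_nr;
    s->cap_effective = cap_effective;
}

/* sys_exit hook: returns true and fills *out if the set changed. */
bool on_sys_exit(int tid, uint64_t cap_effective, struct cap_change *out)
{
    struct inflight *s = &table[(unsigned int)tid % MAX_INFLIGHT];

    if (!s->used || s->tid != tid)
        return false;            /* no matching entry snapshot */
    s->used = false;             /* the call is no longer in flight */
    if (s->cap_effective == cap_effective)
        return false;            /* nothing changed, nothing to report */
    out->tid = tid;
    out->syscall_nr = s->syscall_nr;
    out->before = s->cap_effective;
    out->after = cap_effective;
    return true;
}
```

Reporting only when the two snapshots differ keeps the event volume small even though every system call passes through both hooks.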
Let's take a look at cap.bpf.h. First comes the s_filter structure, used to filter by pid and uid during tracing, followed by the parameter definitions for several system calls; these parameters are collected on system call entry. Finally, the s_event structure passes the collected information to user space: basic process properties, the user-space stack (which shows how the code reached the call), and cap_before and cap_after, which hold the capabilities before and after the system call.
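To show how user space might interpret cap_before and cap_after, here is a minimal sketch that derives the gained and lost capabilities from the two masks. The struct is a hypothetical, trimmed-down record and not the layout of the article's s_event; only the first few capability names from <linux/capability.h> are included.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical event record; the real s_event carries more fields. */
struct cap_event {
    int          pid;
    unsigned int uid;
    long         syscall_nr;
    uint64_t     cap_before;
    uint64_t     cap_after;
};

/* Bits present after the call but not before. */
uint64_t caps_gained(const struct cap_event *e)
{
    return e->cap_after & ~e->cap_before;
}

/* Bits present before the call but not after. */
uint64_t caps_lost(const struct cap_event *e)
{
    return e->cap_before & ~e->cap_after;
}

/* First few names from <linux/capability.h>, indexed by bit number. */
static const char *const cap_names[] = {
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
};

/* Print one line per changed bit ("+name" or "-name"); returns the count. */
int print_cap_changes(FILE *out, const struct cap_event *e)
{
    uint64_t gained = caps_gained(e), lost = caps_lost(e);
    int bit, n = 0;

    for (bit = 0; bit < 64; bit++) {
        uint64_t m = UINT64_C(1) << bit;
        char sign;

        if (!((gained | lost) & m))
            continue;
        sign = (gained & m) ? '+' : '-';
        if ((size_t)bit < sizeof cap_names / sizeof cap_names[0])
            fprintf(out, "%c%s\n", sign, cap_names[bit]);
        else
            fprintf(out, "%ccap_%d\n", sign, bit);  /* name not in table */
        n++;
    }
    return n;
}
```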
In cap.bpf.c, which is part of the libbpf CO-RE code, the default filter values for pid and uid are set to -1, meaning no filtering. To enable filtering, change these two values directly in the user-space code.
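The "-1 means no filtering" convention can be sketched as a small predicate. The struct below is hypothetical and may not match the article's s_filter; it uses 64-bit signed fields so that -1 cannot collide with a real pid or uid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter layout; the article's s_filter may differ. */
struct trace_filter {
    int64_t pid;  /* -1 means "any pid" */
    int64_t uid;  /* -1 means "any uid" */
};

/* Return true if an event from (pid, uid) should be traced. */
bool filter_matches(const struct trace_filter *f, uint32_t pid, uint32_t uid)
{
    if (f->pid != -1 && (uint32_t)f->pid != pid)
        return false;
    if (f->uid != -1 && (uint32_t)f->uid != uid)
        return false;
    return true;
}
```

With both fields left at -1 every event passes; setting uid to 0, for instance, would restrict tracing to processes running as root.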
- Author: kira skyler
- Link: https://www.ebpf.top/en/post/bpf_capabilities_debug/
- License: This work is licensed under Attribution-NonCommercial-NoDerivs 4.0 International. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.
- Last Modified Time: 2024-09-16 10:43:32.474716358 +0800 CST