Advanced Tips for Writing BPF Applications with libbpf
Link to this article: https://www.ebpf.top/post/top_and_tricks_for_bpf_libbpf
Original article: https://www.pingcap.com/blog/tips-and-tricks-for-writing-linux-bpf-applications-with-libbpf/
In early 2020, while using the BCC tools to analyze performance bottlenecks in our database, I pulled the latest code from GitHub and unexpectedly found a new libbpf-tools directory in the BCC project. I studied the articles BPF Portability and CO-RE and HOWTO: BCC to libbpf Conversion and, based on what I learned, converted the bcc-tools I had previously submitted into libbpf-tools. In the end, I converted nearly 20 tools (see Why We Switched from BCC-Tools to libbpf-Tools for BPF Performance Analysis).
During this process, I was fortunate to receive a lot of help from Andrii Nakryiko (the person in charge of the libbpf + BPF CO-RE project). It was an interesting experience, and I learned a lot. In this article, I will share the experience I gained from using libbpf to write BPF programs. I hope this article will be helpful to those who are interested in libbpf and will help them further develop and improve their BPF applications using libbpf.
Before reading on, I recommend the following articles for important background:
- BPF Portability and CO-RE
- HOWTO: BCC to libbpf Conversion
- Building BPF Applications with libbpf-bootstrap
This article assumes that you have already read the above articles, so there will be no systematic description here. Instead, I will provide corresponding tips for certain details of the program.
Program Framework (Skeleton)
Combine the Open and Load Stages
If your BPF code needs no runtime adjustments between opening and loading, such as resizing maps or setting additional configuration, you can call <name>__open_and_load() to combine the two stages and make the code more concise. You can view the complete code sample in readahead.c; later versions of that file were adjusted by a follow-up pull request.
Selective Attachment (Attach)
By default, <name>__attach() attaches every BPF program that can be attached automatically. Sometimes, however, you want to attach only some of the programs, for example depending on command-line parameters. In that case, call bpf_program__attach() yourself on each program you need. You can see the complete code example in biolatency.c (initial version).
Custom load and attach
The skeleton fits almost every scenario, with one exception: performance events (perf events). Because a perf event must be opened separately on each CPU, you cannot rely on the links member of struct <name>__bpf; instead, define your own array, struct bpf_link *links[], with one entry per CPU. You then open each perf event and attach the BPF program to it manually. Finally, in the cleanup phase, remember to destroy each link in links, and then destroy links itself.
You can see the complete code in runqlen.c.
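As a rough illustration of the per-CPU open step, the sketch below opens one sampling perf event on each CPU with the raw perf_event_open system call. The helper names init_sampling_attr and open_on_each_cpu are made up for this example, and the choice of the CPU-clock software event is an assumption, not something the text above prescribes. The libbpf attach call appears only in a comment so that the sketch compiles without libbpf.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: describe a CPU-clock software event sampled
 * `freq` times per second. */
void init_sampling_attr(struct perf_event_attr *attr, unsigned long long freq)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_CPU_CLOCK;
	attr->freq = 1;            /* treat sample_freq as a frequency in Hz */
	attr->sample_freq = freq;
}

/* Hypothetical helper: open the event once per CPU and store the file
 * descriptors in fds[0..nr_cpus-1]. Offline CPUs get fds[i] = -1.
 * Returns 0 on success, or -1 after closing everything opened so far. */
int open_on_each_cpu(struct perf_event_attr *attr, int *fds, int nr_cpus)
{
	for (int i = 0; i < nr_cpus; i++) {
		/* pid = -1 and cpu = i: observe every task running on CPU i */
		fds[i] = (int)syscall(__NR_perf_event_open, attr, -1, i, -1, 0);
		if (fds[i] >= 0) {
			/* With libbpf, one link per CPU would be created here:
			 * links[i] = bpf_program__attach_perf_event(prog, fds[i]); */
			continue;
		}
		if (errno == ENODEV) {   /* CPU is offline, skip it */
			fds[i] = -1;
			continue;
		}
		while (--i >= 0)         /* real failure: undo earlier opens */
			if (fds[i] >= 0)
				close(fds[i]);
		return -1;
	}
	return 0;
}
```

A caller would typically size fds (and the matching links array) from the number of possible CPUs, and in the cleanup phase call bpf_link__destroy() on every non-NULL entry of links.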
Multiple BPF handlers for the same event
Starting from v0.2, libbpf supports multiple entry-point BPF programs in the same executable and linkable format (ELF) section. You can therefore attach multiple BPF programs to the same event (e.g., tracepoints or kprobes) without worrying about ELF section name conflicts. For more information, see Add libbpf full support for BPF-to-BPF calls. With this support, you can simply define several handlers for the same event. You can see the complete code in hardirqs.bpf.c (the code is built on libbpf-bootstrap). [Note: this file no longer exists]
If you are using a libbpf version earlier than v0.2 and want to define multiple handlers for an event, you must use a different program type for each handler. You can see the complete code in hardirqs.bpf.c.
Map
Reduce pre-allocation overhead
Note: this flag has since been removed from libbpf-tools; see https://github.com/iovisor/bcc/pull/4044. According to that pull request, using hash maps with the BPF_F_NO_PREALLOC flag triggers a warning and, per kernel commit 94dacdbd5d2d, may cause deadlocks.
Starting with Linux 4.6, BPF hash maps preallocate memory for all elements by default, and the BPF_F_NO_PREALLOC flag was introduced to opt out of this. The motivation was to avoid kprobe + bpf deadlocks. The community tried other solutions, but preallocating all map elements turned out to be the simplest one, and it does not affect user-space behavior.
When fully preallocating the map is too expensive, you can define the map with the BPF_F_NO_PREALLOC flag to keep the previous behavior. For more details, please refer to bpf: map pre-alloc. The flag is unnecessary when the map is small (e.g., MAX_ENTRIES = 256), because allocating elements on demand with BPF_F_NO_PREALLOC is slower.
You can see usage examples in libbpf-tools.
Determining Map Size at Runtime
One advantage of libbpf-tools is portability, which means the maximum size a map needs may vary from machine to machine. In this case, you can define the map in <name>.bpf.c without specifying its size and set the size at runtime, before loading. After the open stage, call bpf_map__resize() to set the number of entries.
You can see the complete code in cpudist.c. Note that bpf_map__resize() was later deprecated in favor of bpf_map__set_max_entries() and removed in libbpf 1.0, so current code should call bpf_map__set_max_entries() instead.
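To make the runtime-sizing step more concrete, here is a minimal sketch of how user space might obtain a machine-specific upper bound before sizing a map. It assumes the bound is read from a procfs-style file such as /proc/sys/kernel/pid_max; the helper name read_limit and its fallback parameter are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read a positive integer limit from a procfs-style
 * file. Returns `fallback` if the file is missing or malformed. */
long read_limit(const char *path, long fallback)
{
	FILE *f = fopen(path, "r");
	long value;

	if (!f)
		return fallback;
	if (fscanf(f, "%ld", &value) != 1 || value <= 0)
		value = fallback;
	fclose(f);
	return value;
}
```

Between the open and load stages, the result would then be passed to the map, for example bpf_map__set_max_entries(map, read_limit("/proc/sys/kernel/pid_max", 32768)), where the fallback value 32768 is only an illustrative choice.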
Per-CPU
When choosing a map type, if the paired events you are measuring always occur on the same CPU, you can track timestamps in a per-CPU array, which is simpler and more efficient than a hash map. However, you must also be sure that the kernel does not migrate the task to another CPU between the two BPF program invocations, so this trick is not always applicable. Soft interrupt analysis meets both conditions.
You can see the complete code in softirqs.bpf.c.
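The two conditions can be illustrated with a small user-space model. This is plain C, not BPF code: the array start_ts stands in for a per-CPU array map with a single slot, and handle_entry, handle_exit, and MAX_CPUS are names invented for this sketch.

```c
#include <stdint.h>

#define MAX_CPUS 8  /* hypothetical bound for this model */

/* One timestamp slot per CPU, mirroring a per-CPU array with one entry. */
static uint64_t start_ts[MAX_CPUS];

/* Called when the event begins on `cpu`: remember the start time. */
void handle_entry(int cpu, uint64_t now_ns)
{
	start_ts[cpu] = now_ns;
}

/* Called when the event ends on `cpu`. Returns the elapsed time in ns,
 * or 0 if no matching start was recorded on this CPU. */
uint64_t handle_exit(int cpu, uint64_t now_ns)
{
	uint64_t begin = start_ts[cpu];

	if (begin == 0 || now_ns < begin)
		return 0;
	start_ts[cpu] = 0;  /* consume the slot */
	return now_ns - begin;
}
```

No key lookup is needed because the CPU number selects the slot. If the exit fired on a different CPU than the entry, the slot there would be empty and the measurement would be lost, which is why the no-migration condition matters.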
Global Variables
You can use global variables not only to customize the logic of a BPF program but also in place of maps, which makes the program simpler and more efficient. Global variables can be of any size, as long as the size is fixed at compile time.
For example, because the number of SOFTIRQ types is fixed, you can define global arrays in softirqs.bpf.c to store counts and histograms, and then iterate over them directly in user space.
You can find the complete code in softirqs.c.
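The user-space side of this pattern might look roughly like the sketch below. It assumes the BPF program keeps one counter per softirq vector in a global array; struct softirq_globals is a hypothetical stand-in for the memory that a generated skeleton exposes for global variables, and print_counts is an invented helper name.

```c
#include <stdio.h>

#define NR_SOFTIRQS 10  /* number of softirq vectors in the Linux kernel */

/* Hypothetical stand-in for the skeleton's view of the BPF globals. */
struct softirq_globals {
	unsigned long long counts[NR_SOFTIRQS];
};

static const char *const vec_names[NR_SOFTIRQS] = {
	"hi", "timer", "net_tx", "net_rx", "block",
	"irq_poll", "tasklet", "sched", "hrtimer", "rcu",
};

/* Print every non-zero counter and return the sum of all counters.
 * The globals are read like ordinary memory, with no map lookups. */
unsigned long long print_counts(const struct softirq_globals *g, FILE *out)
{
	unsigned long long total = 0;

	for (int i = 0; i < NR_SOFTIRQS; i++) {
		if (g->counts[i] == 0)
			continue;
		fprintf(out, "%-8s %llu\n", vec_names[i], g->counts[i]);
		total += g->counts[i];
	}
	return total;
}
```

In a real tool, the pointer passed in would come from the skeleton object rather than from a locally defined struct.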
Note on accessing fields directly through pointers
As you may have learned from the article BPF Portability and CO-RE, with libbpf and BPF_PROG_TYPE_TRACING programs the BPF verifier understands and tracks BTF natively, which lets you follow pointers and read kernel memory directly and safely.
This is really convenient. However, when such expressions are used in conditional statements, an incorrect branch optimization in some kernel versions can introduce bugs. Until the fix bpf: fix an incorrect branch elimination by verifier is widely deployed, use BPF_CORE_READ in these places to stay compatible across kernels. You can find an example in biolatency.bpf.c: even though it is a tp_btf program and q->elevator would be faster, I still used BPF_CORE_READ(q, elevator).
Conclusion
This article introduced some tricks for writing BPF programs using libbpf. You can find many practical examples in libbpf-tools and bpf. If you have any questions, feel free to join the TiDB community on Slack and send us your feedback.
- Author: DavidDi
- Link: https://www.ebpf.top/en/post/top_and_tricks_for_bpf_libbpf/
- License: This work is under an Attribution-NonCommercial-NoDerivs 4.0 International license. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.
- Last Modified Time: 2024-02-04 13:17:14.581567217 +0800 CST