This article is available at: https://www.ebpf.top/post/bpf_capabilities_debug
Author: kira skyler
Unleash the Power of eBPF to Track Capability Changes

Introduction

In the Linux operating system, “capabilities” are a permission mechanism that divides the system's privileges into multiple independent permission bits. This way, a user or process can be granted only the specific capabilities needed to perform a given task, without requiring full privileges.
In the Linux capabilities system, permission assignments are divided into different sets: the Inheritable set, Permitted set, Effective set, Bounding set, and Ambient set. Each set controls the permissions of processes or threads in a different scenario. These capabilities can change under various circumstances: when switching users, the new user will likely have a different set of capabilities, and the sets change according to different rules when creating child processes or executing new programs.……
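To make the five sets concrete, here is a minimal sketch in plain C (no external libraries) that prints a process's capability sets by reading the CapInh/CapPrm/CapEff/CapBnd/CapAmb fields from /proc/self/status; those field names come from the procfs format, and everything else here is purely illustrative:

```c
#include <stdio.h>
#include <string.h>

/* Print the five capability sets of the current process. The kernel
 * exposes them as hex bitmasks in /proc/self/status under the fields
 * CapInh (Inheritable), CapPrm (Permitted), CapEff (Effective),
 * CapBnd (Bounding), and CapAmb (Ambient). */
int main(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];

    if (!fp) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), fp)) {
        if (strncmp(line, "Cap", 3) == 0)
            fputs(line, stdout);
    }
    fclose(fp);
    return 0;
}
```

Running this once as an unprivileged user and once via sudo shows exactly the kind of capability-set change across a user switch that the article sets out to track with eBPF.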
This article can be found at: https://www.ebpf.top/post/bpf_sched_ext_dive_into
Contents:
- Linux Process Scheduler
- CFS Scheduler
- EEVDF Scheduler
- Implementation Mechanism of the BPF Scheduler Extension sched_ext
- Addition 1: The SCHED_EXT Scheduling Class
- Addition 2: eBPF Custom Scheduler Functions
- Workflow of the SCHED_EXT Scheduling Class
- Scheduling Cycle Workflow
- Switching to sched_ext
- Summary

In the article Linus Strongly Pushes for Inclusion: BPF Empowers Scheduler Success, we reviewed the journey of BPF's integration into the scheduler within the community. Patch V7 is staged for merging into 6.11, and the code repository has since moved to a kernel git address; the merge is now only a matter of time. This blog post focuses on the implementation principles of sched_ext. sched_ext is an extensible scheduling class introduced jointly by Meta and Google, referred to as ext_sched_class or sched_ext. The mechanism lets users implement a scheduling class with BPF programs, so that scheduling policy can be tailored to specific workloads or scenarios.……
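As a taste of what "implementing a scheduling class with BPF programs" looks like, below is a heavily simplified sketch modeled on the scx_simple example that ships with the sched_ext tree. It implements only the enqueue callback, pushing every runnable task onto the shared global dispatch queue; the scx_bpf_dispatch helper, the SCX_DSQ_GLOBAL queue, and the BPF_STRUCT_OPS convenience macro follow the API of the patch series discussed here and may differ in later kernels:

```c
/* Minimal sched_ext scheduler sketch (modeled on scx_simple).
 * Requires the scx headers shipped with the sched_ext tree. */
#include <scx/common.bpf.h>

char _license[] SEC("license") = "GPL";

/* Called when a task becomes runnable: hand every task to the
 * global FIFO dispatch queue with the default time slice. */
void BPF_STRUCT_OPS(minimal_enqueue, struct task_struct *p, u64 enq_flags)
{
	scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
}

/* Registering this struct_ops map activates the BPF scheduler. */
SEC(".struct_ops.link")
struct sched_ext_ops minimal_ops = {
	.enqueue = (void *)minimal_enqueue,
	.name    = "minimal",
};
```

Any callback the BPF program does not supply falls back to a sane default in the kernel, which is why a usable scheduler can be this short.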
Read more at: https://www.ebpf.top/post/bpf_sched_ext
1. The Emergence of the Pluggable Scheduler [2004]

In 2004, Con Kolivas of the Linux community proposed the idea of a pluggable scheduler, envisioning multiple schedulers in the kernel among which users could choose at boot. The patch worked by splitting a significant amount of code into a common part, kept in kernel/sched.c, and a scheduler-private part, with pointers in scheduler.c to the functions handling scheduling work; these functions were invoked on the various process events (fork(), exit(), etc.) that carry scheduling-related information. Implementing a new scheduler simply required writing replacement functions and plugging them in. However, the submission met strong opposition from community developer Ingo Molnar, who believed that pluggable schedulers would discourage patches to the scheduling-domains code and instead lead to separate schedulers for specific scenarios such as NUMA scheduling and SMP scheduling.
Ingo Molnar's standpoint was clear: if everyone tends only to their own small family, the scheduler as a big family will lack organization and code contributions, leaving behind a collection of schedulers each tailored to a specific scenario.……
Read the article at: https://www.ebpf.top/post/en/bpf_rawtracepoint
Contents:
1. Common Hook Types in eBPF Tracing
2. BPF Rawtracepoint
2.1 Tracing Performance Improved by 20%
2.2 Inspecting and Counting Rawtracepoint Events
2.3 Changes in Parameter Passing
3. Examples of Using rawtracepoint in BPF Programs
3.1 libbpf Library (Based on CO-RE)
3.2 bpftrace Sample Code
See Also

1. Common Hook Types in eBPF Tracing

eBPF can trace events from several categories in the tracing domain, listed below:
- Kernel static trace points: tracepoint / rawtracepoint / btf-tracepoint; see /sys/kernel/tracing/available_events
- Kernel dynamic trace points: k[ret]probe, fentry/fexit (based on BTF); kprobe targets listed in /sys/kernel/tracing/available_filter_functions
- User-space static trace points: USDT; view with readelf -n or with the bpftrace tool, e.g. bpftrace -l 'usdt:/home/dave/ebpf/linux-tracing/usdt/main:*'
- User-space dynamic tracing: u[ret]probe; target symbols obtainable via nm hello | grep main
- Performance monitoring counters (PMC): perf_event

This article focuses on rawtracepoint within kernel static tracing, concluding with practical code examples using the libbpf development library and bpftrace; a minimal libbpf-style sketch follows below.……
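Here is a minimal sketch of a raw tracepoint handler on sched_switch, in the libbpf style the article works toward. With a raw tracepoint, the arguments arrive untyped in ctx->args[] in the order of the tracepoint's TP_PROTO (for sched_switch on recent kernels: preempt flag, previous task, next task, ...), so the program casts them and uses CO-RE reads; the program and symbol names are illustrative:

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

SEC("raw_tracepoint/sched_switch")
int handle_sched_switch(struct bpf_raw_tracepoint_args *ctx)
{
	/* args[] follows TP_PROTO(bool preempt, struct task_struct *prev,
	 * struct task_struct *next, ...): no typed context is prepared. */
	struct task_struct *next = (struct task_struct *)ctx->args[2];
	char comm[16];

	/* CO-RE read of the incoming task's name. */
	BPF_CORE_READ_STR_INTO(&comm, next, comm);
	bpf_printk("switching in: %s", comm);
	return 0;
}
```

The absence of a prepared, typed context structure is precisely where the roughly 20% performance gain cited in the contents above comes from.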
Original Article: https://www.ebpf.top/post/bpfman_fedora_40
Contents:
1. Background
2. Introducing bpfman
3. Standalone Deployment Process
4. Kubernetes Cluster Deployment Process
5. Summary
Appendix: Single-Machine Verification of bpfman
- Development Environment Setup
- Downloading and Compiling the bpfman Code
- Testing the Program Management Functionality

1. Background

Fedora 40 proposes making bpfman the default eBPF program manager. The open-source bpfman project gives deeper insight into the eBPF runtime state and makes eBPF programs easier to manage (loading, unloading, and viewing runtime status). The proposal requires approval from the Fedora Engineering Steering Committee (FESCo); if it succeeds, bpfman is likely to appear in Fedora 40 in April, improving eBPF management.
So, what exactly is bpfman? This article will give you a brief introduction to bpfman and its working principles.
2. Introducing bpfman

bpfman, originally named bpfd, is written in Rust and built on the Aya eBPF library.……
Article address: https://www.ebpf.top/post/network_and_bpf_2024
Contents:
1. eBPF
1.1 Exponential Growth of eBPF
1.2 The eBPF Application Market
1.3 Wider Application of eBPF on Mobile Devices
1.4 Risks of eBPF Abuse
2. Observability
2.1 The Most Popular Topic: Observability
2.2 Reducing Observability Overhead
2.3 Context-Aware Kubernetes Workloads
2.4 AI-Assisted Network Troubleshooting
3. Networking
3.1 Container Networking Performance Matching Host Networking Performance
3.2 Transformation in the Networking Industry
3.3 Cilium in Home Environments
3.4 Network Operators Seeking LLM Help - Not All Roses
4. Cloud Native
4.1 Kubernetes Users Pushing Back on Complexity
4.2 IPv6-Only Kubernetes Clusters Becoming More Common
4.3 Rapid Growth of Wasm
4.4 Heterogeneous Networks, Not to Be Forgotten
4.5 The Challenges of Platform Engineering and Network Growth

In early 2024, Nico Vibert, Senior Sales Engineer at Isovalent, made a series of predictions about networking and eBPF. Here we briefly outline the most important conclusions, mainly covering eBPF, Cilium, cloud native, networking, observability, and security.……
Contents:
- Program Framework (Skeleton)
- Combining the Open and Load Stages
- Selective Attachment (Attach)
- Custom Load and Attach
- Multiple BPF Handlers for the Same Event
- Maps: Reducing Pre-Allocation Overhead
- Determining Map Size at Runtime
- Per-CPU Global Variables
- A Note on Accessing Fields Directly Through Pointers
- Conclusion

Link to this article: https://www.ebpf.top/post/top_and_tricks_for_bpf_libbpf
Original article: https://www.pingcap.com/blog/tips-and-tricks-for-writing-linux-bpf-applications-with-libbpf/
In early 2020, while using BCC tools to analyze our database's performance bottlenecks, I pulled the code from GitHub and unexpectedly found a new libbpf-tools directory in the BCC project. I studied the articles on BPF portability and BCC-to-libbpf conversion and, with the knowledge gained, converted the bcc-tools I had previously submitted into libbpf-tools. In the end, I completed the conversion of nearly 20 tools (see Why We Switched from BCC-Tools to libbpf-Tools for BPF Performance Analysis).
During this process, I was fortunate to receive a great deal of help from Andrii Nakryiko (the lead of the libbpf + BPF CO-RE project).……
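To illustrate two of the tips listed above, the skeleton workflow and determining map size at runtime, here is a minimal libbpf sketch; the skeleton name mytool and its events map are hypothetical, while bpf_map__set_max_entries() and libbpf_num_possible_cpus() are real libbpf APIs:

```c
#include <bpf/libbpf.h>
#include "mytool.skel.h"  /* hypothetical skeleton from `bpftool gen skeleton` */

int main(void)
{
	struct mytool_bpf *skel;
	int err;

	/* Open phase: the object is parsed but not yet loaded,
	 * so maps and global variables can still be tweaked. */
	skel = mytool_bpf__open();
	if (!skel)
		return 1;

	/* Size the (hypothetical) events map at runtime instead of
	 * hard-coding max_entries in the BPF source. */
	bpf_map__set_max_entries(skel->maps.events,
				 libbpf_num_possible_cpus());

	/* Load phase: verification and loading into the kernel,
	 * then attach all auto-attachable programs. */
	err = mytool_bpf__load(skel);
	if (!err)
		err = mytool_bpf__attach(skel);

	/* ... poll maps / ring buffers here ... */

	mytool_bpf__destroy(skel);
	return err != 0;
}
```

Splitting open from load is what makes runtime map sizing possible at all: once the object is loaded, max_entries is fixed.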
This article is available at: https://www.ebpf.top/post/lsm_bpf_intro
Contents:
1. Background on Security
2. LSM, the Kernel's General Security Policy Framework
2.1 Introduction to the LSM Framework
2.2 Architecture of LSM
2.3 Hook Functions in LSM
3. LSM BPF
3.1 BCC Practice
3.2 libbpf-bootstrap Framework Practice
4. Summary
5. Appendix: Finding the Hook Points Used by an LSM Hot Patch to Monitor Kernel Vulnerabilities

1. Background on Security

Internationally, computer security is summarized by three main characteristics: Confidentiality, Integrity, and Availability (CIA).
Confidentiality means that data is not visible to unauthorized parties; integrity means that information is not altered during storage or transmission; availability means that a system can be used when needed. Computer systems employ roughly four methods to address security challenges: isolation, control, auditing, and obfuscation.
Access control governs the operations a subject performs on an object. It primarily involves defining subjects, objects, and operations, and then setting access policies.……
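As a preview of the libbpf-bootstrap practice in section 3.2, here is a minimal LSM BPF sketch. It attaches to the file_open LSM hook, where returning 0 allows the operation and a negative errno denies it; it requires a kernel built with CONFIG_BPF_LSM and "bpf" in the lsm= boot parameter, and the program name is illustrative:

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* Runs on every file open via the file_open LSM hook.
 * Return 0 to allow; return a negative errno (e.g. -EPERM) to deny. */
SEC("lsm/file_open")
int BPF_PROG(check_file_open, struct file *file)
{
	/* A real policy would inspect `file` (path, inode, credentials)
	 * before deciding; this sketch allows everything. */
	return 0;
}
```

Unlike tracing programs, the return value here directly enforces the access-control policy described above, which is what makes LSM BPF a control mechanism rather than just an audit mechanism.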
Article link: https://www.ebpf.top/en/post/cisco_and_isovalent
On December 21, 2023, Thomas Graf, CTO & Co-founder of Isovalent, and Tom Gillis, Senior Vice President and General Manager of Cisco's Security Business Group, announced on their respective company websites Cisco's plan to acquire Isovalent. Neither party disclosed the acquisition price. Once the acquisition closes, the Isovalent team will join Cisco's Security Business Group, with the deal expected to be finalized in the third quarter of Cisco's 2024 fiscal year. Cisco has a history with Isovalent, having participated in Isovalent's $29 million Series A round at the end of 2020. Subsequently, in 2022, Cisco, along with Microsoft, Google, and other companies, participated in Isovalent's $40 million Series B round.
Cisco aims to enhance its capabilities in multi-cloud networking and security through this acquisition. The collaboration between Cisco and Isovalent will leverage the power of Cilium’s open-source technology to create uniquely advanced multi-cloud security and networking functionalities, aiding customers in simplifying and accelerating their digital transformation journey.……
Read more at: https://www.ebpf.top/post/cpu_io_wait
Contents:
1. Definition of I/O Wait
2. Test and Verification
3. Further Pinpointing Disk Throughput and Processes with High I/O Frequency (after spotting process I/O wait via the vmstat b column, we can drill down further with iostat and iotop)
4. Analysis of the Kernel's CPU Statistics Implementation
5. Conclusion
References

1. Definition of I/O Wait

I/O Wait is a per-CPU performance metric: it measures the idle time a CPU accumulates while threads on that CPU's dispatch queue are blocked (sleeping) on disk I/O. A CPU's idle time thus splits into truly idle time and time spent idle waiting on disk I/O. A high I/O Wait value points to a possible disk bottleneck that leaves the CPU waiting idle. If this definition seems a bit confusing, please keep reading; I believe that after you work through the test and verification process in this article, your understanding of it will change.……
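For the kernel CPU statistics analysis in section 4, it helps to know where these numbers live: the per-CPU iowait time is the fifth field on the cpu lines of /proc/stat, counted in USER_HZ ticks (tools like vmstat and iostat derive their figures from it). A minimal sketch that reads the system-wide value:

```c
#include <stdio.h>

/* Read the aggregate CPU counters from the first line of /proc/stat.
 * Field order: user nice system idle iowait irq softirq ...,
 * all counted in USER_HZ ticks since boot. */
int main(void)
{
	unsigned long long user, nice, system, idle, iowait;
	FILE *fp = fopen("/proc/stat", "r");

	if (!fp) {
		perror("fopen");
		return 1;
	}
	if (fscanf(fp, "cpu %llu %llu %llu %llu %llu",
		   &user, &nice, &system, &idle, &iowait) == 5)
		printf("idle=%llu iowait=%llu (USER_HZ ticks)\n",
		       idle, iowait);
	fclose(fp);
	return 0;
}
```

Sampling this value twice and dividing the delta by the total elapsed ticks yields the I/O Wait percentage that the definition above describes.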