Practical Implementation of Load Balancer with XDP
This article is available at: https://www.ebpf.top/post/xdp_lb_demo
Author: Qiu Kang
With the progress of eBPF, we can now deploy eBPF/XDP programs directly on regular servers to achieve load balancing, eliminating the need for dedicated LVS machines.
The previous article showed how to use XDP/eBPF to replace LVS for SLB. Version 0.1 deployed the SLB on a dedicated machine, loaded the XDP program with bpftool, and used a hardcoded configuration.
Version 0.2 switched to programmatic loading based on the BPF skeleton, without changing the overall deployment mode of version 0.1. To try that workflow, check out https://github.com/MageekChiu/xdp4slb/tree/dev-0.2
Version 0.3 added support for loading the SLB configuration dynamically, via a configuration file and command-line parameters.
This article covers version 0.4, which supports mixed deployment of the SLB and the application, eliminating the need for dedicated SLB machines. In this mode, ordinary machines perform load balancing directly without affecting the applications (this can be demonstrated in offload mode), which is more cost-effective. In addition, when a request is routed to the local application, one routing hop is saved, giving better overall performance.
Creating network environment
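The original commands for creating the network are omitted here; the sketch below is a minimal, assumed equivalent based on the container settings used later in this article (a Docker bridge network named south containing 172.19.0.2, 172.19.0.3, 172.19.0.9 and the VIP 172.19.0.10; adjust the subnet if your environment differs):

```bash
# Minimal sketch (assumption): a user-defined bridge network named "south"
# whose subnet covers the addresses used throughout this article.
docker network create --subnet 172.19.0.0/24 --gateway 172.19.0.1 south

# Verify
docker network inspect south
```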
Analysis of Principles
SLB Cluster Routing
For high availability, the SLB is usually deployed as a cluster. How are requests routed to the individual SLB instances? Generally, (dynamic) routing protocols such as OSPF or BGP are used to achieve ECMP, so that each SLB instance receives an even share of the traffic from the routers/switches. Since configuring dynamic routing protocols is complex and beyond the scope of this article, a simple script is used here to simulate ECMP.
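The original script is omitted here; the following is a minimal sketch of what such an ECMP simulation might look like, based on the addresses used in this article (VIP 172.19.0.10, mix hosts 172.19.0.2 and 172.19.0.3). It is an assumption for illustration, not the project's actual routing.sh:

```bash
#!/bin/bash
# Sketch (assumed): simulate ECMP by alternating the next hop for the VIP
# between the two mix hosts.
VIP=172.19.0.10
MIX1=172.19.0.2
MIX2=172.19.0.3

while true; do
    ip route replace ${VIP}/32 via ${MIX1}
    sleep 5
    ip route replace ${VIP}/32 via ${MIX2}
    sleep 5
done
```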
The script repeatedly switches the next hop used to reach the VIP back and forth between the mix hosts (each of which runs both the SLB and the app).
NAT Mode
Versions 0.1–0.3 all used full NAT mode, which is no longer suitable in the mixed mode because it can cause packets to loop: without marking the packets, the XDP program cannot tell whether a packet came from a client or from another SLB. We adopt DR mode instead, which not only avoids the looping problem but also performs better, because:
- The return packet takes one hop less
- Fewer packet modifications are needed, and there is no need to recalculate the IP and TCP checksums, etc.
The architecture diagram is as follows, simplified for illustrative purposes. There is actually a router/switch between the client and mix, but we used the above simulation script to directly incorporate routing functionality into the client.
Dark blue represents requests, light blue represents responses. Thanks to ECMP, a given request for the VIP reaches only one mix; the SLB on that mix may forward it to the local app (Nginx in this article) or to another mix, but the response always goes back directly from the mix that served it and never passes through another mix.
Load Balancing Algorithms
Currently, the following algorithms are supported:
- random
- round_robin
- hash
This article does not synchronize session state across the SLB cluster, so only the hash algorithm can be selected. This means that regardless of which SLB the request is routed to, it will be forwarded to the same backend app.
SLB Routing Pseudocode
if (dest_ip == local_ip) {
    // Pass the packet directly to the local protocol stack
    return
}
if (dest_ip == vip && dest_port == vport) {
    // Select an RS using the load balancing algorithm
    // If the RS is the local machine, pass the packet to the local protocol stack and return
    // Otherwise:
    //   set the MAC of the RS as the destination MAC of the new packet
    //   save the bidirectional mapping between the client and the RS,
    //     so that subsequent packets can be routed directly
    //   set the MAC of the local machine as the source MAC of the new packet
    //   send the new packet out
} else {
    // Unexpected packet: drop it
}
Configuring SLB and Applications
The Dockerfile for Mix is as follows
FROM debian:bookworm
RUN apt-get update -y && apt-get upgrade -y \
&& apt install -y nginx procps bpftool iproute2 net-tools telnet kmod curl tcpdump
WORKDIR /tmp/
COPY src/slb /tmp/
COPY slb.conf /tmp/
The base image was changed because the glibc version on my Fedora 37 host is 2.36, while Debian Bullseye ships glibc 2.31, so a Bullseye container cannot directly run the executable compiled on the host (Debian Bookworm's glibc is recent enough).
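If you hit a similar mismatch, the sketch below is one way to compare the binary's glibc requirement with the glibc available in the container (assuming src/slb is the binary built on the host, as in this article):

```bash
# Highest GLIBC symbol version required by the host-built binary
objdump -T src/slb | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -1

# glibc version available inside the base image
docker run --rm debian:bookworm /bin/sh -c 'ldd --version | head -1'
```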
Build the image and run the app (Nginx here):
docker build -t mageek/mix:0.1 .
# In case you want to run a brand new container
docker rm mix1 mix2 -f
docker run -itd --name mix1 --hostname mix1 --privileged=true \
--net south -p 8888:80 --ip 172.19.0.2 --mac-address="02:42:ac:13:00:02" \
-v "$(pwd)"/rs1.html:/var/www/html/index.html:ro mageek/mix:0.1 nginx -g "daemon off;"
docker run -itd --name mix2 --hostname mix2 --privileged=true \
--net south -p 9999:80 --ip 172.19.0.3 --mac-address="02:42:ac:13:00:03" \
-v "$(pwd)"/rs2.html:/var/www/html/index.html:ro mageek/mix:0.1 nginx -g "daemon off;"
# Check on the host
docker ps
curl 127.0.0.1:8888
curl 127.0.0.1:9999
Enter each container and configure the VIP. After configuring the VIP on the mix, suppress ARP for it so that the client's packet routing is not affected:
docker exec -it mix1 bash
docker exec -it mix2 bash
ifconfig lo:0 172.19.0.10/32 up
echo "1">/proc/sys/net/ipv4/conf/all/arp_ignore
echo "1">/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2">/proc/sys/net/ipv4/conf/all/arp_announce
echo "2">/proc/sys/net/ipv4/conf/lo/arp_announce
Then run SLB
# Start SLB and specify the network card and configuration file
./slb -i eth0 -c ./slb.conf
# In another terminal
bpftool prog list
# bpftool prog show name xdp_lb --pretty
# Check global variables
# bpftool map list
# bpftool map dump name slb_bpf.rodata
# Check attaching with
ip link
View the log directly on the host machine (there is a single trace pipe for the whole machine); do not open multiple terminals reading it at the same time, or the log output may be incomplete:
bpftool prog tracelog
During testing, after compiling the executable on the host, copy it into the containers (assuming you have already created these containers and the related network):
docker start mix1 mix2 client
docker cp src/slb mix1:/tmp/ && \
docker cp slb.conf mix1:/tmp/ && \
docker cp src/slb mix2:/tmp/ && \
docker cp slb.conf mix2:/tmp/ && \
docker cp routing.sh client:/tmp/
Testing
Start a new client container
docker run -itd --name client --hostname client --privileged=true \
--net south -p 10000:80 --ip 172.19.0.9 --mac-address="02:42:ac:13:00:09" \
-v "$(pwd)"/routing.sh:/tmp/routing.sh mageek/mix:0.1 nginx -g "daemon off;"
Enter the client, then configure and run the routing script (routing.sh, i.e. the ECMP simulation script described earlier).
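A minimal way to do this, assuming routing.sh was copied to /tmp/ in the client as shown above:

```bash
# On the host: enter the client container
docker exec -it client bash

# Inside the client: run the routing script (it loops, so keep it in the
# background or in its own terminal)
chmod +x /tmp/routing.sh
/tmp/routing.sh &
```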
Open another client terminal for request testing
docker exec -it client bash
# Visit rs first
curl 172.19.0.2:80
curl 172.19.0.3:80
# Visit slb
curl 172.19.0.10:80
rs-1
curl 172.19.0.10:80
rs-2
We can run some load tests from the client, but remember not to run routing.sh during the load test: in the brief window where the old route has just been deleted and the new one is not yet installed, concurrent requests will fail.
apt-get install apache2-utils
# Concurrent 50, total requests 5000
ab -c 50 -n 5000 http://172.19.0.10:80/
The load testing results are as follows, showing that all requests were successful.
Server Software: nginx/1.22.1
Server Hostname: 172.19.0.10
Server Port: 80
Document Path: /
Document Length: 5 bytes
Concurrency Level: 50
Time taken for tests: 3.141 seconds
Complete requests: 5000
Failed requests: 0
Total transferred: 1170000 bytes
HTML transferred: 25000 bytes
Requests per second: 1591.81 [#/sec] (mean)
Time per request: 31.411 [ms] (mean)
Time per request: 0.628 [ms] (mean, across all concurrent requests)
Transfer rate: 363.75 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 15 3.9 15 31
Processing: 5 16 4.7 16 48
Waiting: 0 11 4.4 10 34
Total: 17 31 3.6 30 60
Percentage of the requests served within a certain time (ms)
50% 30
66% 32
75% 32
80% 33
90% 35
95% 37
98% 40
99% 47
100% 60 (longest request)
You can increase the concurrency level. The theoretical maximum concurrency is the maximum number of entries in back_map, where the conntrack entries are stored. Exceeding it may cause remapping (unless the hash algorithm is used), possibly leading to TCP resets.
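To see how much headroom you have, the map can be inspected with bpftool (a sketch; the map name back_map is taken from the description above and may differ in the actual build):

```bash
# Show max_entries for the conntrack map (name assumed from the text above)
bpftool map show name back_map

# Roughly count the entries currently in use
bpftool map dump name back_map | grep -c "key"
```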
Preview
To build a complete SLB, there is still a lot of work to be done, such as using kernel capabilities for automatic MAC address resolution, adding the many necessary boundary checks, and so on. These are tasks for later versions, and everyone is welcome to participate at https://github.com/MageekChiu/xdp4slb/.
- Author: Qiu Kang
- Link: https://www.ebpf.top/en/post/xdp_lb_demo/
- License: This work is licensed under Attribution-NonCommercial-NoDerivs 4.0 International. Please fulfill the requirements of this license when adapting or creating a derivative of this work.
- Last Modified Time: 2024-02-07 00:16:45.291510862 +0800 CST