Linux Networking Explained
LinuxCon 2016, Toronto
Thomas Graf (@tgraf__)
Kernel, Cilium & Open vSwitch Team
Noiro Networks (Cisco)
Did you catch part I?
● Part II: LinuxCon, Toronto, 2016
Linux Networking Explained
Network devices, Namespaces, Routing, Veth, VLAN, IPVLAN, MACVLAN,
MACVTAP, Bonding, Team, OVS, Bridge, BPF, IPSec
● Part I: LinuxCon, Seattle, 2015
Kernel Networking Walkthrough
The protocol stack, sockets, offloads, TCP fast open, TCP small queues,
NAPI, busy polling, RSS, RPS, memory accounting
http://goo.gl/ZKJpor
Linux Networking Explained
Network Devices
● Real / Physical
Backed by hardware
Example: Ethernet card,
WIFI, USB, ...
● Software / Virtual
Simulation or virtual
representation
Example: Loopback (lo),
Bridge (br), Virtual Ethernet
(veth), ...
$ ip link
[...]
$ ip link show enp1s0f1
4: enp1s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state [...]
link/ether 90:e2:ba:61:e7:45 brd ff:ff:ff:ff:ff:ff
Addresses
$ ip addr add 192.168.23.5/24 dev em1
$ ip address show dev em1
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP [...]
link/ether 10:c3:7b:95:21:da brd ff:ff:ff:ff:ff:ff
inet 192.168.23.5/24 brd 192.168.23.255 scope global em1
valid_lft forever preferred_lft forever
inet6 fe80::12c3:7bff:fe95:21da/64 scope link
valid_lft forever preferred_lft forever
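The fields in the output above can be reproduced by hand. The following is a minimal sketch, not what iproute2 or the kernel actually runs: it derives the subnet and the "brd" address from 192.168.23.5/24, and rebuilds the inet6 link-local address from the MAC using modified EUI-64. The helper name eui64_link_local is an assumption made for this illustration.

```python
import ipaddress

def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive the fe80::/64 link-local address from a MAC (modified EUI-64)."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02                                           # flip the universal/local bit
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])   # insert ff:fe in the middle
    return ipaddress.IPv6Address(b"\xfe\x80" + b"\x00" * 6 + iid)

iface = ipaddress.ip_interface("192.168.23.5/24")
print(iface.network)                     # the subnet the address belongs to
print(iface.network.broadcast_address)   # the "brd" value shown by ip address
print(eui64_link_local("10:c3:7b:95:21:da"))
```

With the em1 values from the slide this yields 192.168.23.0/24, 192.168.23.255 and fe80::12c3:7bff:fe95:21da, matching the ip address output.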
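Do we need to consider a packet for local sockets?
[Diagram: Routing decides "Local?": local packets go to ip_local_deliver() and on to the sockets; all others go to ip_forward() and ip_output()]
Forwarding is enabled with: net.ipv4.conf.all.forwarding = 1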
Pro Tip: The Local Table
List all accepted local addresses:
$ ip route list table local type local
127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
192.168.23.5 dev em1 proto kernel scope host src 192.168.23.5
192.168.122.1 dev virbr0 proto kernel scope host src 192.168.122.1
H4x0r Tip: You can also modify this table after the generated
local routes have been inserted.
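How the local table feeds the "Local?" decision can be sketched roughly as below. This is a simplified model, not the kernel's code: it ignores broadcast and multicast, and the function name input_path and its return strings are assumptions chosen to mirror the kernel function names on the slides.

```python
import ipaddress

# Entries copied from the "table local type local" output above.
LOCAL_TABLE = [ipaddress.ip_network(p) for p in
               ("127.0.0.0/8", "127.0.0.1/32", "192.168.23.5/32", "192.168.122.1/32")]

def input_path(dst: str, forwarding: bool = True) -> str:
    """Name the path a received packet would take (simplified)."""
    addr = ipaddress.ip_address(dst)
    if any(addr in net for net in LOCAL_TABLE):
        return "ip_local_deliver"   # destined for a local socket
    if forwarding:
        return "ip_forward"         # routed on towards ip_output()
    return "drop"                   # not local and forwarding disabled

print(input_path("192.168.23.5"))
print(input_path("20.10.3.3"))
print(input_path("20.10.3.3", forwarding=False))
```

Adding or removing an entry in LOCAL_TABLE corresponds to the H4x0r tip: changing the local table changes which destinations the host accepts as its own.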
Routing
Direct Route - endpoints are direct neighbours (L2)
$ ip route add 10.0.0.0/8 dev em1
$ ip route show
10.0.0.0/8 dev em1 scope link
Nexthop Route - endpoints are behind another router (L3)
$ ip route add 20.10.0.0/16 via 10.0.0.1
$ ip route show
20.10.0.0/16 via 10.0.0.1 dev em1
Pro Trick: Simulating a Route Lookup
How will a packet to 20.10.3.3 get routed?
$ ip route get 20.10.3.3
20.10.3.3 via 10.0.0.1 dev em1 src 192.168.23.5
cache
NOTE: This is not just $(ip route show | grep). It performs an
actual route lookup on the specified destination address in the
kernel.
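The core of that lookup is a longest-prefix match. A minimal sketch, assuming a three-entry table (the two routes added on the previous slide plus an assumed connected route for 192.168.23.5/24); the real kernel lookup also handles policy rules, metrics, source address selection and more. ROUTES and route_get are names made up for this example.

```python
import ipaddress

# (prefix, gateway or None, device)
ROUTES = [
    ("192.168.23.0/24", None, "em1"),      # assumed connected route
    ("10.0.0.0/8", None, "em1"),           # direct route from the slide
    ("20.10.0.0/16", "10.0.0.1", "em1"),   # nexthop route from the slide
]

def route_get(dst: str):
    """Return the most specific matching route, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), gw, dev) for p, gw, dev in ROUTES
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    net, gw, dev = max(matches, key=lambda m: m[0].prefixlen)
    return {"prefix": str(net), "via": gw, "dev": dev}

print(route_get("20.10.3.3"))
print(route_get("10.1.2.3"))
```

For 20.10.3.3 the sketch picks the /16 nexthop route via 10.0.0.1 on em1, the same result ip route get reports above.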
Network Namespaces
$ ip netns add blue
$ ip link set tap0 netns blue
$ ip netns exec blue ip address
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
19: tap0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 42:ad:d0:10:e0:67 brd ff:ff:ff:ff:ff:ff
Linux maintains resources and data structures per namespace
[Diagram: Namespace 1 and Namespace 2, with devices eth0 and tap0; each namespace has its own Addresses, Routes and Sockets]
NOTE: Not all data structures are namespace aware yet!
VLAN
Virtual Networks on Layer 2
[Diagram: Virtual Networks 1, 2 and 3]
$ ip link add link em1 vlan1 type vlan id 1
$ ip link set vlan1 up
$ ip link show vlan1
15: vlan1@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP [...]
link/ether 10:c3:7b:95:21:da brd ff:ff:ff:ff:ff:ff
[Diagram: VLAN1, VLAN2 and VLAN3 on both ends of a shared L2 link]
Packet Headers: Ethernet | VLAN | IP
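What the VLAN header looks like on the wire can be sketched as follows. This is an illustration of 802.1Q tagging in user space, not the kernel's implementation; add_vlan_tag and the dummy frame are assumptions for the example.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag

def add_vlan_tag(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp << 13) | vid   # priority (3 bits), DEI (1 bit, 0 here), VLAN ID (12 bits)
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]

# Untagged frame: broadcast dst, the em1 MAC from the slide as src, EtherType IPv4, dummy payload.
untagged = bytes.fromhex("ffffffffffff" "10c37b9521da" "0800") + b"payload"
tagged = add_vlan_tag(untagged, vid=1)
print(tagged[12:16].hex())
```

The tag adds 4 bytes between the source MAC and the original EtherType, which is why the header order is Ethernet, VLAN, IP.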
Bonding / Team
Link Aggregation
$ cp /usr/share/doc/teamd-*/example_configs/activebackup_ethtool_1.conf .
$ teamd -g -f activebackup_ethtool_1.conf -d
[...]
$ teamdctl team0 state
[...]
● Uses:
– Redundant network cards (failover)
– Connect to multiple ToR (LB)
● Implementations:
– Team (new, user/kernel)
– Bonding (old, kernel only)
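The failover use case can be pictured with a toy active-backup selection. This is only a sketch of the idea and not teamd's or the bonding driver's actual algorithm (both have priorities, link watchers and other options); pick_active and the port names are hypothetical.

```python
def pick_active(ports, link_up, current=None):
    """Keep the current port while its link is up; otherwise fail over to the first port with link."""
    if current is not None and link_up.get(current):
        return current
    for port in ports:
        if link_up.get(port):
            return port
    return None   # no port has link

ports = ["eth0", "eth1"]
active = pick_active(ports, {"eth0": True, "eth1": True})
after_failure = pick_active(ports, {"eth0": False, "eth1": True}, current=active)
print(active, after_failure)
```

Traffic stays on eth0 until its link goes down and then moves to eth1, which is the behaviour the "Redundant network cards" bullet describes.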
Veth
Virtual Ethernet Cable
[Diagram: a veth pair connecting Namespace 1 and Namespace 2]
● Bidirectional FIFO
● Often used to cross namespaces
$ ip link add veth1 type veth peer name veth2
$ ip link set veth1 netns ns1
$ ip link set veth2 netns ns2
Bridge
Virtual Switch
● Flooding: Clone packets and send to all ports.
● Learning: Learn who's behind which port to avoid flooding
● STP: Detect wiring loops and disable ports
● Native VLAN integration
● Offload: Program HW based on FDB table
$ ip link add br0 type bridge
$ ip link set eth0 master br0
$ ip link set tap3 master br0
$ ip link set br0 up
[Diagram: br0 with three ports]
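Flooding and learning can be sketched with a toy forwarding database (FDB). This is a simplified model under stated assumptions: no ageing, no STP, no VLANs, no broadcast handling; the class LearningBridge, the third port name and the short MAC strings are invented for the example.

```python
class LearningBridge:
    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}                    # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out on."""
        self.fdb[src_mac] = in_port      # learning
        out = self.fdb.get(dst_mac)
        if out is None:                  # unknown destination: flood
            return [p for p in self.ports if p != in_port]
        return [] if out == in_port else [out]

br0 = LearningBridge(["eth0", "tap3", "veth0"])
first = br0.receive("eth0", "aa", "bb")   # "bb" unknown, frame is flooded
reply = br0.receive("tap3", "bb", "aa")   # "aa" was learned on eth0
print(first, reply)
```

The first frame goes to every port except the one it arrived on; once the reply has been seen, both stations are in the FDB and further frames between them use a single port.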
Example
Bridge + Team + Veth
[Diagram: the Host namespace holds eth0 and eth1 under team0, plus br0, veth0 and veth1; the Container A and Container B namespaces each hold an eth0]
MACVLAN
Simplified bridging for guests
● NOT 802.1Q VLANs
● Multiple MAC addresses on single interface
● KISS - no learning, no STP
● Modes:
– VEPA (default): Guest to guest done on ToR, L3 fallback possible
– Bridge: Guest to guest in software
– Private: Isolated, no guest to guest
– Passthrough: Attaches VF (SR-IOV)
$ ip link add link em1 name macvlan0 type macvlan mode bridge
$ ip -d link show macvlan0
23: macvlan0@em1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN [...]
link/ether f2:d8:91:54:d0:69 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvlan mode bridge addrgenmode eui64
$ ip link set macvlan0 netns blue
[Diagram: Physical Device (master) with slaves macvlan0 (MAC1) and macvlan1 (MAC2)]
Example
Team + MACVLAN
[Diagram: the Host namespace holds eth0 and eth1 under team0; the Container A and Container B namespaces each hold an eth0 (macvlan)]
IPVLAN
MACVLAN for Layer 3 (L3)
● Can hide many containers behind a single MAC address.
● Shared L2 among slaves
● Modes:
– L2: Like MACVLAN w/ single MAC
– L3: L2 deferred to master namespace, no multicast/broadcast
$ ip netns add blue
$ ip link add link eth0 ipvl0 type ipvlan mode l3
$ ip link set dev ipvl0 netns blue
$ ip netns exec blue ip link set dev ipvl0 up
$ ip netns exec blue ip addr add 10.1.1.1/24 dev ipvl0
[Diagram: Physical Device (master) with slaves ipvlan0 (IP1) and ipvlan1 (IP2)]
MACVLAN vs IPVLAN
MACVLAN
– ToR or NIC may have maximum MAC address limit
– Doesn't work well with 802.11 (wireless)
IPVLAN
– DHCP based on MAC doesn't work, must use client ID
– EUI-64 IPv6 address generation issues
– No broadcast/multicast in L3 mode
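The trade-offs above follow from how each driver picks the receiving slave. A rough sketch, assuming two hypothetical slaves per type; the tables, the second MAC and IP, and the function names are invented, and the real drivers do considerably more (modes, broadcast handling, and so on).

```python
import ipaddress

# Hypothetical slave tables for one physical device.
MACVLAN_SLAVES = {"f2:d8:91:54:d0:69": "macvlan0", "f2:d8:91:54:d0:6a": "macvlan1"}
IPVLAN_SLAVES = {ipaddress.ip_address("10.1.1.1"): "ipvl0",
                 ipaddress.ip_address("10.1.1.2"): "ipvl1"}

def macvlan_rx(dst_mac: str):
    """MACVLAN: each slave has its own MAC, so the destination MAC selects the slave."""
    return MACVLAN_SLAVES.get(dst_mac.lower())

def ipvlan_rx(dst_ip: str):
    """IPVLAN: all slaves share the master's MAC, so the destination IP selects the slave."""
    return IPVLAN_SLAVES.get(ipaddress.ip_address(dst_ip))

print(macvlan_rx("F2:D8:91:54:D0:69"), ipvlan_rx("10.1.1.1"))
```

With MACVLAN the upstream switch sees one MAC per slave, hence the MAC limit concern; with IPVLAN it sees a single MAC, hence anything keyed on the MAC (DHCP, EUI-64) cannot tell the slaves apart.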
TUN/TAP
A gate to user space
● Character Device in user space
● Network device in kernel space
● L2 (TAP) or L3 (TUN)
● Uses: encryption, VPN, tunneling, virtual machines, ...
user.c:
memset(&ifr, 0, sizeof(ifr));
ifr.ifr_flags = IFF_TAP | IFF_NO_PI;  /* L2 frames, no packet-info header */
fd = open("/dev/net/tun", O_RDWR);
strncpy(ifr.ifr_name, "tap0", IFNAMSIZ);
ioctl(fd, TUNSETIFF, (void *) &ifr);
$ ip tuntap add tun0 mode tun
$ ip link set tun0 up
$ ip link show tun0
18: tun0: <NO-CARRIER,POINTOPOINT,MULTICAST,NOARP,UP> mtu 1500 qdisc fq_codel [...]
link/none
$ ip route add 10.1.1.0/24 dev tun0
[Diagram: tun0 and tap0 in the kernel, each connected to a file descriptor in user space]
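The same attach sequence can be sketched in Python. Assumptions: the constants are the Linux values from <linux/if_tun.h> as found on x86 and ARM (the TUNSETIFF number differs on some other architectures), the request is padded to 40 bytes (the size of struct ifreq on 64-bit Linux), and build_ifreq and open_tap are helper names invented here. open_tap needs CAP_NET_ADMIN and is therefore not called in the sketch.

```python
import fcntl
import os
import struct

TUNSETIFF = 0x400454CA
IFF_TUN = 0x0001     # L3 device
IFF_TAP = 0x0002     # L2 device
IFF_NO_PI = 0x1000   # no packet-info header in front of each packet

def build_ifreq(name: str, flags: int) -> bytes:
    """Pack the part of struct ifreq that TUNSETIFF reads: name[16], then short flags."""
    return struct.pack("16sH22x", name.encode(), flags)

def open_tap(name: str = "tap0") -> int:
    """Attach to (or create) a TAP device and return its file descriptor."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, IFF_TAP | IFF_NO_PI))
    return fd

req = build_ifreq("tap0", IFF_TAP | IFF_NO_PI)
print(len(req), req[:4])
```

After open_tap() succeeds, each os.read() on the descriptor returns one Ethernet frame sent to tap0, and each os.write() injects one frame as if it had been received on the device.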
MACVTAP
Bridge + TAP = MACVTAP
● A TAP with an integrated bridge
● Connects VM/container via L2
● Same modes as MACVLAN
$ ip link add link em1 name macvtap0 type macvtap mode vepa
$ ip -d link show macvtap0
20: macvtap0@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP [...]
link/ether 3e:cb:79:61:8c:4b brd ff:ff:ff:ff:ff:ff
macvtap mode vepa addrgenmode eui64
$ ls -l /dev/tap20
crw-------. 1 root root 241, 1 Aug 8 21:08 /dev/tap20
[Diagram: a Physical Device in the kernel with macvtap2 (MAC1) and macvtap3 (MAC2), exposed to user space as /dev/tap2 and /dev/tap3]
Encapsulation (Tunnels)
Virtual Networks on Layer 3/4
[Diagram: Virtual Networks 1, 2 and 3]
$ ip link add vxlan42 type vxlan id 42 group 239.1.1.1 dev em1 dstport 4789
$ ip link set vxlan42 up
$ ip link show vxlan42
31: vxlan42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN [...]
link/ether e6:fc:c8:7e:07:83 brd ff:ff:ff:ff:ff:ff
[Diagram: vxlan1, vxlan2 and vxlan3 on both ends of a shared L3/L4 network]
VXLAN Headers example:
Ethernet | IP | UDP | VXLAN | Ethernet | IP | TCP
(outer headers: Underlay; inner headers: Overlay)
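The header stack also explains the mtu 1450 in the ip link output. A small sketch, assuming an IPv4 underlay without VLAN tags and a 1500-byte underlay MTU; vxlan_header and overlay_mtu are names made up for the example, while the header layout follows the VXLAN specification (RFC 7348).

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags (I bit set), 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)

# Bytes added in front of the inner IP packet.
OUTER_IPV4, OUTER_UDP, VXLAN, INNER_ETHERNET = 20, 8, 8, 14

def overlay_mtu(underlay_mtu: int) -> int:
    """MTU left for the inner IP packet after encapsulation overhead."""
    return underlay_mtu - (OUTER_IPV4 + OUTER_UDP + VXLAN + INNER_ETHERNET)

print(vxlan_header(42).hex())
print(overlay_mtu(1500))
```

For id 42 the header is 08 00 00 00 00 00 2a 00, and 1500 minus 50 bytes of overhead gives the 1450 reported for vxlan42.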
IPSec
Authenticated & Encrypted
● AH: Authentication
● ESP: Authentication + encryption
$ ip xfrm state add src 192.168.211.138 dst 192.168.211.203 proto esp \
spi 0x53fa0fdd mode transport reqid 16386 replay-window 32 \
auth "hmac(sha1)" 0x55f01ac07e15e437115dde0aedd18a822ba9f81e \
enc "cbc(aes)" 0x6aed4975adf006d65c76f63923a6265b \
sel src 0.0.0.0/0 dst 0.0.0.0/0
Transport Mode: Ethernet | IP | ESP | TCP
Tunnel Mode: Ethernet | IP | ESP | IP | TCP
[Diagram: L3 protection between two endpoints, drawn socket-to-socket and netdevice-to-netdevice]
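The keys in the command above are example values. A hedged sketch of generating fresh keys of the same lengths (20 bytes for hmac(sha1), 16 bytes for cbc(aes)) and assembling the same command line; xfrm_state_command is a hypothetical helper, and only options that appear on the slide are used.

```python
import secrets

def xfrm_state_command(src: str, dst: str, spi: int, reqid: int = 16386) -> str:
    """Build an 'ip xfrm state add' command line with freshly generated keys."""
    auth_key = secrets.token_hex(20)   # 160-bit key, same length as on the slide
    enc_key = secrets.token_hex(16)    # 128-bit key, same length as on the slide
    return (f"ip xfrm state add src {src} dst {dst} proto esp "
            f"spi {spi:#x} mode transport reqid {reqid} replay-window 32 "
            f'auth "hmac(sha1)" 0x{auth_key} '
            f'enc "cbc(aes)" 0x{enc_key} '
            f"sel src 0.0.0.0/0 dst 0.0.0.0/0")

cmd = xfrm_state_command("192.168.211.138", "192.168.211.203", 0x53FA0FDD)
print(cmd)
```

A state entry covers one direction only; a working setup also needs the reverse-direction state and matching ip xfrm policy entries, which the slide does not show.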
Open vSwitch
● Fully programmable L2-L4 virtual switch with APIs: OpenFlow and OVSDB
● Split into a user and kernel component
● Multiple control plane integrations:
– OVN, ODL, Neutron, CNI, Docker, ...
[Diagram: ovs0 with three ports]
$ ovs-vsctl add-br ovs0
$ ovs-vsctl add-port ovs0 em1
$ ovs-ofctl add-flow ovs0 in_port=1,actions=drop
$ ovs-vsctl show
a425a102-c317-4743-b0ba-79d59ff04a74
Bridge "ovs0"
Port "em1"
Interface "em1"
[...]
BPF
[Diagram: in userspace, Source Code is compiled by LLVM/clang to Byte Code; in the kernel, the Verifier + JIT turn it into native instructions (add eax,edx; shl eax,2) that run at TC Ingress and TC Egress, between the netdevice and the network stack and sockets]
$ clang -O2 -target bpf -c code.c -o code.o
Attaching a BPF program to eth0 at ingress and egress:
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 ingress bpf da obj code.o sec my-section1
$ tc filter add dev eth0 egress bpf da obj code.o sec my-section2
BPF Features
(As of Aug 2016)
● Maps
– Arrays (per CPU), hashtables (per CPU)
● Packet mangling
● Redirect to other device
● Tunnel metadata (encapsulation)
● Cgroups integration
● Event notifications via perf ring buffer
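XDP – Express Data Path
[Diagram: in userspace, Source Code is compiled by LLVM/clang to Byte Code; in the kernel, the Verifier + JIT turn it into native instructions (add eax,edx; shl eax,2) that run in the Driver with access to the DMA buffer, below the netdevice, network stack and sockets]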
Q&A
Image Sources:
● Cover (Toronto)
Rick Harris (https://www.flickr.com/photos/rickharris/)
● The Invisible Man
Dr. Azzacov (https://www.flickr.com/photos/drazzacov/)
● Chicken
JOHN LLOYD (https://www.flickr.com/photos/hugo90/)
Learn more about networking with BPF:
Fast IPv6-only Networking for Containers Based on
BPF and XDP
Wednesday August 24, 2016 4:35pm – 5:35pm, Queen's Quay
Contact:
● Twitter: @tgraf__ Mail: tgraf@tgraf.ch
