Kubernetes

[ Kans 3 Study - 8w ] 2. Direct Server Return (DSR) / Network Policy (L3, L4, L7) / Bandwidth Manager / L2 Announcements / L2 Aware LB (Beta)

su''@ 2024. 10. 27. 05:05
CloudNetaStudy - This post is part of the Kubernetes Network 3rd-cohort hands-on study.

 

6. Direct Server Return (DSR)
  • Introduction to DSR (Direct Server Return) - reference link
  • With the default SNAT: NodePort access in a kube-proxy environment
    https://arthurchiao.art/blog/k8s-l4lb/
  • With DSR
    https://arthurchiao.art/blog/k8s-l4lb/
  • Cilium XDP DSR with IPIP Encapsulation and L4 DNAT
    https://cilium.io/blog/2021/05/20/cilium-110
  • DSR configuration - link (a helm sketch follows this list)
    • Tunnel mode must be Disabled, i.e., DSR only works in Native Routing mode
    • DSR does not work on Ubuntu focal64 LTS (Linux 5.4.0-99-generic)
      • There is also a Hybrid DSR mode in which TCP uses DSR and UDP uses SNAT.
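  • Enabling DSR (helm sketch) : the values below follow the Cilium helm chart but are an assumption to adapt per chart version, not necessarily the exact commands used in this lab (older charts use tunnel=disabled instead of routingMode=native, and kubeProxyReplacement=strict instead of true).
    # Switch the load-balancer mode to DSR (requires native routing + kube-proxy replacement)
    helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
      --set routingMode=native \
      --set kubeProxyReplacement=true \
      --set loadBalancer.mode=dsr

    # Restart the agents so the new datapath configuration is applied
    kubectl -n kube-system rollout restart ds/cilium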
# Check the DSR mode
cilium config view | grep dsr
bpf-lb-mode                                    dsr

c0 status --verbose | grep 'KubeProxyReplacement Details:' -A7
KubeProxyReplacement Details:
  Status:                Strict
  Socket LB Protocols:   TCP, UDP
  Devices:               enp0s3 10.0.2.15, enp0s8 192.168.10.10 (Direct Routing)
  Mode:                  DSR
  Backend Selection:     Random
  Session Affinity:      Enabled
  XDP Acceleration:      Disabled
  • Create the pods and service
    cat <<EOF | kubectl create -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: netpod
      labels:
        app: netpod
    spec:
      nodeName: k8s-m
      containers:
      - name: netshoot-pod
        image: nicolaka/netshoot
        command: ["tail"]
        args: ["-f", "/dev/null"]
      terminationGracePeriodSeconds: 0
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: webpod1
      labels:
        app: webpod
    spec:
      nodeName: k8s-w1
      containers:
      - name: container
        image: traefik/whoami
      terminationGracePeriodSeconds: 0
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: svc1
    spec:
      ports:
        - name: svc1-webport
          port: 80
          targetPort: 80
      selector:
        app: webpod
      type: NodePort
    EOF
    
    ====================
    
    # Pod and service info
    kubectl get pod -o wide
    kubectl get svc svc1
    
    # Check the map list
    c0 map list --verbose
    c1 map list --verbose
  • How it works: haruband (Korean) - link. Magic!
    https://arthurchiao.art/blog/k8s-l4lb/
    • With DSR, the IP information of the node that received the initial connection is carried to the target node in an IP option header ('destination port + ? + destination IP').
    • The target node stores the original destination address from the received IP option header in the cilium_snat_v4_external map and adds the translation entry to the conntrack map.
    • Later, when the pod sends a reply packet, bpf_lxc.c #from-container on the pod's veth (LXC) uses the conntrack entry to fetch what it needs from the cilium_snat_v4_external map and rewrites the reply's source address to the address the client originally connected to. As a result, the client receives the reply from the exact address it sent the packet to. Magic!
    • In addition, the client (source) IP is preserved all the way to the pod.
    • # Check the service (NodePort)
      kubectl get svc svc1 -o jsonpath='{.spec.ports[0].nodePort}';echo
      32465
      
      ------------------------------------------------------------------------------
      # Generate Service (NodePort) access traffic from k8s-pc or your own PC
      SVCNPORT=$(kubectl get svc svc1 -o jsonpath='{.spec.ports[0].nodePort}')
      curl -s k8s-m:$SVCNPORT
      curl -s k8s-m:$SVCNPORT | grep Hostname
      
      # Generate continuous access traffic from k8s-pc or your own PC
      while true; do curl -s k8s-m:$SVCNPORT | grep Hostname;echo "-----";sleep 1;done
      ------------------------------------------------------------------------------
      
      # Monitoring: on the worker node, confirm that the first TCP SYN carries the IP option header! >> In the IPv4Option output below, read the bytes from the end!
      c0 monitor # nothing captured here!
      c1 monitor -vv
      ------------------------------------------------------------------------------
      level=info msg="Initializing dissection cache..." subsys=monitor
      Ethernet	{Contents=[..14..] Payload=[..70..] SrcMAC=82:dc:d0:66:f2:30 DstMAC=6a:f1:5e:fd:53:8b EthernetType=IPv4 Length=0}
      IPv4	{Contents=[..28..] Payload=[..40..] Version=4 IHL=7 TOS=0 Length=68 Id=28665 Flags=DF FragOffset=0 TTL=62 Protocol=TCP Checksum=47859 SrcIP=192.168.20.100 DstIP=172.16.1.97 Options=[IPv4Option(154:[64 118 10 10 168 192])] Padding=[]}
      TCP	{Contents=[..40..] Payload=[] SrcPort=59364 DstPort=80(http) Seq=2356875422 Ack=0 DataOffset=10 FIN=false SYN=true RST=false PSH=false ACK=false URG=false ECE=false CWR=false NS=false Window=64240 Checksum=21511 Urgent=0 Options=[..5..] Padding=[]}
      CPU 00: MARK 0x0 FROM 2375 to-endpoint: 82 bytes (82 captured), state new, interface lxc8c5f8079ec7e identity world->29901, orig-ip 192.168.20.100, to endpoint 2375
      ------------------------------------------------------------------------------
      
      # Packet capture - master node
      NODEPORT=$(kubectl get svc svc1 -o jsonpath='{.spec.ports[0].nodePort}')
      echo $NODEPORT
      tcpdump -eni any tcp port 80 or tcp port $NODEPORT -q -v
      	13:56:05.057646  In 08:00:27:06:8a:0d (tos 0x0, ttl 63, id 34911, offset 0, flags [DF], proto TCP (6), length 60)
      	    192.168.20.100.59340 > 192.168.10.10.30272: tcp 0
      	13:56:05.057670 Out 08:00:27:cb:61:e9 (tos 0x0, ttl 63, id 34911, offset 0, flags [DF], proto TCP (6), length 68, options (unknown 154))
      	    192.168.20.100.59340 > 172.16.1.97.80: tcp 0
      tcpdump -i any tcp port 80 or tcp port $NODEPORT -w /tmp/dsr1.pcap
      
      # Packet capture - worker node
      NODEPORT=$(kubectl get svc svc1 -o jsonpath='{.spec.ports[0].nodePort}')
      tcpdump -eni any tcp port 80 or tcp port $NODEPORT -q -v
      tcpdump -i any tcp port 80 or tcp port $NODEPORT -w /tmp/dsr2.pcap
      (or) tcpdump -eni any tcp port 80 or tcp port $NODEPORT -q -v -X
      23:05:55.934635 Out 08:00:27:96:81:a2 (tos 0x0, ttl 63, id 7544, offset 0, flags [DF], proto TCP (6), length 68, options (unknown 154))
          192.168.20.100.35484 > 172.16.1.35.80: tcp 0
      	0x0000:  4700 0044 1d78 4000 3f06 1db0 c0a8 1464  G..D.x@.?......d
      	0x0010:  ac10 0123 9a08 3b79 0a0a a8c0 8a9c 0050  ...#../y.......P
      	0x0020:  8eb3 9b9b 0000 0000 a002 faf0 7420 0000  ............t...
      	0x0030:  0204 05b4 0402 080a 0670 9b04 0000 0000  .........p......
      	0x0040:  0103 0306                                ....
      
      # Extract information from the hex of the option header
      Hex='0a 0a a8 c0'
      for perIP in $Hex; do echo $((0x${perIP})); done | xargs echo | sed 's/ /./g'
      10.10.168.192 => read it backwards! This is the IP of the node the client first hit (192.168.10.10)

      2f79 -> (reverse the hex byte order) 792f -> (convert to decimal) 31023
      # hex -> decimal conversion
      A=792F
      echo $((0x${A}))
      31023 >> the port (NodePort) on the node the client first hit

      9a08 -> (reverse the hex byte order) 089a -> (convert to decimal) 2202
      A=089a
      echo $((0x${A}))
      2202  >> what is this? Not an endpoint ID or padding: 0x9a = 154 is the IP option type (tcpdump prints it as 'unknown 154') and 0x08 is the option length (8 bytes)
      
      # Clean up after the exercise
      kubectl delete svc svc1 && kubectl delete pod webpod1 netpod

      On the node that received the initial connection (the master), only traffic arriving from the external client is visible!
      Note: traditional L4 load balancers doing L2 DSR required a loopback configuration on the servers.
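    • Putting the decode steps above together, a small bash sketch that parses the DSR IPv4 option bytes captured in this lab (assuming the observed layout: option type, option length, then a 2-byte port and a 4-byte IPv4 address, both in reversed byte order):
      # Decode a DSR IPv4 option given as space-separated hex bytes
      decode_dsr_opt() {
        local b=($1)                               # e.g. "9a 08 3b 79 0a 0a a8 c0"
        echo "option type : $((0x${b[0]}))"        # 154, shown by tcpdump as 'unknown 154'
        echo "option len  : $((0x${b[1]}))"        # 8 bytes in total
        echo "node port   : $((0x${b[3]}${b[2]}))" # byte-reversed NodePort
        echo "node ip     : $((0x${b[7]})).$((0x${b[6]})).$((0x${b[5]})).$((0x${b[4]}))"
      }
      decode_dsr_opt "9a 08 3b 79 0a 0a a8 c0"
      # option type : 154
      # option len  : 8
      # node port   : 31035
      # node ip     : 192.168.10.10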

 

7. Network Policy (L3, L4, L7)
  • Security
    • Securing Networks with Cilium - Link
      • Identity-Aware and HTTP-Aware Policy Enforcement - Link
      • Locking Down External Access with DNS-Based Policies - Link
      • Inspecting TLS Encrypted Connections with Cilium - Link
      • Securing a Kafka Cluster - Link
      • Securing gRPC - Link
      • Securing Elasticsearch - Link
      • Securing a Cassandra Database - Link
      • Securing Memcached - Link
      • Locking Down External Access Using AWS Metadata - Link
      • Creating Policies from Verdicts - Link
      • Host Firewall - Link
      • Restricting privileged Cilium pod access - Link
    • Overview of Network Security - Link
      • Intro - Link
      • Identity-Based - Link
      • Policy Enforcement - Link
      • Proxy Injection - Link
      • Transparent Encryption - Link
        • IPsec Transparent Encryption - Link
        • WireGuard Transparent Encryption - Link
    • Overview of Network Policy - Link
      • Policy Enforcement Modes - Link
      • Layer 3 Examples - Link
      • Using Kubernetes Constructs In Policy - Link
      • Endpoint Lifecycle - Link
      • Troubleshooting - Link
      • Caveats - Link
    • Threat Model - Link
    • Cilium Security Intro : Cilium provides security on multiple levels - Docs
      • Identity-based: Connectivity policies between endpoints (Layer 3), e.g. any endpoint with label role=frontend can connect to any endpoint with label role=backend.
        https://docs.cilium.io/en/stable/security/network/identity/
      • Port-based: Restriction of accessible ports (Layer 4) for both incoming and outgoing connections, e.g. an endpoint with label role=frontend can only make outgoing connections on port 443 (https) and an endpoint with role=backend can only accept connections on port 443 (https).
      • Application (HTTP)-based: Fine-grained access control at the application protocol level to secure HTTP and remote procedure call (RPC) protocols, e.g. the endpoint with label role=frontend can only perform the REST API call GET /userdata/[0-9]+; all other API interactions with role=backend are restricted (a policy sketch for this example follows below).
          • Proxy Injection : Envoy - Docs , Envoy
            • Cilium is capable of transparently injecting a Layer 4 proxy into any network connection. This is used as the foundation to enforce higher level network policies (see DNS based and Layer 7 Examples).
            https://docs.cilium.io/en/stable/security/network/proxy/envoy/
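      • As a sketch of the HTTP-aware example above (the role=frontend / role=backend labels come from the example; the backend serving plain HTTP on port 80 is an assumption):
        cat <<EOF | kubectl apply -f -
        apiVersion: "cilium.io/v2"
        kind: CiliumNetworkPolicy
        metadata:
          name: "frontend-userdata-read-only"
        spec:
          description: "Allow frontend to call only GET /userdata/[0-9]+ on backend"
          endpointSelector:
            matchLabels:
              role: backend
          ingress:
          - fromEndpoints:
            - matchLabels:
                role: frontend
            toPorts:
            - ports:
              - port: "80"
                protocol: TCP
              rules:
                http:
                - method: "GET"
                  path: "/userdata/[0-9]+"
        EOF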
    • eBPF datapath components related to Network Policy
      • Prefilter: an XDP program that provides a set of prefilter rules used to filter traffic from the network for best performance.
      • Endpoint Policy: depending on the policy, packets can be dropped, forwarded, forwarded to a service, or handed off to L7.
        • The Cilium datapath responsible for mapping packets to identities and enforcing L3 and L4 policies.
      • L7 Policy: The L7 Policy object redirects proxy traffic to a Cilium userspace proxy instance. Cilium uses an Envoy instance as its userspace proxy. Envoy will then either forward the traffic or generate appropriate reject messages based on the configured L7 policy.
      • → L7 policies are enforced in a userspace proxy rather than at a kernel hook point, so performance can be slightly lower.
      • L3 Encryption, Socket Layer Enforcement: skipped here~
    • Deploy the Demo Application - Docs
      • A Star Wars-inspired example: a Deployment (web server, deathstar, replicas 2), Pods (xwing, tiefighter), and a Service (ClusterIP, service/deathstar)
        # Deploy
        kubectl create -f https://raw.githubusercontent.com/cilium/cilium/1.16.3/examples/minikube/http-sw-app.yaml
        kubectl get all
        
        # Check pod labels
        kubectl get pod --show-labels
        NAME                         READY   STATUS    RESTARTS   AGE    LABELS
        deathstar-689f66b57d-4rwkf   1/1     Running   0          113s   app.kubernetes.io/name=deathstar,class=deathstar,org=empire,pod-template-hash=689f66b57d
        deathstar-689f66b57d-8p2l5   1/1     Running   0          113s   app.kubernetes.io/name=deathstar,class=deathstar,org=empire,pod-template-hash=689f66b57d
        tiefighter                   1/1     Running   0          113s   app.kubernetes.io/name=tiefighter,class=tiefighter,org=empire
        xwing                        1/1     Running   0          113s   app.kubernetes.io/name=xwing,class=xwing,org=alliance
        
        # Check cilium endpoints
        kubectl get ciliumendpoints
        c1 endpoint list
        c2 endpoint list
        
        # Access the deathstar SVC (ClusterIP) to verify connectivity to the web pods >> watch it in real time in the Hubble UI!
        kubectl exec xwing -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
        Ship landed
        
        kubectl exec tiefighter -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
        Ship landed
        
        # Check
        hubble observe
    • Identity-Aware and HTTP-Aware Policy Enforcement, Apply an L3/L4 Policy - Link & Hubble CLI - link
      • Cilium applies security policies based on pod Labels rather than endpoint IPs
      • IP/port filtering is referred to as an L3/L4 network policy
      • As shown below, allow only pods labeled 'org=empire'
      • Cilium performs stateful connection tracking, so return traffic is automatically allowed
        # Create the L3/L4 policy
        cat <<EOF | kubectl apply -f - 
        apiVersion: "cilium.io/v2"
        kind: CiliumNetworkPolicy
        metadata:
          name: "rule1"
        spec:
          description: "L3-L4 policy to restrict deathstar access to empire ships only"
          endpointSelector:
            matchLabels:
              org: empire
              class: deathstar
          ingress:
          - fromEndpoints:
            - matchLabels:
                org: empire
            toPorts:
            - ports:
              - port: "80"
                protocol: TCP
        EOF
        
        # Check the policy
        kubectl get cnp
        kc describe cnp rule1
        c0 policy get
        
        
        # When trying curl from a pod, exec into the pod shell first and run curl there!
        # Access the deathstar SVC (ClusterIP) to verify connectivity >> check for drops in the Hubble UI!
        kubectl exec tiefighter -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
        Ship landed
        
        kubectl exec xwing -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
        drop
        
        # Monitor with the hubble CLI
        hubble observe --pod xwing
        hubble observe --pod tiefighter
        hubble observe --pod deathstar
        Dec  2 05:36:24.490: default/xwing:55464 <> default/deathstar-c74d84667-t7msh:80 Policy denied DROPPED (TCP Flags: SYN)
        Dec  2 05:36:24.490: default/xwing:55464 <> default/deathstar-c74d84667-t7msh:80 Policy denied DROPPED (TCP Flags: SYN)
        
        hubble observe --pod deathstar --verdict DROPPED
        Nov 30 15:23:47.721: default/xwing:60086 <> default/deathstar-c74d84667-ksnbd:80 Policy denied DROPPED (TCP Flags: SYN)
        Nov 30 15:23:47.721: default/xwing:60086 <> default/deathstar-c74d84667-ksnbd:80 Policy denied DROPPED (TCP Flags: SYN)
        Nov 30 15:27:40.250: default/tiefighter:41656 -> default/deathstar-c74d84667-ksnbd:80 http-request DROPPED (HTTP/1.1 PUT http://deathstar.default.svc.cluster.local/v1/exhaust-port)
        Nov 30 15:28:00.707: default/tiefighter:41666 -> default/deathstar-c74d84667-ksnbd:80 http-request DROPPED (HTTP/1.1 PUT http://deathstar.default.svc.cluster.local/v1/exhaust-port)
        
        Inspecting the Policy
        # If we run cilium endpoint list again we will see that the pods with the label org=empire and class=deathstar
        # now have ingress policy enforcement enabled as per the policy above.
        
        # Confirm policy enforcement in the endpoint list
        c1 endpoint list | grep deathstar
        c2 endpoint list
        ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                              IPv6   IPv4           STATUS
                   ENFORCEMENT        ENFORCEMENT
        312        Disabled           Disabled          18300      k8s:class=xwing                                                                 172.16.2.161   ready
                                                                   k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default
                                                                   k8s:io.cilium.k8s.policy.cluster=default
                                                                   k8s:io.cilium.k8s.policy.serviceaccount=default
                                                                   k8s:io.kubernetes.pod.namespace=default
                                                                   k8s:org=alliance
        
        1972       Enabled            Disabled          21144      k8s:class=deathstar                                                             172.16.2.66    ready
                                                                   k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default
                                                                   k8s:io.cilium.k8s.policy.cluster=default
                                                                   k8s:io.cilium.k8s.policy.serviceaccount=default
                                                                   k8s:io.kubernetes.pod.namespace=default
                                                                   k8s:org=empire
  • Identity-Aware and HTTP-Aware Policy Enforcement Apply and Test HTTP-aware L7 Policy - Docs  
    • Apply HTTP L7 filtering: block PUT /v1/exhaust-port requests as shown below!
      # Access the deathstar SVC (ClusterIP)
      kubectl exec tiefighter -- curl -s -XPUT deathstar.default.svc.cluster.local/v1/exhaust-port
      Panic: deathstar exploded
      ...
      
      # Update the existing policy so that only the POST /v1/request-landing API call is allowed (configured)!
      cat <<EOF | kubectl apply -f - 
      apiVersion: "cilium.io/v2"
      kind: CiliumNetworkPolicy
      metadata:
        name: "rule1"
      spec:
        description: "L7 policy to restrict access to specific HTTP call"
        endpointSelector:
          matchLabels:
            org: empire
            class: deathstar
        ingress:
        - fromEndpoints:
          - matchLabels:
              org: empire
          toPorts:
          - ports:
            - port: "80"
              protocol: TCP
            rules:
              http:
              - method: "POST"
                path: "/v1/request-landing"
      EOF
      
      # Check the policy
      kc describe ciliumnetworkpolicies
      c0 policy get
      
      # Monitoring
      c1 monitor -v --type l7
      c2 monitor -v --type l7
      <- Request http from 0 ([k8s:io.cilium.k8s.policy.cluster=default k8s:io.cilium.k8s.policy.serviceaccount=default k8s:io.kubernetes.pod.namespace=default k8s:org=empire k8s:class=tiefighter k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default]) to 1972 ([k8s:class=deathstar k8s:org=empire k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default k8s:io.kubernetes.pod.namespace=default k8s:io.cilium.k8s.policy.serviceaccount=default k8s:io.cilium.k8s.policy.cluster=default]), identity 42720->21144, verdict Denied PUT http://deathstar.default.svc.cluster.local/v1/exhaust-port => 403
       => 403
      hubble observe --pod deathstar
      hubble observe --pod deathstar --verdict DROPPED
      
      
      # Access test
      kubectl exec tiefighter -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
      Ship landed
      
      kubectl exec tiefighter -- curl -s -XPUT deathstar.default.svc.cluster.local/v1/exhaust-port
      Access denied
      
      ## Check the drop logs in the hubble CLI
      hubble observe --pod deathstar --verdict DROPPED
      Feb 28 11:39:59.078: default/tiefighter:33762 -> default/deathstar-c74d84667-lf2wl:80 http-request DROPPED (HTTP/1.1 PUT http://deathstar.default.svc.cluster.local/v1/exhaust-port)
      
      hubble observe --pod deathstar --protocol http
      Feb 28 12:05:22.095: default/tiefighter:40428 -> default/deathstar-6f87496b94-cvv9r:80 http-request DROPPED (HTTP/1.1 PUT http://deathstar.default.svc.cluster.local/v1/exhaust-port)
      
      # Clean up
      kubectl delete -f https://raw.githubusercontent.com/cilium/cilium/1.16.3/examples/minikube/http-sw-app.yaml
      kubectl delete cnp rule1

  • [Challenge 2] Test the various control features in the official Securing Networks with Cilium docs and write them up - Link

 

8. Bandwidth Manager
  • Bandwidth Manager : Bandwidth and Latency Optimization - Link , Home , Youtube
    https://cilium.io/use-cases/bandwidth-optimization/
  • The bandwidth manager optimizes TCP and UDP workloads and efficiently rate-limits individual Pods, using EDT (Earliest Departure Time) and eBPF
    • The kubernetes.io/egress-bandwidth Pod annotation is enforced on egress at the native host networking devices (see the manifest sketch below).
    • The ~~kubernetes.io/ingress-bandwidth~~ annotation is not supported
    • Both direct routing mode and tunneling mode are supported
    • Limitations: L7 Cilium Network Policies
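  • A minimal sketch of where the annotation goes (hypothetical Deployment name and limit value): the annotation sits on the pod template, since the bandwidth manager reads it per pod.
    # Pods of this Deployment are egress-limited to 50Mbit/s by the bandwidth manager
    cat <<EOF | kubectl apply -f -
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sample-app                 # hypothetical name
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: sample-app
      template:
        metadata:
          labels:
            app: sample-app
          annotations:
            kubernetes.io/egress-bandwidth: "50M"   # enforced on egress at the native host device
        spec:
          containers:
          - name: app
            image: nginx
    EOF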
  • Setup and verification
    # Check the interface tc qdisc
    tc qdisc show dev ens5
    qdisc mq 0: root 
    qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 
    qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 
    qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 
    qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 
    
    # Configure
    helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values --set bandwidthManager.enabled=true
    
    # Confirm it is applied
    cilium config view | grep bandwidth
    enable-bandwidth-manager                       true
    
    # Check the interface(s) where egress bandwidth limitation operates
    c0 status | grep  BandwidthManager
    BandwidthManager:        EDT with BPF [CUBIC] [ens5]
    
    # Check the interface tc qdisc again: quite a few options are added compared to before
    tc qdisc
    tc qdisc show dev ens5
    qdisc mq 8002: root 
    qdisc fq 8005: parent 8002:2 limit 10000p flow_limit 100p buckets 32768 orphan_mask 1023 quantum 18030b initial_quantum 90150b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 2s horizon_drop 
    qdisc fq 8003: parent 8002:4 limit 10000p flow_limit 100p buckets 32768 orphan_mask 1023 quantum 18030b initial_quantum 90150b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 2s horizon_drop 
    qdisc fq 8004: parent 8002:3 limit 10000p flow_limit 100p buckets 32768 orphan_mask 1023 quantum 18030b initial_quantum 90150b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 2s horizon_drop 
    qdisc fq 8006: parent 8002:1 limit 10000p flow_limit 100p buckets 32768 orphan_mask 1023 quantum 18030b initial_quantum 90150b low_rate_threshold 550Kbit refill_delay 40ms timer_slack 10us horizon 2s horizon_drop
     
  • Behavior and verification
    # Create server/client pods to generate test traffic
    cat <<EOF | kubectl apply -f -
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        # Limits egress bandwidth to 10Mbit/s.
        kubernetes.io/egress-bandwidth: "10M"
      labels:
        # This pod will act as server.
        app.kubernetes.io/name: netperf-server
      name: netperf-server
    spec:
      containers:
      - name: netperf
        image: cilium/netperf
        ports:
        - containerPort: 12865
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      # This Pod will act as client.
      name: netperf-client
    spec:
      affinity:
        # Prevents the client from being scheduled to the
        # same node as the server.
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - netperf-server
            topologyKey: kubernetes.io/hostname
      containers:
      - name: netperf
        args:
        - sleep
        - infinity
        image: cilium/netperf
    EOF

    # Check the egress BW limit info
    kubectl describe pod netperf-server | grep Annotations:
    Annotations:  kubernetes.io/egress-bandwidth: 10M

    # Check the limit info from the cilium pod on the node running the limited pod
    c1 bpf bandwidth list
    c2 bpf bandwidth list
    IDENTITY   EGRESS BANDWIDTH (BitsPerSec)
    904        10M

    c1 endpoint list
    c2 endpoint list
    ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                  IPv6   IPv4           STATUS
               ENFORCEMENT        ENFORCEMENT
    904        Disabled           Disabled          21565      k8s:app.kubernetes.io/name=netperf-server           172.16.2.153   ready

    # Generate traffic >> check in the Hubble UI
    # egress traffic of the netperf-server Pod has been limited to 10Mbit per second.
    NETPERF_SERVER_IP=$(kubectl get pod netperf-server -o jsonpath='{.status.podIP}')
    kubectl exec netperf-client -- netperf -t TCP_MAERTS -H "${NETPERF_SERVER_IP}"
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    131072  16384  16384    10.00       9.54   # 10Mbps limit confirmed!

    # Set a 5M limit and test again
    kubectl get pod netperf-server -o json | sed -e 's|10M|5M|g' | kubectl apply -f -
    c1 bpf bandwidth list
    c2 bpf bandwidth list
    kubectl exec netperf-client -- netperf -t TCP_MAERTS -H "${NETPERF_SERVER_IP}"
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    131072  16384  16384    10.09       4.56   # ~4.5Mbps limit confirmed!

    # Set a 20M limit and test again
    kubectl get pod netperf-server -o json | sed -e 's|5M|20M|g' | kubectl apply -f -
    kubectl exec netperf-client -- netperf -t TCP_MAERTS -H "${NETPERF_SERVER_IP}"
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    131072  16384  16384    10.00      18.95   # ~19Mbps limit confirmed!

    tc qdisc show dev ens5

    # Clean up
    kubectl delete pod netperf-client netperf-server
9. L2 Announcements / L2 Aware LB (Beta) - Link
  • L2 Announcements / L2 Aware LB (Beta) - Link , Blog
    • L2 Announcements is a feature that makes services visible and reachable on the local area network. It is designed primarily for on-premises deployments within networks that have no BGP-based routing, such as office or campus networks.
    • With this feature enabled, Cilium responds to ARP queries for ExternalIPs and/or LoadBalancer IPs. These IPs are virtual IPs spanning multiple nodes (they are not installed on any network device), so for each service a single node at a time answers the ARP query and replies with its MAC address. That node then performs load balancing with the service load-balancing feature, acting as a north/south load balancer.
    • The advantage over NodePort services is that each service can use a unique IP, so multiple services can use the same port number. With NodePort it is up to the client to decide which host to send traffic to, and if a node goes down the IP+port combination becomes unusable. With L2 announcements the service VIP simply migrates to another node and keeps working.
      https://isovalent.com/blog/post/migrating-from-metallb-to-cilium/
  • Setup and verification
    #
    helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
    --set l2announcements.enabled=true --set externalIPs.enabled=true \
    --set l2announcements.leaseDuration=3s --set l2announcements.leaseRenewDeadline=1s --set l2announcements.leaseRetryPeriod=200ms
     
    #
    c0 config --all  |grep L2
    EnableL2Announcements             : true
    EnableL2NeighDiscovery            : true
    
    # Create a CiliumL2AnnouncementPolicy
    cat <<EOF | kubectl apply -f - 
    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumL2AnnouncementPolicy
    metadata:
      name: policy1
    spec:
      serviceSelector:
        matchLabels:
          color: blue
      nodeSelector:
        matchExpressions:
          - key: node-role.kubernetes.io/control-plane
            operator: DoesNotExist
      interfaces:
      - ^ens[0-9]+
      externalIPs: true
      loadBalancerIPs: true
    EOF
    
    # Check
    kubectl get ciliuml2announcementpolicy
    kc describe l2announcement
    
    #
    cat <<EOF | kubectl apply -f - 
    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumLoadBalancerIPPool
    metadata:
      name: "cilium-pool"
    spec:
      allowFirstLastIPs: "No"
      blocks:
      - cidr: "10.10.200.0/29"
    EOF
    
    # Query the cilium IP pool
    kubectl get CiliumLoadBalancerIPPool
    NAME          DISABLED   CONFLICTING   IPS AVAILABLE   AGE
    cilium-pool   false      False         3               3m5s

  • Create test pods and services
    #
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: webpod1
      labels:
        app: webpod
    spec:
      nodeName: k8s-w1
      containers:
      - name: container
        image: traefik/whoami
      terminationGracePeriodSeconds: 0
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: webpod2
      labels:
        app: webpod
    spec:
      nodeName: k8s-w2
      containers:
      - name: container
        image: traefik/whoami
      terminationGracePeriodSeconds: 0
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: svc1
    spec:
      ports:
        - name: svc1-webport
          port: 80
          targetPort: 80
      selector:
        app: webpod
      type: LoadBalancer  # service type is LoadBalancer
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: svc2
    spec:
      ports:
        - name: svc2-webport
          port: 80
          targetPort: 80
      selector:
        app: webpod
      type: LoadBalancer
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: svc3
    spec:
      ports:
        - name: svc3-webport
          port: 80
          targetPort: 80
      selector:
        app: webpod
      type: LoadBalancer
    EOF
  • Verify access
    #
    kubectl get svc,ep
    NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
    service/kubernetes   ClusterIP      10.10.0.1       <none>        443/TCP        5h20m
    service/svc1         LoadBalancer   10.10.226.228   10.10.200.1   80:32693/TCP   5m30s
    service/svc2         LoadBalancer   10.10.166.59    10.10.200.2   80:30107/TCP   5m30s
    service/svc3         LoadBalancer   10.10.106.144   10.10.200.3   80:31564/TCP   5m30s
    
    NAME                   ENDPOINTS                        AGE
    endpoints/kubernetes   192.168.10.10:6443               5h20m
    endpoints/svc1         172.16.1.52:80,172.16.2.196:80   5m30s
    endpoints/svc2         172.16.1.52:80,172.16.2.196:80   5m30s
    endpoints/svc3         172.16.1.52:80,172.16.2.196:80   5m30s
    
    #
    curl -s 10.10.200.1
    curl -s 10.10.200.2
    curl -s 10.10.200.3
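
    # Sketch: check which node currently answers ARP for the VIPs.
    # The lease name format cilium-l2announce-<namespace>-<service> is an assumption from the docs.
    kubectl -n kube-system get lease | grep cilium-l2announce

    # From a client on the same L2 segment, see which MAC the VIP resolved to
    ip neigh show | grep 10.10.200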
    
    # Clean up
    kubectl delete svc svc1 svc2 svc3 && kubectl delete pod webpod1 webpod2
    kubectl delete ciliuml2announcementpolicy policy1 && kubectl delete ciliumloadbalancerippool cilium-pool