4. Component Level Architecture

4.1 Introduction

This chapter describes in detail the Kubernetes Reference Architecture in terms of the functional capabilities and how they relate to the Reference Model requirements, i.e. how the infrastructure profiles are determined, documented and delivered.

The specifications defined in this chapter are detailed with unique identifiers that follow the pattern ra2.<section>.<index>, e.g. ra2.ch.001 for the first requirement in the Kubernetes Node section. These specifications are then used as requirements input for the Kubernetes Reference Implementation and for any vendor or community implementations.

Figure 4-1 below shows the architectural components that are described in the subsequent sections of this chapter.

Figure 4-1: Kubernetes Reference Architecture

4.2 Kubernetes Node

This section describes the configuration that will be applied to the physical or virtual machine and its installed Operating System. In order for a Kubernetes Node to be conformant with the Reference Architecture, it must be implemented as per the following specifications:

| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-----|---------------|---------|-------------------|--------------------------------|
| ra2.ch.001 | Huge Pages | When hosting workloads matching the Network Intensive profile, it must be possible to enable Huge Pages (2048KiB and 1048576KiB) within the Kubernetes Node OS, exposing schedulable resources `hugepages-2Mi` and `hugepages-1Gi`. | infra.com.cfg.004 | 4.3.1 |
| ra2.ch.002 | SR-IOV capable NICs | When hosting workloads matching the Network Intensive profile, the physical machines on which the Kubernetes Nodes run must be equipped with NICs that are SR-IOV capable. | e.cap.013 | 3.3 |
| ra2.ch.003 | SR-IOV Virtual Functions | When hosting workloads matching the Network Intensive profile, SR-IOV virtual functions (VFs) must be configured within the Kubernetes Node OS, as the SR-IOV Device Plugin does not manage the creation of these VFs. | e.cap.013 | 4.3.1 |
| ra2.ch.004 | CPU Simultaneous Multi-Threading (SMT) | SMT must be enabled in the BIOS on the physical machine on which the Kubernetes Node runs. | infra.hw.cpu.cfg.004 | 3.3 |
| ra2.ch.005 | CPU Allocation Ratio - VMs | For Kubernetes Nodes running as Virtual Machines, ensure the CPU allocation ratio between vCPU and physical CPU core is 1:1. | infra.com.cfg.001 | |
| ra2.ch.006 | CPU Allocation Ratio - Pods | To ensure the CPU allocation ratio between vCPU and physical CPU core is 1:1, the sum of CPU requests and limits by containers in Pod specifications must remain less than the allocatable quantity of CPU resources (i.e. `requests.cpu < allocatable.cpu` and `limits.cpu < allocatable.cpu`). | infra.com.cfg.001 | 3.3 |
| ra2.ch.007 | IPv6DualStack | To support IPv4/IPv6 dual stack networking, the Kubernetes Node OS must support and be allocated routable IPv4 and IPv6 addresses. | req.inf.ntw.04 | |
| ra2.ch.008 | Physical CPU Quantity | The physical machines on which the Kubernetes Nodes run must be equipped with at least 2 physical sockets, each of at least 20 CPU cores. | infra.hw.cpu.cfg.001, infra.hw.cpu.cfg.002 | 3.3 |
| ra2.ch.009 | Physical Storage | The physical machines on which the Kubernetes Nodes run should be equipped with Solid State Drives (SSDs). | infra.hw.stg.ssd.cfg.002 | 3.3 |
| ra2.ch.010 | Local Filesystem Storage Quantity | The Kubernetes Nodes must be equipped with local filesystem capacity of at least 320GB for unpacking and executing containers. Note, extra should be provisioned to cater for any overhead required by the Operating System and any required OS processes such as the container runtime, Kubernetes agents, etc. | e.cap.003 | 3.3 |
| ra2.ch.011 | Virtual Node CPU Quantity | If using VMs, the Kubernetes Nodes must be equipped with at least 16 vCPUs. Note, extra should be provisioned to cater for any overhead required by the Operating System and any required OS processes such as the container runtime, Kubernetes agents, etc. | e.cap.001 | |
| ra2.ch.012 | Kubernetes Node RAM Quantity | The Kubernetes Nodes must be equipped with at least 32GB of RAM. Note, extra should be provisioned to cater for any overhead required by the Operating System and any required OS processes such as the container runtime, Kubernetes agents, etc. | e.cap.002 | 3.3 |
| ra2.ch.013 | Physical NIC Quantity | The physical machines on which the Kubernetes Nodes run must be equipped with at least four (4) Network Interface Card (NIC) ports. | infra.hw.nic.cfg.001 | 3.3 |
| ra2.ch.014 | Physical NIC Speed - Basic Profile | The NIC ports housed in the physical machines on which the Kubernetes Nodes run for workloads matching the Basic Profile must be at least 10Gbps. | infra.hw.nic.cfg.002 | 3.3 |
| ra2.ch.015 | Physical NIC Speed - Network Intensive Profile | The NIC ports housed in the physical machines on which the Kubernetes Nodes run for workloads matching the Network Intensive profile must be at least 25Gbps. | infra.hw.nic.cfg.002 | 3.3 |
| ra2.ch.016 | Physical PCIe slots | The physical machines on which the Kubernetes Nodes run must be equipped with at least eight (8) Gen3.0 PCIe slots, each with at least eight (8) lanes. | | |
| ra2.ch.017 | Immutable infrastructure | Whether physical or virtual machines are used, the Kubernetes Node is not changed after it is made ready for use. New changes to the Kubernetes Node are rolled out as new instances. This covers any changes from BIOS through Operating System to running processes and all associated configurations. | req.gen.cnt.02 | 4.3.1 |
| ra2.ch.018 | NFD | Node Feature Discovery must be used to advertise the detailed software and hardware capabilities of each node in the Kubernetes Cluster. | TBD | 4.3.1 |

Table 4-1: Node Specifications
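
The following fragment is an illustrative, non-normative sketch of how a workload could consume the `hugepages-2Mi` resource exposed under ra2.ch.001 while keeping CPU requests and limits equal, in line with the allocation-ratio intent of ra2.ch.006. All names, images and quantities are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-example                    # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/dpdk-app:1.0   # placeholder image
      resources:
        requests:
          cpu: "4"                # equal requests and limits keep the Pod in the
          memory: 8Gi             # Guaranteed QoS class (see also ra2.k8s.009)
          hugepages-2Mi: 1Gi      # schedulable resource exposed per ra2.ch.001
        limits:
          cpu: "4"
          memory: 8Gi
          hugepages-2Mi: 1Gi
      volumeMounts:
        - name: hugepage
          mountPath: /hugepages
  volumes:
    - name: hugepage
      emptyDir:
        medium: HugePages         # backs the mount with huge pages
```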

4.3 Kubernetes

In order for the Kubernetes components to be conformant with the Reference Architecture they must be implemented as per the following specifications:

| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-----|---------------|---------|-------------------|--------------------------------|
| ra2.k8s.001 | Kubernetes Conformance | The Kubernetes distribution, product, or installer used in the implementation must be listed in the Kubernetes Distributions and Platforms document and marked (X) as conformant for the Kubernetes version defined in README under "Required versions of most important components". | req.gen.cnt.03 | 4.3.1 |
| ra2.k8s.002 | Highly available etcd | An implementation must consist of either three, five or seven nodes running the etcd service (can be colocated on the master nodes, or can run on separate nodes, but not on worker nodes). | req.gen.rsl.02, req.gen.avl.01 | 4.3.1 |
| ra2.k8s.003 | Highly available control plane | An implementation must consist of at least one master node per availability zone or fault domain to ensure the high availability and resilience of the Kubernetes control plane services. | req.gen.rsl.02, req.gen.avl.01 | |
| ra2.k8s.012 | Control plane services | A master node must run at least the following Kubernetes control plane services: `kube-apiserver`, `kube-scheduler` and `kube-controller-manager`. | req.gen.rsl.02, req.gen.avl.01 | 4.3.1 |
| ra2.k8s.004 | Highly available worker nodes | An implementation must consist of at least one worker node per availability zone or fault domain to ensure the high availability and resilience of workloads managed by Kubernetes. | req.gen.rsl.01, req.gen.avl.01, req.kcm.gen.02, req.inf.com.01 | |
| ra2.k8s.005 | Kubernetes API Version | In alignment with the Kubernetes version support policy, an implementation must use a Kubernetes version as per the subcomponent versions table in README under "Required versions of most important components". | TBC | |
| ra2.k8s.006 | NUMA Support | When hosting workloads matching the Network Intensive profile, the TopologyManager and CPUManager feature gates must be enabled and configured on the kubelet (note, TopologyManager is enabled by default in Kubernetes v1.18 and later, and CPUManager is enabled by default in Kubernetes v1.10 and later). `--feature-gates="...,TopologyManager=true,CPUManager=true" --topology-manager-policy=single-numa-node --cpu-manager-policy=static` | e.cap.007, infra.com.cfg.002, infra.hw.cpu.cfg.003 | |
| ra2.k8s.007 | DevicePlugins Feature Gate | When hosting workloads matching the Network Intensive profile, the DevicePlugins feature gate must be enabled (note, this is enabled by default in Kubernetes v1.10 or later). `--feature-gates="...,DevicePlugins=true,..."` | Various, e.g. e.cap.013 | 4.3.1 |
| ra2.k8s.008 | System Resource Reservations | To avoid resource starvation issues on nodes, the implementation of the architecture must reserve compute resources for system daemons and Kubernetes system daemons such as the kubelet, container runtime, etc. Use the following kubelet flags: `--reserved-cpus=[a-z]`, using two of a-z to reserve 2 SMT threads. | i.cap.014 | |
| ra2.k8s.009 | CPU Pinning | When hosting workloads matching the Network Intensive profile, in order to support CPU Pinning, the kubelet must be started with the `--cpu-manager-policy=static` option. (Note, only containers in Guaranteed pods - where CPU resource `requests` and `limits` are identical - and configured with positive-integer CPU `requests` will take advantage of this. All other Pods will run on CPUs in the remaining shared pool.) | infra.com.cfg.003 | |
| ra2.k8s.010 | IPv6DualStack | To support IPv6 and IPv4, the IPv6DualStack feature gate must be enabled on various components (requires Kubernetes v1.16 or later). kube-apiserver: `--feature-gates="IPv6DualStack=true"`. kube-controller-manager: `--feature-gates="IPv6DualStack=true" --cluster-cidr=<IPv4 CIDR>,<IPv6 CIDR> --service-cluster-ip-range=<IPv4 CIDR>,<IPv6 CIDR>` with `--node-cidr-mask-size-ipv4` / `--node-cidr-mask-size-ipv6` defaulting to /24 for IPv4 and /64 for IPv6. kubelet: `--feature-gates="IPv6DualStack=true"`. kube-proxy: `--cluster-cidr=<IPv4 CIDR>,<IPv6 CIDR> --feature-gates="IPv6DualStack=true"` | req.inf.ntw.04 | |
| ra2.k8s.011 | Anuket profile labels | To clearly identify which worker nodes are compliant with the different profiles defined by Anuket, the worker nodes must be labelled according to the following pattern: an `anuket.io/profile/basic` label must be set to true on the worker node if it can fulfil the requirements of the basic profile, and an `anuket.io/profile/network-intensive` label must be set to true on the worker node if it can fulfil the requirements of the network intensive profile. The requirements for both profiles can be found in chapter 2. | | |
| ra2.k8s.012 | Kubernetes APIs | Kubernetes Alpha APIs are recommended only for testing, therefore all Alpha APIs must be disabled. | req.int.api.03 | |
| ra2.k8s.013 | Kubernetes APIs | Backward compatibility of all supported GA and Beta APIs of Kubernetes must be supported. | req.int.api.04 | |
| ra2.k8s.014 | Security Groups | Kubernetes must support the NetworkPolicy feature. | infra.net.cfg.004 | |

Table 4-2: Kubernetes Specifications
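
As a minimal, illustrative sketch (not a normative configuration), the feature gates and policies referenced in ra2.k8s.006, ra2.k8s.007, ra2.k8s.008 and ra2.k8s.009 could also be expressed through a kubelet configuration file rather than command-line flags. The reserved CPU IDs below are placeholders and depend on the node topology.

```yaml
# Illustrative KubeletConfiguration fragment; values are examples only.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  TopologyManager: true          # default from v1.18 (ra2.k8s.006)
  CPUManager: true               # default from v1.10 (ra2.k8s.006)
  DevicePlugins: true            # default from v1.10 (ra2.k8s.007)
  IPv6DualStack: true            # ra2.k8s.010
cpuManagerPolicy: static         # enables CPU pinning for Guaranteed pods (ra2.k8s.009)
topologyManagerPolicy: single-numa-node   # NUMA alignment (ra2.k8s.006)
reservedSystemCPUs: "0,1"        # reserve two SMT threads for system daemons (ra2.k8s.008)
```

Worker nodes that satisfy a profile would additionally carry the corresponding label required by ra2.k8s.011, for example `anuket.io/profile/network-intensive=true`.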

4.4 Container runtimes

| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-----|---------------|---------|-------------------|--------------------------------|
| ra2.crt.001 | Conformance with OCI 1.0 runtime spec | The container runtime must be implemented as per the OCI 1.0 (Open Container Initiative 1.0) specification. | req.gen.ost.01 | 4.3.1 |
| ra2.crt.002 | Kubernetes Container Runtime Interface (CRI) | The Kubernetes container runtime must be implemented as per the Kubernetes Container Runtime Interface (CRI). | req.gen.ost.01 | 4.3.1 |

Table 4-3: Container Runtime Specifications
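
For illustration only: where an implementation configures more than one OCI-compliant runtime handler in its CRI implementation, a RuntimeClass object can be used to select which handler runs a given Pod. The names below are placeholders, not a mandated configuration.

```yaml
# Hypothetical RuntimeClass mapping Pods to an OCI-compliant runtime handler
# exposed by the CRI implementation.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: example-runc             # placeholder name
handler: runc                    # CRI handler name; depends on the runtime configuration
```

A Pod would opt into this handler by setting `runtimeClassName: example-runc` in its spec.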

4.5 Networking solutions

In order for the networking solution(s) to be conformant with the Reference Architecture they must be implemented as per the following specifications:

| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-----|---------------|---------|-------------------|--------------------------------|
| ra2.ntw.001 | Centralised network administration | The networking solution deployed within the implementation must be administered through the Kubernetes API using native Kubernetes API resources and objects, or Custom Resources. | req.inf.ntw.03 | 4.3.1 |
| ra2.ntw.002 | Default Pod Network - CNI | The networking solution deployed within the implementation must use a CNI-conformant Network Plugin for the Default Pod Network, as the alternative (kubenet) does not support cross-node networking or Network Policies. | req.gen.ost.01, req.inf.ntw.08 | 4.3.1 |
| ra2.ntw.003 | Multiple connection points | The networking solution deployed within the implementation must support the capability to connect at least FIVE connection points to each Pod, which are additional to the default connection point managed by the default Pod network CNI plugin. | e.cap.004 | 4.3.1 |
| ra2.ntw.004 | Multiple connection points presentation | The networking solution deployed within the implementation must ensure that all additional non-default connection points are requested by Pods using standard Kubernetes resource scheduling mechanisms such as annotations or container resource requests and limits. | req.inf.ntw.03 | 4.3.1 |
| ra2.ntw.005 | Multiplexer/meta-plugin | The networking solution deployed within the implementation may use a multiplexer/meta-plugin. | req.inf.ntw.06, req.inf.ntw.07 | 4.3.1 |
| ra2.ntw.006 | Multiplexer/meta-plugin CNI Conformance | If used, the selected multiplexer/meta-plugin must integrate with the Kubernetes control plane via CNI. | req.gen.ost.01 | 4.3.1 |
| ra2.ntw.007 | Multiplexer/meta-plugin CNI Plugins | If used, the selected multiplexer/meta-plugin must support the use of multiple CNI-conformant Network Plugins. | req.gen.ost.01, req.inf.ntw.06 | 4.3.1 |
| ra2.ntw.008 | SR-IOV Device Plugin for Network Intensive | When hosting workloads that match the Network Intensive profile and require SR-IOV acceleration, a Device Plugin for SR-IOV must be used to configure the SR-IOV devices and advertise them to the `kubelet`. | e.cap.013 | 4.3.1 |
| ra2.ntw.009 | Multiple connection points with multiplexer/meta-plugin | When a multiplexer/meta-plugin is used, the additional non-default connection points must be managed by a CNI-conformant Network Plugin. | req.gen.ost.01 | 4.3.1 |
| ra2.ntw.010 | User plane networking | When hosting workloads matching the Network Intensive profile, CNI network plugins that support the use of DPDK, VPP, or SR-IOV must be deployed as part of the networking solution. | infra.net.acc.cfg.001 | 4.3.1 |
| ra2.ntw.011 | NATless connectivity | When hosting workloads that require source and destination IP addresses to be preserved in the traffic headers, a CNI plugin that exposes the pod IP directly to the external networks (e.g. Calico, MACVLAN or IPVLAN CNI plugins) is required. | req.inf.ntw.14 | |
| ra2.ntw.012 | Device Plugins | When hosting workloads matching the Network Intensive profile that require the use of FPGA, SR-IOV or other Acceleration Hardware, a Device Plugin for that FPGA or Acceleration Hardware must be used. | e.cap.016, e.cap.013 | 4.3.1 |
| ra2.ntw.013 | Dual stack CNI | The networking solution deployed within the implementation must use a CNI-conformant Network Plugin that is able to support dual-stack IPv4/IPv6 networking. | req.inf.ntw.04 | |
| ra2.ntw.014 | Security Groups | The networking solution deployed within the implementation must support network policies. | infra.net.cfg.004 | |
| ra2.ntw.015 | IPAM plugin for multiplexer | When a multiplexer/meta-plugin is used, a CNI-conformant IPAM Network Plugin must be installed to allocate IP addresses for secondary network interfaces across all nodes of the cluster. | req.inf.ntw.10 | |

Table 4-4: Networking Solution Specifications
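
The following non-normative sketch assumes Multus is chosen as the multiplexer/meta-plugin (ra2.ntw.005) together with the SR-IOV CNI plugin and device plugin (ra2.ntw.008) and a cluster-wide IPAM plugin (ra2.ntw.015). It shows one way an additional connection point could be defined and then requested by a Pod via annotations and resource requests (ra2.ntw.003, ra2.ntw.004). Resource names, network names and addresses are placeholders.

```yaml
# Illustrative secondary network definition (Multus NetworkAttachmentDefinition CRD).
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net-a                                             # placeholder network name
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriov_netdevice  # placeholder device plugin resource
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "sriov",
      "ipam": { "type": "whereabouts", "range": "192.0.2.0/24" }
    }
---
# Illustrative Pod requesting one additional connection point on that network.
apiVersion: v1
kind: Pod
metadata:
  name: sriov-consumer                                          # hypothetical workload
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net-a                    # additional connection point
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0                       # placeholder image
      resources:
        requests:
          intel.com/sriov_netdevice: "1"                        # VF advertised by the SR-IOV device plugin
        limits:
          intel.com/sriov_netdevice: "1"
```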

4.6 Storage components

In order for the storage solution(s) to be conformant with the Reference Architecture they must be implemented as per the following specifications:

| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-----|---------------|---------|-------------------|--------------------------------|
| ra2.stg.001 | Ephemeral Storage | An implementation must support ephemeral storage, for the unpacked container images to be stored and executed from, as a directory in the filesystem on the worker node on which the container is running. See the Container runtimes section above for more information on how this meets the requirement for ephemeral storage for containers. | | |
| ra2.stg.002 | Kubernetes Volumes | An implementation may attach additional storage to containers using Kubernetes Volumes. | | |
| ra2.stg.003 | Kubernetes Volumes | An implementation may use Volume Plugins (see ra2.stg.005 below) to allow the use of a storage protocol (e.g. iSCSI, NFS) or management API (e.g. Cinder, EBS) for the attaching and mounting of storage into a Pod. | | |
| ra2.stg.004 | Persistent Volumes | An implementation may support Kubernetes Persistent Volumes (PV) to provide persistent storage for Pods. Persistent Volumes exist independently of the lifecycle of containers and/or pods. | req.inf.stg.01 | |
| ra2.stg.005 | Storage Extension | Volume plugins must allow for the use of a range of backend storage systems. | | |
| ra2.stg.006 | Container Storage Interface (CSI) | An implementation may support the Container Storage Interface (CSI), an out-of-tree plugin. In order to support CSI, the feature gates `CSIDriverRegistry` and `CSINodeInfo` must be enabled. The implementation must use a CSI driver (a full list of CSI drivers can be found here). An implementation may support ephemeral storage through a CSI-compatible volume plugin, in which case the `CSIInlineVolume` feature gate must be enabled. An implementation may support Persistent Volumes through a CSI-compatible volume plugin, in which case the `CSIPersistentVolume` feature gate must be enabled. | | |
| ra2.stg.007 | Storage Classes | An implementation should use Kubernetes Storage Classes to support automation and the separation of concerns between providers of a service and consumers of the service. | | |

Table 4-6: Storage Solution Specifications
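
As a non-normative illustration of ra2.stg.004, ra2.stg.006 and ra2.stg.007, the fragment below shows a StorageClass backed by a CSI driver and a PersistentVolumeClaim that consumes it. The provisioner string, class name and size are placeholders for whatever the implementation provides.

```yaml
# Illustrative StorageClass backed by a CSI driver (driver name is a placeholder).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: block-standard                 # hypothetical class name
provisioner: csi.example.com           # placeholder CSI driver
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# Illustrative claim that a Pod could mount as a Persistent Volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: block-standard
  resources:
    requests:
      storage: 10Gi
```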

A note on object storage:

  • This Reference Architecture does not include any specifications for object storage, as this is neither a native Kubernetes object, nor something that is required by CSI drivers. Object storage is an application-level requirement that would ordinarily be provided by a highly scalable service offering rather than being something an individual Kubernetes Cluster could offer.

Todo: specifications/commentary to support req.inf.stg.04 (SDS) and req.inf.stg.05 (high performance and horizontally scalable storage). Also req.sec.gen.06 (storage resource isolation), req.sec.gen.10 (CIS - if applicable) and req.sec.zon.03 (data encryption at rest).

4.7 Service meshes

Application service meshes are not in scope for the architecture. Network service mesh specifications are handled in section 4.5, Networking solutions.

4.8 Kubernetes Application package manager

In order for the Kubernetes Application package manager to be conformant with the Reference Architecture it must be implemented as per the following specifications:

| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-----|---------------|---------|-------------------|--------------------------------|
| ra2.pkg.001 | API-based package management | A package manager must use the Kubernetes APIs to manage application artefacts. Cluster-side components such as Tiller are not supported. | req.int.api.02 | |

Table 4-7: Kubernetes Application Package Management Specifications

4.9 Kubernetes workloads

In order for the Kubernetes workloads to be conformant with the Reference Architecture they must be implemented as per the following specifications:

| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-----|---------------|---------|-------------------|--------------------------------|
| ra2.app.001 | Root Parameter Group (OCI Spec) | Specifies the container's root filesystem. | TBD | N/A |
| ra2.app.002 | Mounts Parameter Group (OCI Spec) | Specifies additional mounts beyond the root filesystem. | TBD | N/A |
| ra2.app.003 | Process Parameter Group (OCI Spec) | Specifies the container process. | TBD | N/A |
| ra2.app.004 | Hostname Parameter Group (OCI Spec) | Specifies the container's hostname as seen by processes running inside the container. | TBD | N/A |
| ra2.app.005 | User Parameter Group (OCI Spec) | User for the process is a platform-specific structure that allows specific control over which user the process runs as. | TBD | N/A |
| ra2.app.006 | Consumption of additional, non-default connection points | The workload must request additional non-default connection points through the use of workload annotations or resource requests and limits within the container spec passed to the Kubernetes API Server. | req.int.api.01 | N/A |
| ra2.app.007 | Host Volumes | Workloads should not use `hostPath` volumes, as Pods with identical configuration (such as created from a PodTemplate) may behave differently on different nodes due to different files on the nodes. | req.kcm.gen.02 | N/A |
| ra2.app.008 | Infrastructure dependency | Workloads must not rely on the availability of the master nodes for the successful execution of their functionality (i.e. loss of the master nodes may affect non-functional behaviours such as healing and scaling, but components that are already running will continue to do so without issue). | TBD | N/A |
| ra2.app.009 | Device plugins | Workload descriptors must use the resources advertised by the device plugins to indicate their need for an FPGA, SR-IOV or other acceleration device. | TBD | N/A |
| ra2.app.010 | Node Feature Discovery (NFD) | Workload descriptors must use the labels advertised by Node Feature Discovery to indicate which node software or hardware features they need. | TBD | N/A |

Table 4-8: Kubernetes Workload Specifications
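
The fragment below is a non-normative sketch combining ra2.app.009 and ra2.app.010: an acceleration resource advertised by a device plugin is requested through container resources, and an NFD-advertised label is used as a node selector. The resource name and image are placeholders; the exact label and resource names depend on the device plugin and NFD configuration in use.

```yaml
# Illustrative workload fragment; names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: accelerated-workload                                       # hypothetical name
spec:
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"       # example NFD-advertised label (ra2.app.010)
  containers:
    - name: app
      image: registry.example.com/vnf:1.0                          # placeholder image
      resources:
        requests:
          vendor.example.com/fpga: "1"                             # placeholder device plugin resource (ra2.app.009)
        limits:
          vendor.example.com/fpga: "1"
```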

4.10 Additional required components

This section should list any additional components needed to provide the services defined in Chapter 3.2 (e.g. Prometheus).