The Kubernetes scheduler (kube-scheduler) is a core component of Kubernetes. After a user or a controller creates a Pod, the scheduler uses the Kubernetes watch mechanism to discover Pods that are newly created but not yet assigned to a Node. It then places each unscheduled Pod it finds onto a suitable Node, making the choice according to the scheduling principles described below.
kube-scheduler selects a node for a Pod in two steps: filtering and scoring.
Filtering: all Nodes that can satisfy the Pod's scheduling requirements are selected. For example, the PodFitsResources filter checks whether a candidate Node's available resources can satisfy the Pod's resource requests. Filtering yields a list of schedulable Nodes; usually this list contains more than one Node. If the list is empty, the Pod is unschedulable. Scoring: the scheduler then ranks the remaining Nodes and binds the Pod to the Node with the highest score, which is how the most suitable placement is chosen.
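As a minimal sketch of what the PodFitsResources filter evaluates (the Pod name here is illustrative, not from the original), a Pod can declare resource requests; only Nodes with at least this much unallocated capacity pass the filter:

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo            # illustrative name
spec:
  containers:
  - name: app
    image: nginx:1.7.9
    resources:
      requests:                  # values the filter compares against node allocatable
        cpu: "500m"
        memory: "256Mi"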
Several kinds of rules can be used to influence this choice, such as nodeSelector, node and Pod affinity/anti-affinity, and taints with tolerations, all of which are covered below.
Note: Kubernetes 1.2 added an experimental feature called affinity. It was designed to replace nodeSelector and to enable more powerful scheduling policies.
The workflow is as follows. The user submits a YAML file defining a Pod through the Kubernetes client kubectl, which sends a POST request to the cluster's API Server to create the Pod. The API Server receives the request and stores the Pod's information in etcd. From the moment the cluster starts running, the Scheduler watches the API Server: as soon as the Pod data has been stored in etcd, the API Server notifies the Scheduler that a Pod has been created. When the Scheduler sees that the Pod's destination node is empty (Dest Node == "", i.e. spec.nodeName is unset), it immediately triggers the scheduling process.
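Pods still waiting for scheduling can be listed directly, because their spec.nodeName is empty; a field selector makes this visible:

# list Pods that have not yet been bound to a node
$ kubectl get pods --all-namespaces --field-selector spec.nodeName=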
Scheduling this newly created Pod object goes through three phases: node pre-selection, node preference, and node selection, which together filter out the best node (a way to inspect the result is shown after this list):
Node pre-selection: every node is checked against a series of predicate rules, and nodes that do not meet the conditions are filtered out.
Node preference: the pre-selected nodes are ranked by priority so that the node most suitable for running the Pod object can be chosen.
Node selection: the node with the highest priority is picked to run the Pod; when several nodes tie for the highest score, one of them is chosen at random.
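Once scheduling completes, the chosen node is recorded in the Pod's spec. A quick way to confirm the result (<pod-name> is a placeholder):

# print the node that the scheduler bound the Pod to
$ kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}'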
In Kubernetes, a Pod is usually the carrier for containers, and the scheduling and automatic control of a group of Pods is normally handled through objects such as Deployment, DaemonSet, RC (ReplicationController), and Job.
[root@uk8s-m-01 study]# vi nginx-deployment.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment-01
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
[root@uk8s-m-01 study]# kubectl get deployments
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment-01   3/3     3            3           30s
[root@uk8s-m-01 study]# kubectl get rs
NAME                             DESIRED   CURRENT   READY   AGE
nginx-deployment-01-5754944d6c   3         3         3       75s
[root@uk8s-m-01 study]# kubectl get pod | grep nginx
nginx-deployment-01-5754944d6c-hmcpg   1/1   Running   0   84s
nginx-deployment-01-5754944d6c-mcj8q   1/1   Running   0   84s
nginx-deployment-01-5754944d6c-p42mh   1/1   Running   0   84s
When a Pod must be placed on a specific Node manually, this can be done by matching a Label on the Node against the Pod's nodeSelector attribute.
[root@uk8s-m-01 study]# kubectl label nodes 172.24.9.14 speed=io
node/172.24.9.14 labeled
[root@uk8s-m-01 study]# vi nginx-master-controller.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-master
  labels:
    name: nginx-master
spec:
  replicas: 1
  selector:
    name: nginx-master
  template:
    metadata:
      labels:
        name: nginx-master
    spec:
      containers:
      - name: master
        image: nginx:1.7.9
        ports:
        - containerPort: 80
      nodeSelector:
        speed: io
[root@uk8s-m-01 study]# kubectl create -f nginx-master-controller.yaml
[root@uk8s-m-01 study]# kubectl get pods -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP            NODE
nginx-master-7fjgj   1/1     Running   0          82s   172.24.9.71   172.24.9.14
[root@uk8s-m-01 study]# vi nodeaffinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disk-type
            operator: In
            values:
            - ssd
  containers:
  - name: with-node-affinity
    image: gcr.azk8s.cn/google_containers/pause:2.0
[root@uk8s-m-01 study]# vi nginx-flag.yaml
# Create a Pod named pod-flag carrying two labels; security=S1 is the label
# that the affinity examples below match against
apiVersion: v1
kind: Pod
metadata:
  name: pod-flag
  labels:
    security: "S1"
    app: "nginx"
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
[root@uk8s-m-01 study]# vi nginx-affinity-in.yaml
# Define affinity on the label security=S1, carried by the Pod "pod-flag" above
apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: gcr.azk8s.cn/google_containers/pause:2.0
[root@uk8s-m-01 study]# kubectl create -f nginx-affinity-in.yaml
[root@uk8s-m-01 study]# kubectl get pods -o wide
Tip: as the Pod affinity above shows, the two Pods end up on the same Node.
[root@uk8s-m-01 study]# vi nginx-affinity-out.yaml
# Create a scheduling policy that must not run on the same Node as the reference Pod
apiVersion: v1
kind: Pod
metadata:
  name: anti-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
  containers:
  - name: anti-affinity
    image: gcr.azk8s.cn/google_containers/pause:2.0
[root@uk8s-m-01 study]# kubectl get pods -o wide    # verify
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
or
tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"
$ kubectl taint nodes node1 key1=value1:NoSchedule
$ kubectl taint nodes node1 key1=value1:NoExecute
$ kubectl taint nodes node1 key2=value2:NoSchedule

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"

A Pod with these tolerations matches the first two taints but not key2=value2:NoSchedule, so it will not be scheduled onto node1; if it was already running on node1 when the taints were added, it keeps running, because the only evicting taint (NoExecute) is tolerated.
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600
Explanation: if the Pod is already running and a matching NoExecute taint is added to its node, the Pod continues to run on that node for 3600 seconds and is then evicted. If the taint is removed within that period, the eviction is never triggered.
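To exercise this, the matching taint can be added and then removed (a trailing minus on the taint removes it):

# add a NoExecute taint matching the toleration above
$ kubectl taint nodes node1 key1=value1:NoExecute
# remove the taint again before tolerationSeconds elapses; the Pod is not evicted
$ kubectl taint nodes node1 key1=value1:NoExecute-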
$ kubectl taint nodes <nodename> special=true:NoSchedule
$ kubectl taint nodes <nodename> special=true:PreferNoSchedule
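Pods that should be admitted onto such nodes then carry a matching toleration; a minimal sketch:

# toleration matching the special=true:NoSchedule taint above
tolerations:
- key: "special"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"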
Source: https://www.cnblogs.com/wuxinchun/p/15219850.html