Notes on trying the Horizontal Pod Autoscaler on EKS

The Horizontal Pod Autoscaler is a feature that automatically scales the number of Pods started by a ReplicaSet, Deployment, or similar resource, based on CPU utilization.

Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with beta support, on some other, application-provided metrics).

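Looking through the documentation, metrics other than CPU utilization can apparently be used as well, but CPU utilization seems to be the most basic one. I walk through the procedure using the AWS official documentation and the Kubernetes walkthrough below as references.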

aws - Horizontal Pod Autoscaler

k8s - Horizontal Pod Autoscaler Walkthrough

Environment

Kubernetes (EKS) v1.14.9

Preparation

Set up what is required to use the Horizontal Pod Autoscaler.

Starting metrics-server

As its name suggests, metrics-server is a pod that collects metrics inside the Kubernetes cluster.

Metrics server is responsible for collecting resource metrics from kubelets and exposing them in Kubernetes Apiserver through Metrics API. Main consumers of those metrics are kubectl top, HPA and VPA. Metric server stores only the latest values of metrics needed for core metrics pipeline (CPU, Memory) and is not responsible for forwarding metrics to third-party destinations.

Kubernetes Metrics Server

Start the metrics-server pod. Download the version you want to use from the GitHub releases page below.

https://github.com/kubernetes-sigs/metrics-server/releases/

The archive contains the required manifest files, so apply them.

$ wget https://github.com/kubernetes-sigs/metrics-server/archive/v0.3.6.tar.gz
$ tar -xzf v0.3.6.tar.gz
$ kubectl apply -f metrics-server-0.3.6/deploy/1.8+/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created

Confirm that it has started.

$ kubectl get deployment metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           65s

The top command is now available.

$ kubectl top node
NAME                                           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-0-1-15.ap-northeast-1.compute.internal   40m          2%     374Mi           27%
$
$ kubectl top pod --all-namespaces
NAMESPACE     NAME                              CPU(cores)   MEMORY(bytes)
kube-system   aws-node-l4ncf                    2m           23Mi
kube-system   coredns-58986cd576-88pwn          2m           7Mi
kube-system   coredns-58986cd576-wscq2          2m           7Mi
kube-system   kube-proxy-sbznn                  1m           9Mi
kube-system   metrics-server-7fcf9cc98b-2v8sk   1m           11Mi

Starting the container to autoscale

Create the nginx deployment that will be autoscaled. Write a manifest file and start the nginx pod.

nginx.yml

apiVersion: v1
kind: Namespace
metadata:
  name: sample
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-nginx
  namespace: sample
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: container-nginx
          image: nginx:latest
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: 200m
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
  name: service-nginx
  namespace: sample
spec:
  type: ClusterIP
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx

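The limits and requests parameters restrict the amount of resources the pod can use. limits is the upper bound on the resources the pod may consume, and requests is the amount of resources that must be reserved for the pod. Given a worker node with 1 vCPU (1000m), the sum of limits across all pods on that node is allowed to exceed 1000m, but the sum of requests is not (strictly speaking, it cannot exceed the node's allocatable CPU, which is slightly less than its capacity). If the requests do not fit, Kubernetes decides that it cannot reserve the resources and the pod is not started.

Apply it.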
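
The difference between the two sums can be sketched in a few lines of Python. This is only an illustration of the rule described above, not how the scheduler is actually implemented; the node size, the number of pods, and the function name `fits_on_node` are all hypothetical.

```python
def fits_on_node(allocatable_m, existing_requests_m, new_request_m):
    """Return True if the new pod's CPU request still fits on the node.

    All values are in millicores. Only requests are summed; limits are
    deliberately not part of the check.
    """
    return sum(existing_requests_m) + new_request_m <= allocatable_m


# Hypothetical node with 1000m of allocatable CPU.
node_allocatable_m = 1000

# Nine pods like the nginx pod above: requests 100m, limits 200m each.
running_requests_m = [100] * 9   # 900m in total
running_limits_m = [200] * 9     # 1800m in total, over the node size but allowed

print(sum(running_limits_m))                                       # 1800
print(fits_on_node(node_allocatable_m, running_requests_m, 100))   # True  (900 + 100)
print(fits_on_node(node_allocatable_m, running_requests_m, 200))   # False (900 + 200)
```

The limits total of 1800m does not prevent anything from being scheduled, while a single additional request that pushes the requests total past the node size does.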

$ kubectl apply -f nginx.yml
namespace/sample created
deployment.apps/deployment-nginx created
service/service-nginx created

Check the current resource usage.

$ kubectl top pod -n sample
NAME                                CPU(cores)   MEMORY(bytes)
deployment-nginx-74dd755c88-4f6sg   0m           2Mi

Configuring the Horizontal Pod Autoscaler

Create the Horizontal Pod Autoscaler object. First, prepare a manifest file.

autoscaler.yml

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: controller-autoscaler-nginx
  namespace: sample
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-nginx
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 50

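You specify the target deployment, the threshold, and the maximum and minimum number of replicas. One point to note is that targetCPUUtilizationPercentage is a percentage of the pod's containers[].resources.requests value. It is relative to requests, not limits. In addition, targetCPUUtilizationPercentage is evaluated against the average CPU utilization of all pods in the deployment.

Apply it.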
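
As a rough sketch of that calculation, the utilization can be computed as total usage divided by total requests across the pods. The function name and the usage values below are hypothetical, and the integer division is my assumption about how the percentage is rounded; the real controller also deals with missing metrics and pods that are not ready, which is left out here.

```python
def cpu_utilization_percent(usage_m, request_m):
    """CPU utilization across pods as a percentage of their CPU requests.

    usage_m:   current CPU usage of each pod, in millicores
    request_m: CPU request of each pod, in millicores
    Limits play no part in this calculation.
    """
    total_usage = sum(usage_m)
    total_request = sum(request_m)
    return total_usage * 100 // total_request


# Hypothetical usage for two pods, each with requests.cpu = 100m as in nginx.yml.
usage = [30, 80]
requests = [100, 100]

utilization = cpu_utilization_percent(usage, requests)
print(utilization)        # 55
print(utilization > 50)   # True: above targetCPUUtilizationPercentage of 50
```

Here one pod is below the threshold and the other above it, but the decision is made on the combined 55%, not on either pod individually.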

$ kubectl apply -f autoscaler.yml
horizontalpodautoscaler.autoscaling/controller-autoscaler-nginx created

Check the current autoscaling status.

$ kubectl get HorizontalPodAutoscaler -n sample
NAME                          REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
controller-autoscaler-nginx   Deployment/deployment-nginx   0%/50%    1         3         1          4m40s

Verification

Start a pod that runs apache-bench and put load on nginx through its in-cluster DNS name.

$ kubectl run apache-bench -i --tty --rm --image=httpd -- /bin/sh
while true;
do ab -n 10 -c 10 http://service-nginx.sample.svc.cluster.local/ >/dev/null;
done

Open another terminal and check the autoscaling status: the number of replicas has gone up to 3.

$ kubectl get HorizontalPodAutoscaler -n sample
NAME                          REFERENCE                     TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
controller-autoscaler-nginx   Deployment/deployment-nginx   137%/50%   1         3         3          19m

kubectl top also shows three pods.

$ kubectl top pod -n sample
NAME                                CPU(cores)   MEMORY(bytes)
deployment-nginx-74dd755c88-4f6sg   146m         2Mi
deployment-nginx-74dd755c88-5vc58   125m         2Mi
deployment-nginx-74dd755c88-x7p6j   133m         2Mi

Show the details.

$ kubectl describe HorizontalPodAutoscaler controller-autoscaler-nginx -n sample
Name:                                                  controller-autoscaler-nginx
Namespace:                                             sample
Labels:                                                <none>
Annotations:                                           kubectl.kubernetes.io/last-applied-configuration:
                                                         {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"controller-autoscaler-nginx","namespa...
CreationTimestamp:                                     Sat, 29 Feb 2020 16:22:49 +0900
Reference:                                             Deployment/deployment-nginx
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  137% (137m) / 50%
Min replicas:                                          1
Max replicas:                                          3
Deployment pods:                                       3 current / 3 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  4m    horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target

The current metric value is 137%. As the text "as a percentage of request" indicates, this means 137% relative to the pod's containers[].resources.requests. In this test requests is 100m and limits is 200m, so each pod can use up to 200m of CPU, which is why the value exceeds 100%.

When I read "Kubernetes完全ガイド", it recommended setting a pod's requests and limits to the same value, and I do get the feeling that there is little point in making them different.
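
The upper bound implied here can be written down as a small calculation. This is a sketch under the assumption that CPU usage is capped at limits; in practice the reported value can fluctuate around the cap depending on how the metric is sampled. The function name is hypothetical.

```python
def max_utilization_percent(request_m, limit_m):
    """Highest utilization (as a percentage of request) a pod can reach
    if its CPU usage is capped at its limit. Values are in millicores."""
    return limit_m * 100 // request_m


# Values from nginx.yml: requests.cpu = 100m, limits.cpu = 200m.
print(max_utilization_percent(100, 200))          # 200

# The 137% reported by kubectl describe is within that bound.
print(137 <= max_utilization_percent(100, 200))   # True

# If requests and limits were both 100m, utilization could not exceed 100%.
print(max_utilization_percent(100, 100))          # 100
```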

Kubernetes完全ガイド (impress top gear), by 青山真也, published 2018/09/21 (paperback)

Horizontal Pod Autoscaler algorithm

The detailed behavior of the autoscaling decision is described in the following section of the documentation.

Algorithm Details

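The formula below gives the required number of replicas. It is what you would expect; the point to note is that the desired replica count is rounded up with ceil.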

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
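
Plugging in the numbers from this test gives a feel for the formula. The sketch below is my own illustration: clamping to minReplicas and maxReplicas is added on top of the formula to reflect the values in autoscaler.yml, and details of the real controller such as the tolerance around the target are left out. The function name and the 20% figure in the last call are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, desired_metric,
                     min_replicas, max_replicas):
    """Apply the formula above, then clamp the result to [min, max].

    Returns (raw, clamped): the value straight from the formula and the
    value after applying minReplicas / maxReplicas.
    """
    raw = math.ceil(current_replicas * (current_metric / desired_metric))
    clamped = max(min_replicas, min(raw, max_replicas))
    return raw, clamped


# One replica at 137% against a 50% target: ceil(2.74) = 3.
print(desired_replicas(1, 137, 50, 1, 3))   # (3, 3)

# Three replicas still at 137%: the formula asks for 9, maxReplicas caps it at 3.
print(desired_replicas(3, 137, 50, 1, 3))   # (9, 3)

# Three replicas at a hypothetical 20%: ceil(1.2) = 2, so one pod could be removed.
print(desired_replicas(3, 20, 50, 1, 3))    # (2, 2)
```

The second case corresponds to the ScalingLimited / TooManyReplicas condition in the kubectl describe output: the desired count is larger than the maximum, so the deployment stays at 3.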

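The parameters used for autoscaling are apparently set through kube-controller-manager flags. However, kube-controller-manager runs on the master node, and since EKS hides the master nodes from users, these parameters cannot be changed on EKS.

The parameters introduced in the documentation are listed below.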

parameter	default	description
horizontal-pod-autoscaler-sync-period	15s	Interval at which the autoscaler checks resource values
horizontal-pod-autoscaler-initial-readiness-delay	30s	?
horizontal-pod-autoscaler-cpu-initialization-period	5m	?
horizontal-pod-autoscaler-downscale-stabilization	5m	Wait time after a downscale has been performed

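I had trouble with horizontal-pod-autoscaler-initial-readiness-delay and horizontal-pod-autoscaler-cpu-initialization-period because I could not make sense of the official English description at all. I assumed my English was to blame, but someone has filed an issue saying the text is unclear, so it seems to be hard to follow even for people who are good at English.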

[Doc/HPA] Unclear definition of the --horizontal-pod-autoscaler-initial-readiness-delay flag

My guess is that horizontal-pod-autoscaler-initial-readiness-delay is how long autoscaling waits after a new pod starts, but I am not sure. As for horizontal-pod-autoscaler-cpu-initialization-period, I could not understand the description at all.