Deploying a Kubernetes (k8s) cluster on any OpenStack-based cloud for container-based workloads is a standard deployment pattern. However, auto-scaling this cluster based on load would require some integration between k8s and OpenStack components. While looking at the option of leveraging a Heat ASG to achieve autoscaling, I came across a few requirements that the list can discuss to arrive at the best possible solution.
A typical k8s deployment scenario on OpenStack would be as below.
- Master (single VM)
- Minions/Nodes (AutoScalingGroup)
AutoScaling of the cluster would involve both scaling the minions/nodes and scaling Pods (ReplicationControllers).
- Scaling Nodes/Minions:
We already have utilization stats collected at the hypervisor level, as the ceilometer compute agent polls the local libvirt daemon to acquire performance data for the local instances/nodes. Also, the Kubelet (running on the node) collects cAdvisor stats. However, cAdvisor stats are not fed back to the scheduler at present, and the scheduler uses a simple round-robin method for scheduling.
Req 1: We would need a way to push stats from the kubelet/cAdvisor to ceilometer, either directly or via the master (using heapster). Alarms based on these stats can then be used to scale the ASG up/down.
There is an existing blueprint for an inspector implementation for the docker hypervisor (nova-docker). However, we would probably require an agent running on the nodes or the master that sends the cAdvisor or heapster stats to ceilometer. I've seen some discussions on the possibility of leveraging keystone trusts with the ceilometer client.
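To make Req 1 a bit more concrete, here is a minimal sketch of what such an agent might compute before pushing to ceilometer. It is only an illustration under assumptions: the input dicts are a simplified, hypothetical stand-in for cAdvisor's cumulative CPU counters, the meter name "k8s.node.cpu_util" is made up, and the output uses the counter_* field names of Ceilometer's v2 samples API as I understand them. Authentication and the actual POST are left out.

```python
from datetime import datetime, timezone


def cpu_util_percent(prev, curr, num_cores):
    """Average CPU utilisation (%) between two cumulative samples.

    Each sample is a hypothetical, simplified dict:
    {"timestamp": seconds since epoch, "cpu_usage_ns": cumulative CPU time in ns}
    """
    elapsed_ns = (curr["timestamp"] - prev["timestamp"]) * 1e9
    if elapsed_ns <= 0 or num_cores <= 0:
        raise ValueError("samples must be time-ordered and num_cores positive")
    used_ns = curr["cpu_usage_ns"] - prev["cpu_usage_ns"]
    return 100.0 * used_ns / (elapsed_ns * num_cores)


def to_ceilometer_sample(node_id, meter, unit, volume, timestamp):
    """Build one gauge sample dict (assumed Ceilometer v2 sample fields)."""
    return {
        "counter_name": meter,  # hypothetical meter name chosen by the agent
        "counter_type": "gauge",
        "counter_unit": unit,
        "counter_volume": volume,
        "resource_id": node_id,
        "timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    prev = {"timestamp": 100.0, "cpu_usage_ns": 0}
    curr = {"timestamp": 110.0, "cpu_usage_ns": 5_000_000_000}
    util = cpu_util_percent(prev, curr, num_cores=2)
    print(to_ceilometer_sample("node-1", "k8s.node.cpu_util", "%", util, curr["timestamp"]))
```

An alarm on a meter like this could then drive the ASG scaling policy in the same way the existing libvirt-based cpu_util alarms do.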
Req 2: The AutoScalingGroup is expected to notify the master that a new node has been added/removed. Before removing a node, the master/scheduler has to mark the node as
Req 3: Notify containers/pods that the node is about to be removed, so that they can stop accepting traffic and persist their data. This would also require a cooldown period before the node is removed.
Requirements 2 and 3 would probably both require generating scaling event notifications/signals for the master and containers to consume, and probably some ASG lifecycle hooks.
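The ordering implied by Req 2 and Req 3 could be sketched roughly as below. This is not an existing Heat or k8s API: the class and the two callbacks (mark_node_for_removal, notify_pod) are hypothetical placeholders for whatever notification mechanism and lifecycle hook we end up with. The point is only the sequence: mark the node, notify its pods, wait out the cooldown, and only then let the ASG delete the VM.

```python
import time


class NodeRemovalHook:
    """Hypothetical scale-down lifecycle hook for one node."""

    def __init__(self, mark_node_for_removal, notify_pod, cooldown_seconds,
                 sleep=time.sleep):
        self.mark_node_for_removal = mark_node_for_removal
        self.notify_pod = notify_pod
        self.cooldown_seconds = cooldown_seconds
        self.sleep = sleep  # injectable so the hook can be tested without waiting
        self.events = []

    def run(self, node_name, pods):
        # Req 2: the master/scheduler marks the node before it is removed.
        self.mark_node_for_removal(node_name)
        self.events.append(("marked", node_name))
        # Req 3: each pod is told to stop accepting traffic and persist data.
        for pod in pods:
            self.notify_pod(pod)
            self.events.append(("notified", pod))
        # Cooldown period before the ASG is allowed to delete the VM.
        self.sleep(self.cooldown_seconds)
        self.events.append(("released", node_name))
        return self.events


if __name__ == "__main__":
    hook = NodeRemovalHook(
        mark_node_for_removal=lambda node: print("marking", node),
        notify_pod=lambda pod: print("notifying", pod),
        cooldown_seconds=0,
    )
    hook.run("minion-3", ["web-1", "web-2"])
```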
Req 4: If too many pods are 'pending' (waiting to be scheduled), the scheduler would signal the ASG to scale up. This is similar to Req 1.
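As a rough illustration of Req 4, the decision itself could be as simple as the function below. The threshold and cooldown parameters are assumptions of mine, not anything the scheduler exposes today; the only k8s detail relied on is that unscheduled pods report the phase "Pending".

```python
def should_scale_up(pod_phases, pending_threshold, now, last_scale_time,
                    cooldown_seconds):
    """Return True if the ASG should be signalled to add a node.

    pod_phases: iterable of pod phase strings, e.g. "Pending", "Running".
    now / last_scale_time: timestamps in seconds (last_scale_time may be None).
    """
    pending = sum(1 for phase in pod_phases if phase == "Pending")
    in_cooldown = (last_scale_time is not None
                   and (now - last_scale_time) < cooldown_seconds)
    return pending >= pending_threshold and not in_cooldown


if __name__ == "__main__":
    phases = ["Running", "Pending", "Pending", "Pending"]
    print(should_scale_up(phases, pending_threshold=3, now=1000.0,
                          last_scale_time=None, cooldown_seconds=300))
```

In a Heat-based deployment the signal would presumably be sent to the scale-up policy's alarm URL, the same endpoint a ceilometer alarm would call, which is why this looks so similar to Req 1.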
- Scaling Pods:
Currently, manual scaling of pods is possible by resizing ReplicationControllers. The k8s community is working on an abstraction, AutoScaler, on top of ReplicationController (RC) that provides intention/rule-based autoscaling. cAdvisor/Heapster stats would need to be collected to signal the AutoScaler too. This is probably beyond the scope of OpenStack.
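For what it's worth, a rule-based resize of an RC could look something like the sketch below. This is not the AutoScaler design the k8s community is working on; it is just one plausible proportional rule (keep average utilisation near a target), with all parameter names being my own.

```python
import math


def desired_replicas(current, observed_util, target_util,
                     min_replicas, max_replicas):
    """Proportional rule: scale replica count by observed/target utilisation,
    then clamp to the [min_replicas, max_replicas] range."""
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    wanted = math.ceil(current * observed_util / target_util)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 4 replicas at 90% average utilisation, aiming for 60%
    print(desired_replicas(4, 90.0, 60.0, min_replicas=1, max_replicas=10))
```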
Any thoughts and ideas on how to realize this use-case would be appreciated.
OpenStack Development Mailing List (not for usage questions)