
[openstack-dev] [heat] Kubernetes AutoScaling with Heat AutoScalingGroup and Ceilometer

0 votes

Hi All,

Deploying a Kubernetes (k8s) cluster on any OpenStack-based cloud for container-based workloads is a standard deployment pattern. However, auto-scaling this cluster based on load would require some integration between k8s and OpenStack components. While looking at the option of leveraging a Heat ASG to achieve autoscaling, I came across a few requirements that the list can discuss to arrive at the best possible solution.

A typical k8s deployment scenario on OpenStack would be as below.

  • Master (single VM)
  • Minions/Nodes (AutoScalingGroup)

Auto-scaling the cluster would involve both scaling minions/nodes and scaling Pods (ReplicationControllers).
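
For reference, a minimal sketch of this topology as a Heat (HOT) template created through python-heatclient is shown below. The image, flavor and network names, the group sizes and the scaling policy parameters are illustrative assumptions, not a tested template.

    # Sketch only: one master VM plus an AutoScalingGroup of minions.
    # Image/flavor/network names and sizes are placeholders.
    from heatclient import client as heat_client

    minion = {
        'type': 'OS::Nova::Server',
        'properties': {'image': 'fedora-k8s', 'flavor': 'm1.medium',
                       'networks': [{'network': 'private'}]},
    }

    template = {
        'heat_template_version': '2013-05-23',
        'resources': {
            'master': dict(minion),                      # single master VM
            'minion_group': {
                'type': 'OS::Heat::AutoScalingGroup',
                'properties': {'min_size': 1, 'max_size': 10,
                               'resource': minion},
            },
            'scale_up_policy': {
                'type': 'OS::Heat::ScalingPolicy',
                'properties': {
                    'adjustment_type': 'change_in_capacity',
                    'auto_scaling_group_id': {'get_resource': 'minion_group'},
                    'scaling_adjustment': 1,
                    'cooldown': 60,
                },
            },
        },
    }

    heat = heat_client.Client('1', endpoint='http://heat-api:8004/v1/PROJECT_ID',
                              token='KEYSTONE_TOKEN')
    heat.stacks.create(stack_name='k8s-cluster', template=template)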

  1. Scaling Nodes/Minions:

We already have utilization stats collected at the hypervisor level, as the ceilometer compute agent polls the local libvirt daemon to acquire performance data for the local instances/nodes. Also, the Kubelet (running on each node) collects cAdvisor stats. However, cAdvisor stats are not fed back to the scheduler at present, and the scheduler uses a simple round-robin method for scheduling.

Req 1: We would need a way to push stats from the kubelet/cAdvisor to ceilometer, either directly or via the master (using heapster). Alarms based on these stats can then be used to scale the ASG up/down.

There is an existing blueprint[1] for an inspector implementation for the docker hypervisor (nova-docker). However, we would probably require an agent running on the nodes or the master to send the cAdvisor or heapster stats to ceilometer. I've seen some discussions on the possibility of leveraging keystone trusts with the ceilometer client.
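
As a rough sketch of what such an agent might do for Req 1, the snippet below pushes a node utilization figure to ceilometer with python-ceilometerclient. The meter name, the metadata key and the credentials are assumptions on my part; ideally the credentials would be obtained through a keystone trust rather than hard-coded.

    # Sketch of an agent pushing a cAdvisor/heapster-derived figure to ceilometer.
    # Meter name, metadata key and credentials are placeholders.
    from ceilometerclient import client as ceilo_client

    def push_node_cpu_sample(node_uuid, stack_id, cpu_util):
        # cpu_util would be computed from cAdvisor/heapster stats on the node.
        cclient = ceilo_client.get_client(
            2,
            os_username='k8s-agent',
            os_password='secret',
            os_tenant_name='k8s-project',
            os_auth_url='http://keystone:5000/v2.0')

        cclient.samples.create(
            counter_name='k8s.node.cpu_util',   # assumed custom meter name
            counter_type='gauge',
            counter_unit='%',
            counter_volume=cpu_util,
            resource_id=node_uuid,
            # Tag the sample so an alarm can be scoped to this stack/ASG.
            resource_metadata={'stack_id': stack_id})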

Req 2: The Autoscaling Group is expected to notify the master that a new node has been added/removed. Before removing a node, the master/scheduler has to mark the node as unschedulable.

Req 3: Notify the containers/pods that the node is going to be removed, so that they can stop accepting traffic and persist data. This would also require a cooldown period before the node removal.

Both requirements 2 and 3 would probably require generating scaling event notifications/signals for the master and containers to consume, and probably some ASG lifecycle hooks.
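
For the "mark the node as unschedulable" part of Req 2, the master-side step maps to what `kubectl cordon` does, i.e. patching spec.unschedulable on the node. A minimal sketch against the k8s v1 API; the API server address is an assumption and authentication is omitted:

    # Sketch: cordon a node before the ASG removes it (Req 2).
    import json
    import requests

    def cordon_node(node_name, api_server='http://k8s-master:8080'):
        return requests.patch(
            '%s/api/v1/nodes/%s' % (api_server, node_name),
            headers={'Content-Type': 'application/strategic-merge-patch+json'},
            data=json.dumps({'spec': {'unschedulable': True}}))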

Req 4: In case there are too many 'pending' pods to be scheduled, the scheduler would signal the ASG to scale up. This is similar to Req 1.
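
For Req 4, one possible shape is a small loop on the master that counts pending pods and POSTs to the scale-up policy's pre-signed webhook (the ScalingPolicy's alarm_url attribute). The threshold, addresses and polling strategy below are illustrative assumptions:

    # Sketch: signal the Heat scale-up webhook when too many pods are Pending.
    import requests

    def maybe_scale_up(scale_up_webhook, api_server='http://k8s-master:8080',
                       max_pending=5):
        pods = requests.get('%s/api/v1/pods' % api_server).json().get('items', [])
        pending = [p for p in pods
                   if p.get('status', {}).get('phase') == 'Pending']
        if len(pending) > max_pending:
            # Heat pre-signed URL; an empty POST triggers the policy.
            requests.post(scale_up_webhook)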

  2. Scaling Pods

Currently, manual scaling of pods is possible by resizing ReplicationControllers. The k8s community is working on an abstraction, AutoScaler[2], on top of ReplicationController (RC) that provides intention/rule-based autoscaling. There would be a requirement to collect cAdvisor/Heapster stats to signal the AutoScaler too. This is probably beyond the scope of OpenStack.
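
For reference, the manual path mentioned above boils down to patching spec.replicas on the RC (what `kubectl resize`/`kubectl scale` does). A sketch; the namespace and API server address are assumptions and authentication is omitted:

    # Sketch: manually resize a ReplicationController.
    import json
    import requests

    def resize_rc(rc_name, replicas, namespace='default',
                  api_server='http://k8s-master:8080'):
        url = '%s/api/v1/namespaces/%s/replicationcontrollers/%s' % (
            api_server, namespace, rc_name)
        return requests.patch(
            url,
            headers={'Content-Type': 'application/strategic-merge-patch+json'},
            data=json.dumps({'spec': {'replicas': replicas}}))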

Any thoughts and ideas on how to realize this use-case would be appreciated.

[1] https://review.openstack.org/gitweb?p=openstack%2Fceilometer-specs.git;a=commitdiff;h=6ea7026b754563e18014a32e16ad954c86bd8d6b
[2] https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/proposals/autoscaling.md

Regards,
Rabi Mishra


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Apr 27, 2015 in openstack-dev by Rabi_Mishra (2,140 points)   2 8

3 Responses

0 votes

On Mon, Apr 27, 2015 at 12:28:01PM -0400, Rabi Mishra wrote:
Hi All,

Deploying a Kubernetes (k8s) cluster on any OpenStack-based cloud for container-based workloads is a standard deployment pattern. However, auto-scaling this cluster based on load would require some integration between k8s and OpenStack components. While looking at the option of leveraging a Heat ASG to achieve autoscaling, I came across a few requirements that the list can discuss to arrive at the best possible solution.

A typical k8s deployment scenario on OpenStack would be as below.

  • Master (single VM)
  • Minions/Nodes (AutoScalingGroup)

AutoScaling of the cluster would involve both scaling of minions/nodes and scaling Pods(ReplicationControllers).

  1. Scaling Nodes/Minions:

We already have utilization stats collected at the hypervisor level, as ceilometer compute agent polls the local libvirt daemon to acquire performance data for the local instances/nodes.

I really doubt that those metrics are useful enough to trigger a scaling
operation. My suspicion is based on two assumptions: 1) autoscaling
requests should come from the user application or service, not from the
control plane; the application knows best whether scaling is needed;
2) hypervisor-level metrics may be misleading in some cases. For
example, they cannot give an accurate CPU utilization number in the case
of CPU overcommit, which is a common practice.

Also, Kubelet (running on the node) collects the cAdvisor stats. However, cAdvisor stats are not fed back to the scheduler at present and scheduler uses a simple round-robin method for scheduling.

It looks like a multi-layer resource management problem which needs a
holistic design. I'm not quite sure whether scheduling at the container
layer alone can help improve resource utilization.

Req 1: We would need a way to push stats from the kubelet/cAdvisor to ceilometer directly or via the master(using heapster). Alarms based on these stats can then be used to scale up/down the ASG.

To send a sample to ceilometer for triggering autoscaling, we will need
some user credentials to authenticate with keystone (even with trusts).
We need to pass the project-id in and out so that ceilometer will know
the correct scope for evaluation. We also need a standard way to tag
samples with the stack ID and maybe also the ASG ID. I'd love to see
this done transparently, i.e. no matching_metadata or query confusions.
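
To make the tagging idea a bit more concrete, here is a sketch of the alarm side against the ceilometer v2 REST API: the alarm's query is scoped by a metadata key carried on the pushed samples, and its action is the Heat scale-up webhook. The meter name and metadata key are assumptions, not an agreed convention:

    # Sketch: a threshold alarm scoped to one stack/ASG via sample metadata.
    import json
    import requests

    def create_scale_up_alarm(ceilometer_url, token, stack_id, scale_up_webhook):
        body = {
            'name': 'k8s-node-cpu-high',
            'type': 'threshold',
            'threshold_rule': {
                'meter_name': 'k8s.node.cpu_util',   # assumed custom meter
                'statistic': 'avg',
                'period': 60,
                'evaluation_periods': 1,
                'comparison_operator': 'gt',
                'threshold': 80.0,
                'query': [{'field': 'metadata.stack_id',
                           'op': 'eq',
                           'value': stack_id}],
            },
            'alarm_actions': [scale_up_webhook],
        }
        return requests.post('%s/v2/alarms' % ceilometer_url,
                             headers={'X-Auth-Token': token,
                                      'Content-Type': 'application/json'},
                             data=json.dumps(body))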

There is an existing blueprint[1] for an inspector implementation for docker hypervisor(nova-docker). However, we would probably require an agent running on the nodes or master and send the cAdvisor or heapster stats to ceilometer. I've seen some discussions on possibility of leveraging keystone trusts with ceilometer client.

An agent is needed, definitely.

Req 2: The Autoscaling Group is expected to notify the master that a new node has been added/removed. Before removing a node, the master/scheduler has to mark the node as unschedulable.

A little bit confused here ... are we scaling the containers or the
nodes or both?

Req 3: Notify containers/pods that the node would be removed for them to stop accepting any traffic, persist data. It would also require a cooldown period before the node removal.

There have been some discussions on sending messages, but so far I don't
think there is a conclusion on the generic solution.

Just my $0.02.

BTW, we have been looking into similar problems in the Senlin project.

Regards,
Qiming

responded Apr 28, 2015 by Qiming_Teng (7,380 points)   3 9 16
0 votes

----- Original Message -----
On Mon, Apr 27, 2015 at 12:28:01PM -0400, Rabi Mishra wrote:

Hi All,

Deploying a Kubernetes (k8s) cluster on any OpenStack-based cloud for
container-based workloads is a standard deployment pattern. However,
auto-scaling this cluster based on load would require some integration
between k8s and OpenStack components. While looking at the option of
leveraging a Heat ASG to achieve autoscaling, I came across a few
requirements that the list can discuss to arrive at the best possible solution.

A typical k8s deployment scenario on OpenStack would be as below.

  • Master (single VM)
  • Minions/Nodes (AutoScalingGroup)

AutoScaling of the cluster would involve both scaling of minions/nodes and
scaling Pods(ReplicationControllers).

  1. Scaling Nodes/Minions:

We already have utilization stats collected at the hypervisor level, as
ceilometer compute agent polls the local libvirt daemon to acquire
performance data for the local instances/nodes.

I really doubt that those metrics are useful enough to trigger a scaling
operation. My suspicion is based on two assumptions: 1) autoscaling
requests should come from the user application or service, not from the
control plane; the application knows best whether scaling is needed;
2) hypervisor-level metrics may be misleading in some cases. For
example, they cannot give an accurate CPU utilization number in the case
of CPU overcommit, which is a common practice.

I agree that getting correct utilization statistics is complex with virtual infrastructure.
However, I think physical+hypervisor metrics (collected by the compute agent) should be a
good starting point.

Also, Kubelet (running on the node) collects the cAdvisor stats. However,
cAdvisor stats are not fed back to the scheduler at present and scheduler
uses a simple round-robin method for scheduling.

It looks like a multi-layer resource management problem which needs a
holistic design. I'm not quite sure whether scheduling at the container
layer alone can help improve resource utilization.

The k8s scheduler is going to improve over time to use the cAdvisor/heapster metrics for
better scheduling. IMO, we should leave that for k8s to handle.

My point is about getting those metrics to ceilometer, either from the nodes or from the
scheduler/master.

Req 1: We would need a way to push stats from the kubelet/cAdvisor to
ceilometer directly or via the master(using heapster). Alarms based on
these stats can then be used to scale up/down the ASG.

To send a sample to ceilometer for triggering autoscaling, we will need
some user credentials to authenticate with keystone (even with trusts).
We need to pass the project-id in and out so that ceilometer will know
the correct scope for evaluation. We also need a standard way to tag
samples with the stack ID and maybe also the ASG ID. I'd love to see
this done transparently, i.e. no matching_metadata or query confusions.

There is an existing blueprint[1] for an inspector implementation for
docker hypervisor(nova-docker). However, we would probably require an
agent running on the nodes or master and send the cAdvisor or heapster
stats to ceilometer. I've seen some discussions on possibility of
leveraging keystone trusts with ceilometer client.

An agent is needed, definitely.

Req 2: The Autoscaling Group is expected to notify the master that a new node
has been added/removed. Before removing a node, the master/scheduler has to
mark the node as unschedulable.

A little bit confused here ... are we scaling the containers or the
nodes or both?

We would only be focusing on the nodes. However, adding/removing nodes without the k8s
master/scheduler knowing about it (so that it can schedule pods on them or mark them
unschedulable) would be useless.

Req 3: Notify containers/pods that the node would be removed for them to
stop accepting any traffic, persist data. It would also require a cooldown
period before the node removal.

There have been some discussions on sending messages, but so far I don't
think there is a conclusion on the generic solution.

Just my $0.02.

Thanks Qiming.

BTW, we have been looking into similar problems in the Senlin project.

Great. We can probably discuss these during the Summit? I assume there is already a session
on Senlin planned, right?

Regards,
Qiming

responded Apr 28, 2015 by Rabi_Mishra (2,140 points)   2 8
0 votes

You can take a look at the Murano Kubernetes package. There is no autoscaling
out of the box, but it would be quite trivial to add a new action for that,
as there are functions to add new etcd and Kubernetes nodes on the master, as
well as a function to add a new VM.

Here is an example of a scaleUp action:
https://github.com/gokrokvertskhov/murano-app-incubator/blob/monitoring-ha/io.murano.apps.java.HelloWorldCluster/Classes/HelloWorldCluster.murano#L93

Here is Kubernetes scaleUp action:
https://github.com/openstack/murano-apps/blob/master/Docker/Kubernetes/KubernetesCluster/package/Classes/KubernetesCluster.yaml#L441

And here is the place where the Kubernetes master is updated with the new node info:
https://github.com/openstack/murano-apps/blob/master/Docker/Kubernetes/KubernetesCluster/package/Classes/KubernetesMinionNode.yaml#L90

By the way, as you can see, cAdvisor is set up on the new node too.
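
For completeness, triggering such a scaleUp action from outside Murano would presumably go through the Murano actions API; the endpoint used below (POST /v1/environments/<env_id>/actions/<action_id>) is my assumption and should be double-checked:

    # Sketch: invoke a Murano application action (e.g. scaleUp) over HTTP.
    # The endpoint/path is an assumption; the auth token comes from keystone.
    import requests

    def call_murano_action(murano_url, token, environment_id, action_id):
        return requests.post(
            '%s/v1/environments/%s/actions/%s' % (murano_url,
                                                  environment_id, action_id),
            headers={'X-Auth-Token': token,
                     'Content-Type': 'application/json'},
            data='{}')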

Thanks
Gosha


--
Georgy Okrokvertskhov
Architect,
OpenStack Platform Products,
Mirantis
http://www.mirantis.com
Tel. +1 650 963 9828
Mob. +1 650 996 3284


responded Apr 29, 2015 by Georgy_Okrokvertskho (3,820 points)   2 5