[openstack-dev] [Neutron][LBaaS] Consolidated metrics proposal

0 votes

Hi, we have been struggling to get a meaningful set of metrics from LB stats through ceilometer, and an interesting idea came up in a discussion about which module should be responsible for providing the data. (Thanks Pradeep!)
The proposal is to consolidate some kinds of metrics, such as pool uptime (hours) and average or historic response times of VIPs and listeners, so that ceilometer does not have to query for the state so frequently. There is a trade-off between fast response time (high sampling rate) and a reasonable* amount of cumulative samples.
The next step, to give the idea more detail, is to work on a list of use cases that better explains the benefits of this kind of data grouping.
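
As a rough illustration of the trade-off mentioned above, the sketch below compares a few polling intervals. It assumes one sample per poll per meter and reads "response time" as the worst-case delay before a state change is noticed; the function names and the intervals are hypothetical, not taken from ceilometer.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(poll_interval_s: int) -> int:
    """Samples a single polled meter accumulates in 24 hours."""
    return SECONDS_PER_DAY // poll_interval_s


def worst_case_detection_delay_s(poll_interval_s: int) -> int:
    """A change that happens right after a poll is only seen at the next poll."""
    return poll_interval_s


# Hypothetical polling intervals, in seconds.
for interval in (10, 60, 600):
    print(
        f"interval={interval}s "
        f"samples/day={samples_per_day(interval)} "
        f"worst-case delay={worst_case_detection_delay_s(interval)}s"
    )
```

Shorter intervals notice changes sooner but multiply the number of stored samples per meter, which is the cost the consolidation idea tries to reduce.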

What do you think about this?
Would you find it useful to have some processed metrics on the load balancer side instead of the ceilometer side?
Can you identify any load balancer measurements that could not be obtained or calculated from ceilometer?
Perhaps this could be the basis for other stats-gathering solutions that may be under discussion?

Andres
asked Jun 10, 2014 in openstack-dev by Buraschi,_Andres (280 points)   1
retagged Jan 28, 2015 by admin

3 Responses

0 votes

Hey Andres,

In my experience with usage gathering, consolidating statistics at the root layer is usually a bad idea. The reason is that you lose potentially useful information once you consolidate data. When it comes to troubleshooting issues (such as billing), this lost information can cause problems, since there is no way to "replay" what actually happened. That said, there is no free lunch, and keeping track of huge amounts of data can be a huge engineering challenge. We have a separate thread on what kinds of metrics we want to expose from the LBaaS service, so perhaps it would be nice to understand these in more detail.
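
A small sketch of the "replay" concern, using made-up response-time samples: once only a count and a mean are kept, questions such as "what was the worst response?" can no longer be answered. The sample values, the dictionary layout, and the helper names are all hypothetical.

```python
from statistics import mean

# Hypothetical raw response times (milliseconds) for one listener.
raw_response_ms = [12, 15, 11, 480, 14, 13]

# What a consolidated record might keep: only a count and a mean.
consolidated = {
    "count": len(raw_response_ms),
    "mean_ms": mean(raw_response_ms),
}


def worst_case(samples):
    """Largest observed response time; needs the raw samples."""
    return max(samples)


def over_threshold(samples, limit_ms):
    """How many responses exceeded limit_ms; also needs the raw samples."""
    return sum(1 for s in samples if s > limit_ms)


print(consolidated)
print(worst_case(raw_response_ms), over_threshold(raw_response_ms, 100))
```

The consolidated record cannot distinguish one very slow response from uniformly mediocre ones, which is the kind of detail that tends to matter when a bill or an incident is disputed.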

Cheers,
--Jorge

responded Jun 12, 2014 by Jorge_Miramontes (2,280 points)   1 6
0 votes

Hi Jorge, thanks for your reply! You are right about summarizing too much. The idea is to identify which kinds of data could be retrieved in a summarized way without losing detail (e.g., uptime can be better described with start-end timestamps than with lots of samples with up/down status), or simply to provide different levels of granularity and let the user decide (yes, it can sometimes be dangerous).
Having said this, how can we share the list of metrics currently intended to be exposed? Is there a document, or should I follow the "Requirements around statistics and billing" thread?
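
To make the uptime example concrete, here is a minimal sketch of collapsing polled up/down samples into start-end intervals. It assumes samples arrive in time order as (unix timestamp, status) pairs and that a transition is dated at the first sample that observes it; the names and sample data are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, str]    # (unix timestamp, "UP" or "DOWN")
Interval = Tuple[int, int]  # (start timestamp, end timestamp)


def up_intervals(samples: List[Sample]) -> List[Interval]:
    """Collapse time-ordered status samples into intervals of uptime."""
    intervals: List[Interval] = []
    start: Optional[int] = None
    last_ts: Optional[int] = None
    for ts, status in samples:
        if status == "UP" and start is None:
            start = ts                      # an UP period begins
        elif status != "UP" and start is not None:
            intervals.append((start, ts))   # the UP period ends here
            start = None
        last_ts = ts
    if start is not None and last_ts is not None:
        intervals.append((start, last_ts))  # still UP at the last sample
    return intervals


def uptime_hours(intervals: List[Interval]) -> float:
    """Total uptime, in hours, covered by the intervals."""
    return sum(end - start for start, end in intervals) / 3600


samples = [
    (0, "UP"), (1800, "UP"), (3600, "UP"),
    (5400, "DOWN"), (7200, "DOWN"),
    (9000, "UP"), (10800, "UP"),
]
intervals = up_intervals(samples)
print(intervals, uptime_hours(intervals))
```

Seven samples reduce to two intervals here, and the intervals still carry the exact start and end times that a per-hour total alone would lose.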

Thank you!
Andres

responded Jun 16, 2014 by Buraschi,_Andres (280 points)   1
0 votes

Hey Andres,

Sorry for the late reply. I was out of town all last week. I would suggest continuing the email thread before we put this on a wiki somewhere so others can chime in.

Cheers,
--Jorge

responded Jun 25, 2014 by Jorge_Miramontes (2,280 points)   1 6
...