
[openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

0 votes

Hello,

I'm pleased to announce the development of a new project called CloudPulse. CloudPulse provides OpenStack
health-checking services to operators, tenants, and applications. This project will begin as
a StackForge project based upon an empty cookiecutter[1] repo. The repos to work in are:
Server: https://github.com/stackforge/cloudpulse
Client: https://github.com/stackforge/python-cloudpulseclient

Please join us on IRC in #openstack-cloudpulse on freenode.

I am holding a Doodle poll to select times for our first meeting the week after summit. The poll will close May 24th, and meeting times will be announced on the mailing list at that time. At our first IRC meeting,
we will draft additional core team members, so if you're interested in joining a fresh new development effort, please attend our first meeting.
If you're interested in CloudPulse, please take a moment to fill out the Doodle poll here:

https://doodle.com/kcpvzy8kfrxe6rvb

The initial core team is composed of Ajay Kalambur, Behzad Dastur, Ian Wells,
Pradeep chandrasekhar, Steven Dake, and Vinod Pandarinathan.
I expect more members to join during our initial meeting.

A little bit about CloudPulse:
Cloud operators need notification of OpenStack failures before a customer reports the failure, so that they can take timely corrective actions with minimal disruption to applications. Many cloud applications, including
those I am interested in (NFV), have very stringent service level agreements. Loss of service can trigger contractual
costs associated with the service. Application high availability requires an operational OpenStack cloud, and the reality
is that occasionally OpenStack clouds fail in mysterious ways. This project intends to identify when those failures
occur so corrective actions may be taken by operators, tenants, and the applications themselves.

OpenStack is considered healthy when OpenStack API services respond appropriately. Further, OpenStack is
healthy when network traffic can be sent between the tenant networks and can reach the Internet. Finally, OpenStack
is healthy when all infrastructure cluster elements are in an operational state.
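
As a rough illustration of the three criteria above (API services, network traffic, cluster elements), here is a minimal Python sketch. It is not CloudPulse code: the `HealthReport` class, its field names, and the check names used with it are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class HealthReport:
    """Hypothetical container for the three kinds of checks named above."""
    api_ok: dict      # service name -> True if its API responded appropriately
    network_ok: dict  # traffic path -> True if traffic passed
    cluster_ok: dict  # infrastructure element -> True if operational

    def failures(self):
        """Return a sorted list of 'category/name' for every failed check."""
        failed = []
        for category, results in (("api", self.api_ok),
                                  ("network", self.network_ok),
                                  ("cluster", self.cluster_ok)):
            failed.extend(f"{category}/{name}"
                          for name, ok in results.items() if not ok)
        return sorted(failed)

    def healthy(self):
        """Healthy only when no check in any category has failed."""
        return not self.failures()
```

A report with one failing API check, for instance, is unhealthy and lists that check in `failures()`, which gives an operator or application something concrete to act on.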

For information about blueprints check out:
https://blueprints.launchpad.net/cloudpulse
https://blueprints.launchpad.net/python-cloudpulseclient

For more details, check out our Wiki:
https://wiki.openstack.org/wiki/Cloudpulse

Please join the CloudPulse team in designing and implementing a world-class carrier-grade system for checking
the health of OpenStack clouds. We look forward to seeing you on IRC in #openstack-cloudpulse.

Regards,
Vinod Pandarinathan
[1] https://github.com/openstack-dev/cookiecutter


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked May 12, 2015 in openstack-dev by Vinod_Pandarinathan_ (420 points)   1 2

30 Responses

0 votes

For operators:

  • Nagios
  • Icinga
  • Zabbix

installed on bare-metal machines deployed with OpenStack and other
infrastructure services.

For tenants:

  • Nagios
  • Icinga
  • Zabbix

installed on their VMs.

Why are we re-inventing excellent open-source implementations of
monitoring systems that have been around for over a decade?

Best,
-jay

p.s. Sorry for top-posting.

responded May 12, 2015 by Jay_Pipes (59,760 points)   3 11 15
0 votes

Nagios/whatever-as-a-Service would actually be very useful, I think.

Setting up a monitoring server is a fair amount of work. If "Cloud Apps" downloaded from an OpenStack catalog had a built-in monitoring Heat resource that registered the launched app with a multitenant-aware cloud monitoring service, the user would only have to launch an app, then go into the Dashboard and associate some kind of alerting policy with the registered checks: say, email this address when things break. That would be awesome. :)

Thanks,
Kevin


responded May 12, 2015 by Fox,_Kevin_M (29,360 points)   1 3 4
0 votes

Very true. However, I see these tools as extensions/plugins to the
CloudPulse framework, so when they are available, the data from these
tools is exposed.

The OpenStack health service provides an overall framework without
assumptions about what is installed on the underlying cloud.
The service is expected to run on existing cloud deployments that may or
may not have any of this software (on the tenant side as well).

Core health checks for operators and tenants test the basic OpenStack services
that are present in any OpenStack cloud.

Thanks for the feedback.

Thanks
Vinod.

responded May 12, 2015 by Vinod_Pandarinathan_ (420 points)   1 2
0 votes

On 05/12/15 20:48, Jay Pipes wrote:
For operators:

  • Nagios
  • Icinga
  • Zabbix

installed on baremetal machines deployed with the OpenStack and other
infrastructure services.

For tenants:

  • Nagios
  • Icinga
  • Zabbix

installed on their VMs.

Why are we re-inventing excellent open-source implementations of
monitoring systems that have been around for over a decade?

Because that is what we love to do here in the OpenStack Community??
(Sorry I could not resist... :) )

But seriously though, do we have a set of tools that can do this in a
simple, consolidated way?

--
Best Regards,
Maish Saidel-Keesing


responded May 12, 2015 by maishsk_at_maishsk.c (3,000 points)   1 4 7
0 votes

Kevin,

This is a great idea that would make a solid extension to the software.
If I read the wiki page correctly, the real goal is for operators and
tenants to get notifications by querying the REST API, so they could
write their own email/pager-duty app.

Regards
-steve
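
A minimal sketch of such a do-it-yourself notifier, assuming a health endpoint that returns a JSON list of check results. The result shape (`name` and `state` fields, with `success` meaning a passing check) is hypothetical and is not taken from the CloudPulse API.

```python
import json
import urllib.request


def fetch_results(url, timeout=10):
    """GET a JSON list of check results from a (hypothetical) health endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def failed_checks(results):
    """Keep only results whose assumed 'state' field is not 'success'."""
    return [r for r in results if r.get("state") != "success"]


def poll_once(fetch, notify):
    """Fetch results once and call notify(message) for each failed check.

    Returns the number of notifications sent.
    """
    failures = failed_checks(fetch())
    for result in failures:
        notify(f"health check {result.get('name', '?')} "
               f"reported {result.get('state')}")
    return len(failures)
```

Here `fetch` would typically wrap `fetch_results` with the deployment's URL, and `notify` could be anything that accepts a string, such as a function that sends an email or pages someone.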

responded May 12, 2015 by Steven_Dake_(stdake) (24,540 points)   2 13 29
0 votes

On 05/12/2015 02:16 PM, Fox, Kevin M wrote:
Nagios/watever As A Service would actually be very useful I think.

I don't really understand why Nagios-as-a-Service would be useful to
operators. I mean, operators install their monitoring system of choice
via their configuration management tool of choice -- Ansible, SaltStack,
Puppet, Chef, etc.

Frankly, so do tenants. Tenants install software on their images using
configuration management tools like those mentioned above... I don't see a
reason to have Nagios-as-a-Service for tenants either.

Setting up a monitoring server is a fair amount of work.

Not really. It's typically a simple apt-get install nagios-nrpe-plugins
on client VMs along with an apt-get install nagios-server on one or more
monitoring system VMs. Again, have configuration management systems
inject whatever check scripts you want paired with the ones that already
come with nagios-nrpe-plugins package.
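
For illustration, a check script of the kind a configuration management system could inject might look like the sketch below. It follows the Nagios plugin convention of one status line plus an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names, the file name in the usage message, and the 2-second warning threshold are assumptions made up for this example.

```python
import time
import urllib.request

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(elapsed, warn_after=2.0):
    """Map a successful response time to a Nagios status and message."""
    if elapsed > warn_after:
        return WARNING, f"WARNING - responded in {elapsed:.2f}s"
    return OK, f"OK - responded in {elapsed:.2f}s"


def check_endpoint(url, timeout=10.0, opener=urllib.request.urlopen):
    """Probe url and return (exit_code, status_line)."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout):
            pass
    except Exception as exc:  # refused connection, timeout, HTTP error, ...
        return CRITICAL, f"CRITICAL - {url} failed: {exc}"
    return classify(time.monotonic() - start)


def main(argv):
    """Print the status line and return the exit code for the given argv."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_endpoint.py URL")
        return UNKNOWN
    code, line = check_endpoint(argv[1])
    print(line)
    return code
```

To run it as a plugin, the script would end with `sys.exit(main(sys.argv))` after an `import sys`; the `opener` parameter exists only so the probe can be replaced in tests.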

If "Cloud
Apps" downloaded from an OpenStack Catalog had a Monitoring Heat
resource built in, that would register the launched app with a
multitenant aware Cloud Monitoring Service, the user would only have
to launch an app, and then go into the Dashboard and associate some
kind of alerting policy with the registered checks. Say, email this
address when things break. That would be awesome. :)

I guess I just don't see this being in the realm of OpenStack. Or at
least, not more than something like a Murano application manifest which
is almost what you are describing above.

I don't see the need for this service, sorry. Not everything needs to be
re-invented as a RESTful Python service endpoint...

Best,
-jay

responded May 12, 2015 by Jay_Pipes (59,760 points)   3 11 15
0 votes

Hooking it into Zaqar would be awesome too. Once you can trigger Mistral workflows based on Zaqar messages, just imagine the possibilities...

Kevin


From: Steven Dake (stdake)
Sent: Tuesday, May 12, 2015 12:02:59 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

Kevin,

This is a great idea that would make a solid extension to the software.
If I read the wiki page correctly, the real goal is for operators and
tenants to be able to be notified via querying the ReST API so they could
write their own email/pager-duty app.

Regards
-steve

On 5/12/15, 11:16 AM, "Fox, Kevin M" Kevin.Fox@pnnl.gov wrote:

Nagios/watever As A Service would actually be very useful I think.

Setting up a monitoring server is a fair amount of work. If "Cloud Apps"
downloaded from an OpenStack Catalog had a Monitoring Heat resource built
in, that would register the launched app with a multitenant aware Cloud
Monitoring Service, the user would only have to launch an app, and then
go into the Dashboard and associate some kind of alerting policy with the
registered checks. Say, email this address when things break. That would
be awesome. :)

Thanks,
Kevin


From: Jay Pipes [jaypipes@gmail.com]
Sent: Tuesday, May 12, 2015 10:48 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to
HealthCheck OpenStack deployments

For operators:

  • Nagios
  • Icinga
  • Zabbix

installed on baremetal machines deployed with the OpenStack and other
infrastructure services.

For tenants:

  • Nagios
  • Icinga
  • Zabbix

installed on their VMs.

Why are we re-inventing excellent open-source implementations of
monitoring systems that have been around for over a decade?

Best,
-jay

p.s. Sorry for top-posting.

responded May 12, 2015 by Fox,_Kevin_M (29,360 points)   1 3 4
0 votes

On 05/12/2015 02:24 PM, Vinod Pandarinathan (vpandari) wrote:
Very true. However, the way I see it, these are extensions/plugins to
the CloudPulse framework, so when they are available, the data from
these tools is exposed.

The OpenStack health service provides an overall framework without
assumptions about what is installed on the underlying cloud.
The service is expected to run on existing cloud deployments that may or
may not have any of this software (from the tenant as well).

You mean, like Monasca?

https://wiki.openstack.org/wiki/Monasca

Sounds to me like you will at the very least need an agent of some sort
on the VMs to communicate to an external system. And, that is the
monasca-agent:

https://github.com/stackforge/monasca-agent

ala Nagios NRPE agent:

http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf

ala Zabbix agent:

https://www.zabbix.com/documentation/2.0/manual/concepts/agent

ala Icinga agent:

http://docs.icinga.org/latest/en/nrpe.html

So, CloudPulse would be yet another agent for sending healthcheck
messages to an external system, in order for the framework "not to make
any assumptions on what is installed in the underlying cloud" -- other
than the assumption you'd need yet another agent installed.

Core health checks for operators and tenants test basic OpenStack services,
which are present in any OpenStack cloud.

Operators != tenants. Try to make the two equal each other and you
end up with Ceilometer and Triple-O -- with all the accompanying
complexity therein.

Best,
-jay

Thanks for the feedback.

Thanks
Vinod.

responded May 12, 2015 by Jay_Pipes (59,760 points)   3 11 15
0 votes

On Tue, May 12 2015, Steven Dake (stdake) wrote:

This is a great idea that would make a solid extension to the software.
If I read the wiki page correctly, the real goal is for operators and
tenants to be notified by querying the REST API, so they could write
their own email/pager-duty app.

Then leveraging Ceilometer polling and alarming systems could make you
avoid reinventing a large portion of the wheel.

--
Julien Danjou
// Free Software hacker
// http://julien.danjou.info



responded May 12, 2015 by Julien_Danjou (20,500 points)   2 5 7
0 votes

It totally depends on how much experience you think a tenant user has...

If we're talking about devops, they tend to have the skills to stand up a configuration management server, a monitoring server, and manage everything via config management.

If tenant users are research scientists, like some of ours, it's a fair amount of work to manage Nagios without config management, and config management is way more effort than most researchers want to put into learning. That's where an app catalog becomes important, and something like monitoring as a service starts to become interesting...

Thanks,
Kevin


From: Jay Pipes [jaypipes@gmail.com]
Sent: Tuesday, May 12, 2015 12:50 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

On 05/12/2015 02:16 PM, Fox, Kevin M wrote:
Nagios/whatever as a Service would actually be very useful, I think.

I don't really understand why Nagios-as-a-Service would be useful to
operators. I mean, operators install their monitoring system of choice
via their configuration management tool of choice -- Ansible, SaltStack,
Puppet, Chef, etc.

Frankly, so do tenants. Tenants install software on their images using
configuration management tools like those mentioned above... I don't see
a reason to have Nagios-as-a-Service for tenants either.

Setting up a monitoring server is a fair amount of work.

Not really. It's typically a simple apt-get install nagios-nrpe-plugins
on client VMs along with an apt-get install nagios-server on one or more
monitoring system VMs. Again, have configuration management systems
inject whatever check scripts you want, paired with the ones that already
come with the nagios-nrpe-plugins package.
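
For readers unfamiliar with what such a check script looks like, here is a minimal sketch in Python that follows the standard Nagios plugin convention: exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, with a one-line status printed to stdout. The disk-usage check itself, the function names, and the 80%/90% thresholds are illustrative assumptions, not something taken from this thread.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_pct, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to an (exit_code, status_line) pair.

    The default thresholds are arbitrary example values.
    """
    if used_pct >= crit:
        return CRITICAL, "DISK CRITICAL - %.1f%% used" % used_pct
    if used_pct >= warn:
        return WARNING, "DISK WARNING - %.1f%% used" % used_pct
    return OK, "DISK OK - %.1f%% used" % used_pct


def check_disk(path="/"):
    """Measure usage of the filesystem containing path and evaluate it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, "DISK UNKNOWN - %s" % exc
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    return evaluate(100.0 * usage.used / usage.total)


def main():
    """Print the status line and return the exit code for the monitoring agent."""
    code, line = check_disk()
    print(line)
    return code
```

When saved as a script, it would end with `sys.exit(main())` (after `import sys`) so that the agent running it sees the exit code.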

If "Cloud
Apps" downloaded from an OpenStack Catalog had a Monitoring Heat
resource built in, that would register the launched app with a
multitenant aware Cloud Monitoring Service, the user would only have
to launch an app, and then go into the Dashboard and associate some
kind of alerting policy with the registered checks. Say, email this
address when things break. That would be awesome. :)

I guess I just don't see this being in the realm of OpenStack. Or at
least, not more than something like a Murano application manifest which
is almost what you are describing above.

I don't see the need for this service, sorry. Not everything needs to be
re-invented as a RESTful Python service endpoint...

Best,
-jay

responded May 12, 2015 by Fox,_Kevin_M (29,360 points)   1 3 4
...