
Re: [openstack-dev] [nova] consistency and exposing quiesce in the Nova API


I am hoping support for instance quiesce in the Nova API makes it into
OpenStack. To my understanding, this is existing function in Nova, just
not-yet exposed in the public API. (I believe Cinder uses this via a
private Nova API.)

Much of the discussion is around disaster recovery (DR) and NFV - which is
not wrong, but might be muddling the discussion? Forget DR and NFV, for the
moment.

My interest is simply in collecting high quality backups of applications
(instances) running in OpenStack. (Yes, customers are deploying
applications into OpenStack that need backup - and at large scale. They
told us, very clearly.) Ideally, I would like to give the application a
chance to properly quiesce, so the on-disk state is most-consistent, before
collecting the backup.

The existing function in Nova should be at least a good start, it just
needs to be exposed in the public Nova API. (At least, this is my
understanding.)

Of course, good backups (however collected) allow you to build DR
solutions. My immediate interest is simply to collect high-quality backups.

The part in the blueprint about an atomic operation on a list of instances
... this might be over-doing things. First, if you have a set of related
instances, very likely there is a logical order in which they should be
quiesced. Some could be quiesced concurrently. Others might need to be
sequential.

Assuming the quiesce API starts the operation, and there is some means to
check for completion, then a single-instance quiesce API should be
sufficient. An API that is synchronous (waits for completion before
returning) would also be usable. (I am not picky - just want to collect
better backups for customers.)
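
To make the calling pattern concrete, here is a minimal sketch of the backup workflow I have in mind, assuming a hypothetical single-instance quiesce/unquiesce call plus some way to poll for completion. None of these helper names exist in Nova today; they only illustrate the shape of the API I am asking for:

    import time

    def backup_instance(client, server_id, poll_interval=2, timeout=60):
        # Hypothetical call: ask Nova to quiesce the guest (asynchronous start).
        client.quiesce(server_id)
        deadline = time.time() + timeout
        # Hypothetical poll: wait until the guest reports it is quiesced.
        while client.get_quiesce_state(server_id) != "quiesced":
            if time.time() > deadline:
                raise TimeoutError("quiesce did not complete in time")
            time.sleep(poll_interval)
        try:
            # Collect the backup while guest I/O is frozen (volume snapshots, etc.).
            client.snapshot_all_volumes(server_id)
        finally:
            # Hypothetical call: always resume guest I/O afterwards.
            client.unquiesce(server_id)

A synchronous quiesce call would simply collapse the polling loop into the first call.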

On Sun, May 29, 2016 at 7:24 PM, joehuang <joehuang@huawei.com> wrote:

Hello,

This spec [1] proposed exposing a quiesce/unquiesce API. It was approved in
Mitaka, but the code was not merged in time.

The major motivation for this spec is to enable application-level consistency
snapshots, so that a backup of the snapshot at a remote site can be recovered
correctly for disaster recovery. Currently there is only single-VM-level
consistency (through "create image from VM"), and that is not enough.

First, disaster recovery is mainly an infrastructure-level action taken after
catastrophic failures (flood, earthquake, propagating software fault): the
cloud service provider recovers the infrastructure and the applications
without help from each application owner. You cannot just recover OpenStack
and then notify all application owners, asking them to restore their
applications on their own. As the cloud service provider, you should be
responsible for both infrastructure and application recovery in case of
disaster.

Second, this requirement is not about making OpenStack bend over for NFV.
Although the requirement first came from OPNFV, application-level consistency
snapshots are a general need. For example, take OpenStack itself as the
application running in the cloud: we can deploy a different DB for each
service, i.e. Nova has its own MySQL server (nova-db-VM) and Neutron has its
own MySQL server (neutron-db-VM). In fact, I have seen production deployments
split the Nova/Cinder/Neutron databases across different DB servers for
scalability. There is interaction between Nova and Neutron when booting a new
VM, and during the boot some data will still be sitting in the memory cache of
nova-db-VM/neutron-db-VM. If we just create snapshots of the volumes of
nova-db-VM/neutron-db-VM in Cinder, any data that has not yet been flushed to
disk will not be in the volume snapshots.

We cannot control when that cached data gets flushed, so there is a random
chance that the data in the snapshot is inconsistent with what actually
happened inside the nova-db-VM/neutron-db-VM virtual machines. In that case
Nova/Neutron may boot successfully at the disaster recovery site, but some
port information may be corrupted because it was not flushed into
neutron-db-VM when the snapshot was taken, and in severe cases the VMs may not
even recover to a running state. There is one project called Dragon [2], but
Dragon cannot guarantee the consistency of the application snapshot through
the OpenStack API either.
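
To illustrate what a quiesce buys you here, below is a minimal libvirt sketch,
assuming a libvirt/QEMU host with the qemu guest agent running inside the
guest. The guest name "nova-db-VM" is just the example above, and this is only
roughly what a driver-level quiesce does; it is not code from Nova:

    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("nova-db-VM")  # example guest from the scenario above

    # Freeze the guest filesystems through the qemu guest agent so that dirty
    # pages and filesystem journals are flushed to disk first.
    dom.fsFreeze()
    try:
        # Take the Cinder volume snapshot(s) here, while guest I/O is frozen.
        pass
    finally:
        # Always thaw, even if the snapshot fails.
        dom.fsThaw()

    # Note: fsFreeze gives filesystem-level consistency. For true application
    # consistency the database still needs its own flush/lock hook (for
    # example a guest-agent freeze hook), which is exactly why a quiesce API
    # that the application can participate in matters.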

Third, for applications that can decide which data and checkpoints should be
replicated to the disaster recovery site, this is the third option discussed
and described in our analysis:
https://git.opnfv.org/cgit/multisite/tree/docs/requirements/multisite-vnf-gr-requirement.rst.
Unfortunately in Cinder, even after volume replication v2.1 was developed,
tenant-granularity volume replication is still under discussion and is still
not at the single-volume level. And, as mentioned in the first point, both
application-level and infrastructure-level recovery are needed: you cannot
just expect each application owner to do their own recovery after a site's
OpenStack has been restored. Applications can usually deal with the data they
generate, but protecting configuration changes is out of an application's
scope. There are several options for disaster recovery, but that does not mean
one option fits all.

There were several -1s on this re-proposed spec, which had already been
approved in Mitaka, so this explanation is sent to the mailing list for
discussion. If someone can suggest another way to guarantee application-level
snapshots for disaster recovery purposes, that is also welcome.

[1] Re-Propose Expose quiesce/unquiesce API:
https://review.openstack.org/#/c/295595/
[2] Dragon:
https://github.com/os-cloud-storage/openstack-workload-disaster-recovery

Best Regards
Chaoyi Huang ( Joe Huang )


asked Jun 16, 2016 in openstack-dev by Preston_L._Bannister

2 Responses


On 6/16/2016 6:12 AM, Preston L. Bannister wrote:
I am hoping support for instance quiesce in the Nova API makes it into
OpenStack. To my understanding, this is existing function in Nova, just
not-yet exposed in the public API. (I believe Cinder uses this via a
private Nova API.)

I'm assuming you're thinking of the os-assisted-volume-snapshots admin
API in Nova that is called from the Cinder RemoteFSSnapDrivers
(glusterfs, scality, virtuozzo and quobyte). I started a separate thread
about that yesterday, mainly around the lack of CI testing / status so
we even have an idea if this is working consistently and we don't
regress it.
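
For reference, the call those Cinder drivers make is roughly the following.
This is a hand-written sketch, not code taken from Cinder, and the request
body is reconstructed from memory; check it against the Nova API reference for
assisted volume snapshots before relying on it:

    import requests

    def assisted_snapshot(nova_endpoint, admin_token, volume_id,
                          snapshot_id, new_file):
        # Admin-only Nova API used by the RemoteFS Cinder drivers: Nova asks
        # the hypervisor to create the snapshot file on behalf of Cinder.
        body = {
            "snapshot": {
                "volume_id": volume_id,
                "create_info": {
                    "snapshot_id": snapshot_id,
                    "type": "qcow2",
                    "new_file": new_file,
                },
            }
        }
        resp = requests.post(
            nova_endpoint + "/os-assisted-volume-snapshots",
            json=body,
            headers={"X-Auth-Token": admin_token},
        )
        resp.raise_for_status()
        return resp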

Much of the discussion is around disaster recovery (DR) and NFV - which
is not wrong, but might be muddling the discussion? Forget DR and NFV,
for the moment.

My interest is simply in collecting high quality backups of applications
(instances) running in OpenStack. (Yes, customers are deploying
applications into OpenStack that need backup - and at large scale. They
told us, very clearly.) Ideally, I would like to give the application
a chance to properly quiesce, so the on-disk state is most-consistent,
before collecting the backup.

We already attempt to quiesce an active volume-backed instance before
doing a volume snapshot:

https://github.com/openstack/nova/blob/11bd0052bdd660b63ecca53c5b6fe68f81bdf9c3/nova/compute/api.py#L2266
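
For anyone who doesn't want to follow the link, the flow in that code is
roughly the following. This is a simplified sketch, not the actual Nova code;
the collaborator and method names are illustrative only:

    def snapshot_volume_backed(compute_rpc, volume_api, context, instance, bdms):
        # Best-effort quiesce: if the guest agent / driver can't do it, the
        # snapshots are still taken, just without the consistency guarantee.
        quiesced = False
        try:
            compute_rpc.quiesce_instance(context, instance)
            quiesced = True
        except Exception:
            pass
        try:
            snapshots = []
            for bdm in bdms:
                # Forced snapshot because the volumes stay attached to the guest.
                snapshots.append(volume_api.create_snapshot_force(
                    context, bdm.volume_id, instance.uuid, ''))
            return snapshots
        finally:
            if quiesced:
                # Always resume guest I/O, even if a snapshot failed.
                compute_rpc.unquiesce_instance(context, instance)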

The existing function in Nova should be at least a good start, it just
needs to be exposed in the public Nova API. (At least, this is my
understanding.)

Of course, good backups (however collected) allow you to build DR
solutions. My immediate interest is simply to collect high-quality backups.

The part in the blueprint about an atomic operation on a list of
instances ... this might be over-doing things. First, if you have a set
of related instances, very likely there is a logical order in which they
should be quiesced. Some could be quiesced concurrently. Others might
need to be sequential.

Assuming the quiesce API starts the operation, and there is some means
to check for completion, then a single-instance quiesce API should be
sufficient. An API that is synchronous (waits for completion before
returning) would also be usable. (I am not picky - just want to collect
better backups for customers.)

As noted above, we already attempt to quiesce when doing a volume-backed
instance snapshot.

The problem comes in with the chaining and orchestration around a list
of instances. That requires additional state management and overhead
within Nova and while we're actively trying to redo parts of the code
base to make things less terrible, adding more complexity on top at the
same time doesn't help.

I'm also not sure what something like multiattach volumes will throw
into the mix with this, but that's another DR/HA requirement.

So I get that lots of people want lots of things that aren't in Nova
right now. We have that coming from several different projects (cinder
for multiattach volumes, neutron for vlan-aware-vms and routed
networks), and several different groups (NFV, ops).

We also have a lot of people that just want the basic IaaS layer to work
for the compute service in an OpenStack cloud, like being able to scale
that out better and track resource usage for accurate scheduling.

And we have a lot of developers that want to be able to actually
understand what it is the code is doing, and a much smaller number of
core maintainers / reviewers that don't want to have to keep piling
technical debt into the project while we're trying to fix some of what's
already built up over the years - and actually have this stuff backed
with integration testing.

So, I get it. We all have requirements and we all have resource
limitations, which is why we as a team prioritize our work items for the
release. This one didn't make it for Newton.

On Sun, May 29, 2016 at 7:24 PM, joehuang <joehuang@huawei.com> wrote:

Hello,

This spec [1] proposed exposing a quiesce/unquiesce API. It was approved
in Mitaka, but the code was not merged in time.

The major motivation for this spec is to enable application-level
consistency snapshots, so that a backup of the snapshot at a remote site
can be recovered correctly for disaster recovery. Currently there is only
single-VM-level consistency (through "create image from VM"), and that is
not enough.

First, disaster recovery is mainly an infrastructure-level action taken
after catastrophic failures (flood, earthquake, propagating software
fault): the cloud service provider recovers the infrastructure and the
applications without help from each application owner. You cannot just
recover OpenStack and then notify all application owners, asking them to
restore their applications on their own. As the cloud service provider,
you should be responsible for both infrastructure and application
recovery in case of disaster.

Second, this requirement is not about making OpenStack bend over for NFV.
Although the requirement first came from OPNFV, application-level
consistency snapshots are a general need. For example, take OpenStack
itself as the application running in the cloud: we can deploy a different
DB for each service, i.e. Nova has its own MySQL server (nova-db-VM) and
Neutron has its own MySQL server (neutron-db-VM). In fact, I have seen
production deployments split the Nova/Cinder/Neutron databases across
different DB servers for scalability. There is interaction between Nova
and Neutron when booting a new VM, and during the boot some data will
still be sitting in the memory cache of nova-db-VM/neutron-db-VM. If we
just create snapshots of the volumes of nova-db-VM/neutron-db-VM in
Cinder, any data that has not yet been flushed to disk will not be in the
volume snapshots.

We cannot control when that cached data gets flushed, so there is a
random chance that the data in the snapshot is inconsistent with what
actually happened inside the nova-db-VM/neutron-db-VM virtual machines.
In that case Nova/Neutron may boot successfully at the disaster recovery
site, but some port information may be corrupted because it was not
flushed into neutron-db-VM when the snapshot was taken, and in severe
cases the VMs may not even recover to a running state. There is one
project called Dragon [2], but Dragon cannot guarantee the consistency of
the application snapshot through the OpenStack API either.

Third, for applications that can decide which data and checkpoints should
be replicated to the disaster recovery site, this is the third option
discussed and described in our analysis:
https://git.opnfv.org/cgit/multisite/tree/docs/requirements/multisite-vnf-gr-requirement.rst.
Unfortunately in Cinder, even after volume replication v2.1 was
developed, tenant-granularity volume replication is still under
discussion and is still not at the single-volume level. And, as mentioned
in the first point, both application-level and infrastructure-level
recovery are needed: you cannot just expect each application owner to do
their own recovery after a site's OpenStack has been restored.
Applications can usually deal with the data they generate, but protecting
configuration changes is out of an application's scope. There are several
options for disaster recovery, but that does not mean one option fits all.

There were several -1s on this re-proposed spec, which had already been
approved in Mitaka, so this explanation is sent to the mailing list for
discussion. If someone can suggest another way to guarantee
application-level snapshots for disaster recovery purposes, that is also
welcome.

[1] Re-Propose Expose quiesce/unquiesce API:
https://review.openstack.org/#/c/295595/
[2] Dragon:
https://github.com/os-cloud-storage/openstack-workload-disaster-recovery

Best Regards
Chaoyi Huang ( Joe Huang )



--

Thanks,

Matt Riedemann


responded Jun 16, 2016 by Matt_Riedemann

Comments inline.

On Thu, Jun 16, 2016 at 10:13 AM, Matt Riedemann <mriedem@linux.vnet.ibm.com> wrote:

On 6/16/2016 6:12 AM, Preston L. Bannister wrote:

I am hoping support for instance quiesce in the Nova API makes it into
OpenStack. To my understanding, this is existing function in Nova, just
not-yet exposed in the public API. (I believe Cinder uses this via a
private Nova API.)

I'm assuming you're thinking of the os-assisted-volume-snapshots admin API
in Nova that is called from the Cinder RemoteFSSnapDrivers (glusterfs,
scality, virtuozzo and quobyte). I started a separate thread about that
yesterday, mainly around the lack of CI testing / status so we even have an
idea if this is working consistently and we don't regress it.

Yes, I believe we are talking about the same thing. Also, I saw your other
message. :)

Much of the discussion is around disaster recovery (DR) and NFV - which

is not wrong, but might be muddling the discussion? Forget DR and NFV,
for the moment.

My interest is simply in collecting high quality backups of applications
(instances) running in OpenStack. (Yes, customers are deploying
applications into OpenStack that need backup - and at large scale. They
told us, very clearly.) Ideally, I would like to give the application
a chance to properly quiesce, so the on-disk state is most-consistent,
before collecting the backup.

We already attempt to quiesce an active volume-backed instance before
doing a volume snapshot:

https://github.com/openstack/nova/blob/11bd0052bdd660b63ecca53c5b6fe68f81bdf9c3/nova/compute/api.py#L2266

The problem, from my point of view, is that if the instance has more than one
volume (and many do), then quiescing the instance more than once is not very
nice.

The existing function in Nova should be at least a good start, it just

needs to be exposed in the public Nova API. (At least, this is my
understanding.)

Of course, good backups (however collected) allow you to build DR
solutions. My immediate interest is simply to collect high-quality
backups.

The part in the blueprint about an atomic operation on a list of
instances ... this might be over-doing things. First, if you have a set
of related instances, very likely there is a logical order in which they
should be quiesced. Some could be quiesced concurrently. Others might
need to be sequential.

Assuming the quiesce API starts the operation, and there is some means
to check for completion, then a single-instance quiesce API should be
sufficient. An API that is synchronous (waits for completion before
returning) would also be usable. (I am not picky - just want to collect
better backups for customers.)

As noted above, we already attempt to quiesce when doing a volume-backed
instance snapshot.

The problem comes in with the chaining and orchestration around a list of
instances. That requires additional state management and overhead within
Nova and while we're actively trying to redo parts of the code base to make
things less terrible, adding more complexity on top at the same time
doesn't help.

I agree with your concern. To be clear, what I am hoping for is the
simplest possible version - an API to quiesce/unquiesce a single instance,
similar to the existing pause/unpause APIs.

Handling lists of instances (and responding to state changes) is something I
would expect to be implemented on the caller side. The semantics are
application-specific, so a single-instance API has merit from my perspective.
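
To be concrete about how small the surface could be, something like the
following would cover my use case. This is a sketch only: these server actions
do not exist today, and the "quiesce"/"unquiesce" action names are my
assumption about what the spec would expose, modeled on pause/unpause:

    import requests

    def server_action(nova_endpoint, token, server_id, action):
        # Generic Nova "server action" POST, the same mechanism pause/unpause use.
        resp = requests.post(
            "{}/servers/{}/action".format(nova_endpoint, server_id),
            json={action: None},
            headers={"X-Auth-Token": token},
        )
        resp.raise_for_status()

    # Hypothetical backup sequence:
    #   server_action(endpoint, token, server_id, "quiesce")
    #   ... snapshot the instance's volumes ...
    #   server_action(endpoint, token, server_id, "unquiesce")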

I'm also not sure what something like multiattach volumes will throw into
the mix with this, but that's another DR/HA requirement.

So I get that lots of people want lots of things that aren't in Nova right
now. We have that coming from several different projects (cinder for
multiattach volumes, neutron for vlan-aware-vms and routed networks), and
several different groups (NFV, ops).

We also have a lot of people that just want the basic IaaS layer to work
for the compute service in an OpenStack cloud, like being able to scale
that out better and track resource usage for accurate scheduling.

And we have a lot of developers that want to be able to actually
understand what it is the code is doing, and a much smaller number of core
maintainers / reviewers that don't want to have to keep piling technical
debt into the project while we're trying to fix some of what's already
built up over the years - and actually have this stuff backed with
integration testing.

So, I get it. We all have requirements and we all have resource
limitations, which is why we as a team prioritize our work items for the
release. This one didn't make it for Newton.

Ah. I did not quite get that from what I read online. Unfortunate. Also
sounds like the Nova-folk are overloaded, and we need to come up with
resources to contribute to Nova, if we want this to appear in better time.


responded Jun 16, 2016 by Preston_L._Bannister
...