
[OpenStack-DefCore] Getting the DefCore certificate for clouds where instances can only boot from volume

0 votes

Hi All,

We are deploying a public cloud platform based on OpenStack in the EU. We are
now working on the DefCore certificate for our public cloud platform and have
met some problems.

OpenStack (Nova) supports both "boot from Image" and "boot from Volume" when
launching instances. For large-scale commercial deployments such as a public
cloud, the reliability of the service is considered the key factor.

When we use "boot from Image" we can have two kinds of deployments: 1.
nova-compute with no shared storage backend; 2. nova-compute with a shared
storage backend. In case 1, the system disk created from the image is placed
on the local disk of the host running nova-compute; the reliability of the
user data is considered low, and it is very hard to manage this large number
of disks across hosts throughout the deployment, so case 1 can be considered
not commercially ready for large-scale deployments. In case 2, the
reliability and management problems are solved, but a new problem is
introduced: resource usage and capacity tracking become incorrect. This has
been a known issue [1] in Nova for a long time, and the Nova team is trying
to solve it by introducing a new "resource provider" architecture [2]. That
architecture will need a few releases to be fully functional, so case 2 is
also considered not commercially ready.

For the reasons listed above, we have chosen "boot from Volume" as the only
way of booting instances in our public cloud. By doing this, we can overcome
the above-mentioned cons and gain other benefits such as:

- Resiliency - Cloud Block Storage is a persistent volume; users can retain
  it after the server is deleted, then use the volume to create a new server.
- Flexibility - Users have control over the size and type (SSD or SATA) of
  the volume used to boot the server, which lets them fine-tune the storage
  to the needs of their operating system or application.
- Improvements in managing and recovering from server outages
- Unified volume management
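For context, "boot from Volume" is requested through Nova's
block_device_mapping_v2 parameter instead of imageRef. A minimal sketch of
the server-create request body (all IDs and names are placeholders, not
values from this thread):

```python
# Sketch (not a live call) of the Nova "POST /servers" request body for
# boot-from-volume: instead of imageRef, a block_device_mapping_v2 entry
# tells Nova to have Cinder build the root volume from an image.

def boot_from_volume_body(name, image_id, flavor_id, size_gb):
    """Build a server-create body whose root disk is a new Cinder volume."""
    return {
        "server": {
            "name": name,
            "flavorRef": flavor_id,
            # No "imageRef" here: the root disk comes from the mapping below,
            # which is why the server's "image" field later shows up empty.
            "block_device_mapping_v2": [{
                "boot_index": 0,                # this device is the root disk
                "uuid": image_id,               # image to copy into the volume
                "source_type": "image",
                "destination_type": "volume",   # Cinder volume, not local disk
                "volume_size": size_gb,
                "delete_on_termination": False, # volume outlives the server
            }],
        }
    }

body = boot_from_volume_body("demo-server", "image-uuid", "flavor-uuid", 20)
```

The empty "image" field on the resulting server is what trips the tests
discussed below.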

Supporting only "boot from Volume" brings us problems when pursuing the
DefCore certificate:

We have tests that try to get the instance list filtered by "image_id",
which is None for volume-booted instances:

tempest.api.compute.servers.test_create_server.ServersTestJSON.test_verify_server_details
tempest.api.compute.servers.test_create_server.ServersTestManualDisk.test_verify_server_details
tempest.api.compute.servers.test_list_server_filters.ListServerFiltersTestJSON.test_list_servers_detailed_filter_by_image
tempest.api.compute.servers.test_list_server_filters.ListServerFiltersTestJSON.test_list_servers_filter_by_image

- The detailed information for instances booted from volumes does not
contain the image_id, so the test cases that filter instances by image id
cannot pass.
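The filter failure can be reduced to a toy example (plain dicts standing in
for Nova's server records; the "image" field name follows the API, the rest
is made up):

```python
# Minimal simulation of why filtering servers by image fails for
# volume-booted instances: Nova reports an empty "image" field for them.

servers = [
    {"name": "img-booted", "image": {"id": "img-123"}},  # boot from Image
    {"name": "vol-booted", "image": ""},                 # boot from Volume
]

def list_servers_by_image(servers, image_id):
    """Mimic GET /servers?image=<id>: match on the reported image id."""
    return [s for s in servers
            if s["image"] and s["image"]["id"] == image_id]

# "vol-booted" can never be found this way, which is what breaks the
# ListServerFilters tests on a boot-from-volume-only cloud.
matches = list_servers_by_image(servers, "img-123")
```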

We also have tests like this:

tempest.api.compute.images.test_images.ImagesTestJSON.test_delete_saving_image

- This test creates an image of an instance and deletes the created instance
snapshot while the image status is "saving". For instances booted from
images, the snapshot status flow is: queued->saving->active. But for
instances booted from volumes, the instance snapshot is actually a volume
snapshot performed by Cinder; the image saved in Glance only holds a link to
the created Cinder volume snapshot, and the image status changes directly to
"active". Because the logic in this test waits for the image status in
Glance to change to "saving", it cannot pass for volume-booted instances.
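The two status flows can be sketched like this (a simulation of the behavior
described above, not real Glance calls; function names are made up):

```python
# The image-booted snapshot passes through "saving"; the volume-booted
# snapshot is a Cinder snapshot behind a Glance record that goes straight
# to "active", so a wait-for-"saving" loop never succeeds.

def snapshot_status_flow(boot_source):
    """Statuses the Glance image record passes through after a snapshot."""
    if boot_source == "image":
        return ["queued", "saving", "active"]  # Nova/Glance snapshot path
    # boot from volume: Cinder does the snapshot, image is active at once
    return ["active"]

def wait_for_saving(flow):
    """Mimic the test's wait: succeed only if 'saving' ever appears."""
    return "saving" in flow
```

wait_for_saving passes for an image-booted instance's flow and fails for a
volume-booted one, which is exactly the test failure described.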

Also:

test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments

- This test attaches one volume to an instance and then counts the number of
attachments for that instance; the expected count is hardcoded to 1. For a
volume-booted instance, the system disk is already an attachment, so the
actual count is 2 and the test fails.
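A toy model of the counting mismatch (hypothetical helper, not Tempest
code):

```python
# For a volume-booted server the root disk is itself a volume attachment,
# so attaching one data volume yields two attachments, not one.

def attachments_after_attaching_one(boot_source):
    """List attachments after the test attaches a single data volume."""
    attachments = []
    if boot_source == "volume":
        attachments.append("root-volume")  # the boot volume counts too
    attachments.append("data-volume")      # the volume the test attaches
    return attachments

assert len(attachments_after_attaching_one("image")) == 1   # what the test expects
assert len(attachments_after_attaching_one("volume")) == 2  # what it actually sees
```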

And finally:

tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_rebuild_server
tempest.api.compute.servers.test_servers_negative.ServersNegativeTestJSON.test_rebuild_deleted_server
tempest.api.compute.servers.test_servers_negative.ServersNegativeTestJSON.test_rebuild_nonexistent_server

- The rebuild action is not supported when the instance is created from a
volume.

All the tests mentioned above are unfriendly to "boot from Volume"
instances. We hope we can have some workarounds for them, as the problems we
are having with "boot from Image" really stop us from using it, and it would
also be good for DefCore to figure out how to deal with these two types of
instance creation.

References:
[1] Bugs related to resource usage reporting and calculation:

[2] BP about solving resource usage reporting and calculation with a
generic resource pool (resource provider):

https://git.openstack.org/cgit/openstack/nova-specs/tree/specs/newton/approved/generic-resource-pools.rst

Thanks,

Kevin Zheng


Defcore-committee mailing list
Defcore-committee@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/defcore-committee
asked Sep 2, 2016 in defcore-committee by Zhenyu_Zheng (2,800 points)

5 Responses

0 votes

Interesting feedback :)
It looks like the 'get-me-a-network' issue [1]. It's an implementation detail, chosen for technical/business reasons, which broke the tests and compatibility.
We will probably have other subjects like this one.

Have we already talked about providing different levels for each program? I'm not speaking about guidelines, which target the releases. I'm speaking about:
- OpenStack Powered compute, containing:
  - OpenStack Powered compute basics (get-me-a-network, get-me-a-compute, get-me-a-storage)
  - OpenStack Powered compute advanced (boot-from-volume, boot-from-image, resize-an-instance, get-a-floating-ip, ...)

Maybe we should add another level between basics and advanced.

[1] http://specs.openstack.org/openstack/neutron-specs/specs/liberty/get-me-a-network.html

--
Jean-Daniel Bonnetot
http://www.ovh.com
@pilgrimstack

responded Sep 2, 2016 by jean-daniel.bonnetot (600 points)  
0 votes


I can totally understand why you would value boot from volume. It has a
bunch of great features, as you mention.

However, running a cloud that disables boot from image is a niche choice
and I do not think that we should allow such a cloud to be considered
"normal". If I were to encounter such a cloud, based on the workloads I
currently run in 10 other public OpenStack clouds, I would consider it
broken - and none of my automation that has been built based on how
OpenStack clouds work consistently would work with that cloud.

I do think that we should do whatever we need to do to push the idea that
boot from volume is a regular, expected and consistent thing that people who
are using clouds can count on. I do not think that we should accept the lack
of boot from image as a valid choice. It does not promote interoperability,
and it removes choice from the end user, which is a Bad Thing.

It seems that some analysis has been done to determine that
boot-from-image is somehow not production ready or scalable.

To counter that, I would like to point out that the OpenStack Infra team,
using resources in Rackspace, OVH, Vexxhost, Internap, BlueBox, the
OpenStack Innovation Center, a private cloud run by the TripleO team, and a
private cloud run by the Infra team, boots 20k instances per day using
custom images. We upload those custom-made images using Glance image upload
daily. We have over 10 different custom images, each about 7.7G in size.
While we DO have node-launch errors given the number we launch each day:

http://grafana.openstack.org/dashboard/db/nodepool?panelId=16&fullscreen

it's a small number compared to the successful node launches:

http://grafana.openstack.org/dashboard/db/nodepool?panelId=15&fullscreen

And we have tracked ZERO of the problems down to anything related to images
(they are most frequently networking-related).

We do have issues successfully uploading new images to the cloud - but
we also have rather large images since they contain a bunch of cached
data ... and the glance team is working on making the image upload
process more resilient and scalable.

In summary:

  • Please re-enable boot from image on your cloud if you care about
    interoperability and end users

  • Please do not think that after having disabled one of the most common
    and fundamental features of the cloud that the group responsible for
    ensuring cloud interoperability should change anything to allow your
    divergent cloud to be considered interoperable. It is not. It needs to
    be fixed.

If the tests we have right now are only testing boot-from-image as an
implementation happenstance, we should immediately add tests that
EXPLICITLY test for boot-from-image. If we cannot count on that basic
functionality, then we truly will have given up on the entire idea of
interoperable clouds.

Thank you!
Monty


responded Sep 2, 2016 by Monty_Taylor (22,780 points)
0 votes

I’ve been thinking quite a bit about your response Monty, and have
some observations and suggestions below.

On Sep 2, 2016, at 8:30 AM, Monty Taylor mordred@inaugust.com wrote:


I can totally understand why you would value boot from volume. It has
a bunch of great features, as you mention.

However, running a cloud that disables boot from image is a niche choice
and I do not think that we should allow such a cloud to be considered
"normal". If I were to encounter such a cloud, based on the workloads I
currently run in 10 other public OpenStack clouds, I would consider it
broken - and none of my automation that has been built based on how
OpenStack clouds work consistently would work with that cloud.

If I understand their implementation correctly, boot from volume is an
implementation detail of booting from what appears to be a standard image
from an API standpoint. We need to differentiate between disallowing a set
of APIs and implementing a different backend for a particular API.

I do think that we should do whatever we need to do to push the idea that
boot from volume is a regular, expected and consistent thing that people who
are using clouds can count on. I do not think that we should accept the lack
of boot from image as a valid choice. It does not promote interoperability,
and it removes choice from the end user, which is a Bad Thing.

I think a better point of view is: if a vendor chooses to use a different
backend, we should still expect the user-facing API to behave predictably.
It seems that we’re facing a leaky abstraction more than some decision not
to conform to the expected API behavior.


In summary:

  • Please re-enable boot from image on your cloud if you care about
    interoperability and end users

-or- fix the unexpected behavior in the interoperable API to account
for the implementation details.

  • Please do not think that after having disabled one of the most common
    and fundamental features of the cloud that the group responsible for
    ensuring cloud interoperability should change anything to allow your
    divergent cloud to be considered interoperable. It is not. It needs to
    be fixed.

I don’t disagree, but I also think it’s important to work with the issues that
vendors who deploy OpenStack are facing and try to understand how
they fit into the larger ecosystem. Part of what we’re trying to accomplish
here is building a virtuous circle between upstream developers, downstream
deployers, and users.

If the tests we have right now are only testing boot-from-image as an
implementation happenstance, we should immediately add tests that
EXPLICITLY test for boot-from-image. If we cannot count on that basic
functionality, then we truly will have given up on the entire idea of
interoperable clouds.

We can count on the basic black-box functionality at some level. It’s
the interaction of the implementation with the rest of the API that’s
causing problems.

The ‘create from image’ call itself does part of what it’s advertised to do:
it boots an image. And at first pass (the tests for the actual launching
of the image) all looks well. The issue comes later, when the user
queries the ‘read vms that are images’ API and it’s clear the abstraction
loop hasn’t been closed. This is exactly what the interoperability
tests are meant to catch, and it indicates what needs to be fixed.

How it’s fixed is a different story, and I think that we as a community
need to be careful about prescribing one solution over another.

-Chris



responded Sep 14, 2016 by chris_at_openstack.o (3,260 points)
0 votes

On 09/14/2016 12:30 AM, Chris Hoge wrote:
I’ve been thinking quite a bit about your response Monty, and have
some observations and suggestions below.

On Sep 2, 2016, at 8:30 AM, Monty Taylor mordred@inaugust.com wrote:

On 09/02/2016 01:47 AM, Zhenyu Zheng wrote:

Hi All,

We are deploying Public Cloud platform based on OpensStack in EU, we are
now working on DefCore certificate for our public cloud platform and we
meet some problems;

OpenStack(Nova) supports both "boot from Image" and "boot from Volume"
when launching instances; When we talk about large scale commercial
deployments such as Public Cloud, the reliability of the service is been
considered as the key factor;

When we use "boot from Image" we can have two kinds of deployments: 1.
Nova-compute with no shared storage backend; 2. Nova-compute with shared
storage backend. As for case 1, the system disk created from the image
will be created on the local disk of the host that nova-compute is on,
and the reliability of the userdata is considered low and it will be
very hard to manage this large amount of disks from different hosts all
over the deployment, thus it can be considered not commercially ready
for large scale deployments. As for case 2, the problem of reliability
and manage can be solved, but new problems are introduced - the resource
usage and capacity amounts tracking being incorrect, this has been an
known issue[1] in Nova for a long time and the Nova team is trying to
solve the problem by introducing a new "resource provider" architecture
[2], this new architecture will need few releases to be fully
functional, thus case 2 is also considered to be not commercially ready.

For the reasons listed above, we have chosen to make "boot from Volume"
the only way of booting instances in our Public Cloud. By doing this, we
can overcome the above-mentioned cons and get other benefits such as:

Resiliency - Cloud Block Storage is a persistent volume; users can
retain it after the server is deleted, and can then use the volume to
create a new server.
Flexibility - Users have control over the size and type (SSD or SATA)
of the volume used to boot the server. This control enables users to
fine-tune the storage to the needs of their operating system or application.
Improvements in managing and recovering from server outages
Unified volume management
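For reference, a server created this way would use Nova's block_device_mapping_v2 field rather than an imageRef. A minimal sketch of such a create-server request body follows; the field names match the Nova API, but all id values are placeholders:

```python
# Sketch of a Nova "create server" request that boots from a volume via
# block_device_mapping_v2. Because no imageRef is supplied, the resulting
# server carries no image_id -- which is the root of the test failures
# discussed below. All ids are placeholders.
boot_from_volume_request = {
    "server": {
        "name": "bfv-server",
        "flavorRef": "FLAVOR_ID",           # placeholder flavor id
        "block_device_mapping_v2": [{
            "boot_index": 0,                # this device is the boot disk
            "uuid": "IMAGE_ID",             # placeholder source image id
            "source_type": "image",         # build the volume from an image
            "destination_type": "volume",   # back the disk with Cinder
            "volume_size": 20,              # size in GB, user-controlled
            "delete_on_termination": False, # volume survives server deletion
        }],
    }
}
```

Note that "delete_on_termination": False is what provides the resiliency benefit above: the volume outlives the server.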

Supporting only "boot from Volume" brings us problems when pursuing the
DefCore certificate:

We have tests that try to get the instance list filtered by "image_id",
which is None for volume-booted instances:

tempest.api.compute.servers.test_create_server.ServersTestJSON.test_verify_server_details
tempest.api.compute.servers.test_create_server.ServersTestManualDisk.test_verify_server_details
tempest.api.compute.servers.test_list_server_filters.ListServerFiltersTestJSON.test_list_servers_detailed_filter_by_image
tempest.api.compute.servers.test_list_server_filters.ListServerFiltersTestJSON.test_list_servers_filter_by_image

  • The detailed information for instances booted from volumes does
    not contain information about image_id, so the test cases that filter
    instances by image id cannot pass.
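The failure mode can be illustrated with a minimal sketch. The server records below are hypothetical, not the real Nova data model, but they mimic the relevant behaviour: a volume-booted server reports an empty image reference, so an image_id filter can never match it:

```python
# Hypothetical server records mimicking Nova's "image" field: a
# boot-from-image server carries an image id, while a boot-from-volume
# server reports an empty image reference.
servers = [
    {"name": "bfi-server", "image": {"id": "img-123"}},
    {"name": "bfv-server", "image": ""},  # booted from volume
]

def filter_by_image(servers, image_id):
    """Return the servers whose image reference matches image_id."""
    matched = []
    for s in servers:
        image = s["image"]
        if isinstance(image, dict) and image.get("id") == image_id:
            matched.append(s)
    return matched

# The volume-booted server is never returned, which is why the
# list-servers-filter-by-image tests fail on a boot-from-volume-only cloud.
print([s["name"] for s in filter_by_image(servers, "img-123")])  # ['bfi-server']
```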

We also have tests like this:

tempest.api.compute.images.test_images.ImagesTestJSON.test_delete_saving_image

  • This test creates an image (snapshot) of an instance and deletes the
    snapshot while the image status is "saving". For instances booted
    from images, the snapshot status flow is
    queued->saving->active. But for instances booted from volumes, the
    instance snapshot action is actually a volume snapshot done by
    Cinder; the image saved in Glance only holds a link to the created
    Cinder volume snapshot, and the image status changes directly to
    "active". Since the logic in this test waits for the image status in
    Glance to change to "saving", it cannot pass for volume-booted
    instances.
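A small simulation of the two status flows makes the race clear. The status sequences below are the Glance states named in the text; the poller is a hypothetical stand-in for the test's wait loop:

```python
# Simulation of the two snapshot status flows described above.
BOOT_FROM_IMAGE_FLOW = ["queued", "saving", "active"]
BOOT_FROM_VOLUME_FLOW = ["queued", "active"]   # jumps straight to active

def wait_for_status(flow, wanted):
    """Poll a status sequence until `wanted` appears. Return True on
    success, False if the image reaches 'active' without passing
    through the wanted state (so further waiting is pointless)."""
    for status in flow:
        if status == wanted:
            return True
        if status == "active":
            return False
    return False

# The test's wait-for-"saving" step succeeds for image-booted instances
# but can never succeed for volume-booted ones.
print(wait_for_status(BOOT_FROM_IMAGE_FLOW, "saving"))   # True
print(wait_for_status(BOOT_FROM_VOLUME_FLOW, "saving"))  # False
```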

Also:

test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments

  • This test attaches one volume to an instance and then counts the
    number of attachments for that instance; the expected count is
    hardcoded to 1. For volume-booted instances, the system disk is
    already an attachment, so the actual count of attachments is 2, and
    the test fails.
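The off-by-one is mechanical, as this sketch shows (hypothetical attachment names, not the real attachment objects):

```python
# For a volume-booted server the root disk is itself a volume attachment,
# so attaching one data volume yields two attachments, not the hardcoded
# one the test expects.
def attachments_after_one_attach(boot_from_volume):
    attachments = ["root-volume"] if boot_from_volume else []
    attachments.append("data-volume")   # the volume the test attaches
    return attachments

print(len(attachments_after_one_attach(boot_from_volume=False)))  # 1
print(len(attachments_after_one_attach(boot_from_volume=True)))   # 2
```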

And finally:

tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_rebuild_server
tempest.api.compute.servers.test_servers_negative.ServersNegativeTestJSON.test_rebuild_deleted_server
tempest.api.compute.servers.test_servers_negative.ServersNegativeTestJSON.test_rebuild_nonexistent_server

  • The rebuild action is not supported when the instance is created
    from a volume.

All the tests mentioned above are unfriendly to "boot from Volume"
instances. We hope we can have some workarounds for them, as the
problems we have with "boot from Image" really stop us from using it,
and it would also be good for DefCore to figure out how to deal with
these two types of instance creation.

References:
[1] Bugs related to resource usage reporting and calculation:

[2] BP about solving resource usage reporting and calculation with a
generic resource pool (resource provider):

https://git.openstack.org/cgit/openstack/nova-specs/tree/specs/newton/approved/generic-resource-pools.rst

I can totally understand why you would value boot from volume. It has
a bunch of great features, as you mention.

However, running a cloud that disables boot from image is a niche choice
and I do not think that we should allow such a cloud to be considered
"normal". If I were to encounter such a cloud, based on the workloads I
currently run in 10 other public OpenStack clouds, I would consider it
broken - and none of my automation that has been built based on how
OpenStack clouds work consistently would work with that cloud.

If I understand their implementation correctly, the boot from volume
is an implementation detail of booting from what appears to be a
standard image from an API standpoint. We need to differentiate
between disallowing a set of APIs and implementing a different
backend for a particular API.

Yes, I whole heartedly agree!

I do think that we should do whatever we need to in order to push the
idea that boot from volume is a regular, expected and consistent thing
that people who are using clouds can count on. I do not think that we
should accept lack of
boot from image as a valid choice. It does not promote interoperability,
and it removes choice from the end user, which is a Bad Thing.

I think a better point of view is if a vendor chooses to use a different
backend, we should still expect the user-facing API to behave predictably.
It seems that we’re facing a leaky abstraction more than we are
some decision to not conform to the expected API behavior.

I believe that we are in agreement but that you have stated it better
than I did.

It seems that some analysis has been done to determine that
boot-from-image is somehow not production ready or scalable.

To counter that, I would like to point out that the OpenStack Infra
team, using resources in Rackspace, OVH, Vexxhost, Internap, BlueBox,
the OpenStack Innovation Center, a private cloud run by the TripleO team
and a private cloud run by the Infra team, boots 20k instances per day
using custom images. We upload those custom-made images using Glance
image upload daily. We have over 10 different custom images - each about
7.7G in size. While we DO have node-launch errors given the number we
launch each day:

http://grafana.openstack.org/dashboard/db/nodepool?panelId=16&fullscreen

it's a small number compared to the successful node launches:

http://grafana.openstack.org/dashboard/db/nodepool?panelId=15&fullscreen

And we have tracked ZERO of the problems down to anything related to
images. (it's most frequently networking related)

We do have issues successfully uploading new images to the cloud - but
we also have rather large images since they contain a bunch of cached
data ... and the glance team is working on making the image upload
process more resilient and scalable.

In summary:

  • Please re-enable boot from image on your cloud if you care about
    interoperability and end users

-or- fix the unexpected behavior in the interoperable API to account
for the implementation details.

++

  • Please do not think that after having disabled one of the most common
    and fundamental features of the cloud that the group responsible for
    ensuring cloud interoperability should change anything to allow your
    divergent cloud to be considered interoperable. It is not. It needs to
    be fixed.

I don’t disagree, but I also think it’s important to work with the issues that
vendors who deploy OpenStack are facing and try to understand how
they fit into the larger ecosystem. Part of what we’re trying to accomplish
here is build a virtuous circle between upstream developers, downstream
deployers, and users.

I absolutely agree. Again, you said words better than I did. If/when
there is an issue a deployer has, it's important to fix it.

If the tests we have right now are only testing boot-from-image as an
implementation happenstance, we should immediately add tests that
EXPLICITLY test for boot-from-image. If we cannot count on that basic
functionality, then we truly will have given up on the entire idea of
interoperable clouds.

We can count on the basic black-box functionality at some level. It’s
the interaction of the implementation with the rest of the API that’s
causing problems.

The ‘create from image’ call itself does part of what it’s advertised to do,
it boots an image. And at first pass (the tests for the actual launching
of the image) all looks well. The issue comes later when the user
queries the ‘read vms that are images’ api, and it’s clear the abstraction
loop hasn’t been closed. This is exactly what the interoperability
tests are meant to catch, and indicates what needs to be fixed.

Yup

How it’s fixed is a different story, and I think that we as a community
need to be careful about prescribing one solution over another.

Totally.

I should be clear about my POV on this, just for the record.

I by and large speak from the perspective of someone who consumes
OpenStack APIs across a lot of clouds. One of the things I think is
great about OpenStack is that in theory I should never need to know the
implementation details that someone has chosen to make.

A good example of this is cells v1. Rackspace is the only public cloud
I'm aware of that runs cells v1 ... but I do not know this as a consumer
of the API. It's completely transparent to me, even though it's an
implementation choice Rackspace made to deal with scaling issues. It's a
place where our providers have been able to make the choices that make
sense and our end-users don't suffer because of it. This is good!

Very related to this thread, one of the places where the abstraction
breaks down is, in fact, with Images - as I have to, as an end-user,
know what image format the cloud I'm using has decided to use. All of
the choices that deployers make around this are valid choices, but we
haven't done a good enough job in OpenStack to hide this from our users,
so they suffer.

The above thread about a user issuing boot from image and the backend
doing boot from volume I think should (and can) be more like cells and
less like image type. I'm confident that it can be done.

The ceph driver may be a place to look for inspiration. ceph has a
glance driver, and when you upload an image to glance, glance stores it
in ceph. BUT - when the nova driver decides to boot from an image when
the image is stored in ceph, it bypasses all of the normal image
download/caching code and essentially does a boot from volume behind the
scenes. (it's a zero-cost COW operation, so booting vms when the ceph
glance and nova drivers are used is very quick)
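The contrast between the two boot paths can be sketched as follows; this is a conceptual illustration, not the actual Nova driver code, and the byte counts are illustrative:

```python
# Conceptual sketch of the two boot paths described above: the generic
# path downloads and caches the image on the compute host, while the ceph
# path makes a zero-cost copy-on-write clone inside the cluster.
def boot_disk(image_location, backend):
    if backend == "ceph" and image_location.startswith("rbd://"):
        # COW clone: no image data is copied at boot time
        return {"type": "rbd-clone", "parent": image_location,
                "bytes_copied": 0}
    # generic path: fetch the whole image before the VM can start
    return {"type": "local-file", "source": image_location,
            "bytes_copied": 7_700_000_000}  # e.g. a 7.7G image

print(boot_disk("rbd://pool/image-123", "ceph")["bytes_copied"])   # 0
```

The user-visible API is identical in both cases; only the cost and mechanics of the boot differ, which is exactly the kind of hidden implementation choice being argued for here.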

Now, those interactions do not result in a volume object that's visible
through the Cinder API. They're implementation details, so users don't
have to know that they are booting from a COW volume in ceph.

I do not know enough details about how this is implemented, but I'd
imagine that if ceph was able to achieve what sounds like the semantics
that are desired here without introducing API issues, it should be
totally possible to achieve them here too.

Thank you Chris for being much clearer and much more helpful in what you
said than what I did.

Monty


Defcore-committee mailing list
Defcore-committee@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/defcore-committee
responded Sep 14, 2016 by Monty_Taylor
0 votes

Thanks to Chris and Monty for the great conversation on this issue.

As you both pointed out, one of the problems is that we are lacking
good enough abstractions for the backends.

What could we do to solve this problem? Should we consider pushing a
DefCore workaround to handle the short-term problem, and discuss this
with the Glance team at the Barcelona summit for the long-term fix?


--
Zhipeng (Howard) Huang

Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co., Ltd
Email: huangzhipeng@huawei.com
Office: Huawei Industrial Base, Longgang, Shenzhen

(Previous)
Research Assistant
Mobile Ad-Hoc Network Lab, Calit2
University of California, Irvine
Email: zhipengh@uci.edu
Office: Calit2 Building Room 2402

OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado


responded Sep 15, 2016 by Zhipeng_Huang
...