
[openstack-dev] [TripleO] test environment requirements


So we already have pretty high requirements - it's basically a 16G
workstation as a minimum.

Specifically to test the full story:
- a seed VM
- an undercloud VM (bm deploy infra)
- 1 overcloud control VM
- 2 overcloud hypervisor VMs
====
5 VMs with 2+G RAM each.

To test the overcloud alone against the seed we save 1 VM; to skip the
overcloud we save 3.

However, as HA matures we're about to add 4 more VMs: we need a HA
control plane for both the under and overclouds:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud control VMs (HA)
- 2 overcloud hypervisor VMs
====
9 VMs with 2+G RAM each == 18GB

What should we do about this?

A few thoughts to kick start discussion:
- use Ironic to test across multiple machines (involves tunnelling
brbm across machines, fairly easy)
- shrink the VM sizes (causes thrashing)
- tell folk to toughen up and get bigger machines (ahahahahaha, no)
- make the default configuration inline the hypervisors on the
overcloud with the control plane:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud all-in-one VMs (HA)
====
7 VMs with 2+G RAM each == 14GB
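
For reference, a quick back-of-the-envelope check of the three layouts above
(using the flat 2G-per-VM figure from this thread; purely illustrative):

GB_PER_VM = 2  # the working assumption in this thread, not a hard requirement

topologies = {
    "current (non-HA)": 1 + 1 + 1 + 2,          # seed + undercloud + control + 2 hypervisors
    "full HA": 1 + 3 + 3 + 2,                   # seed + 3 undercloud + 3 control + 2 hypervisors
    "HA with all-in-one overcloud": 1 + 3 + 3,  # seed + 3 undercloud + 3 all-in-one
}

for name, vms in topologies.items():
    print("%s: %d VMs, ~%d GB RAM" % (name, vms, vms * GB_PER_VM))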

I think it's important that developers exercise features like HA and live
migration regularly, so I'm quite keen to have a fairly solid, systematic
answer that will let us catch things like bad firewall rules on the
control node preventing network tunnelling, etc. That is, we benefit the
more things are split out, as they are in scale deployments. OTOH, testing
the micro-cloud that folk may start with is also a really good idea....

-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud

asked Mar 13, 2014 in openstack-dev by Robert_Collins

18 Responses


From: Robert Collins [mailto:robertc at robertcollins.net]
Sent: 13 March 2014 09:52
Subject: [openstack-dev] [TripleO] test environment requirements

So we already have pretty high requirements - its basically a 16G
workstation as minimum.

Specifically to test the full story:
- a seed VM
- an undercloud VM (bm deploy infra)
- 1 overcloud control VM
- 2 overcloud hypervisor VMs
====
5 VMs with 2+G RAM each.

To test the overcloud alone against the seed we save 1 VM, to skip the
overcloud we save 3.

However, as HA matures we're about to add 4 more VMs: we need a HA
control plane for both the under and overclouds:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud control VMs (HA)
- 2 overcloud hypervisor VMs
====
9 VMs with 2+G RAM each == 18GB

What should we do about this?

A few thoughts to kick start discussion:
- use Ironic to test across multiple machines (involves tunnelling brbm
across machines, fairly easy)
- shrink the VM sizes (causes thrashing)
- tell folk to toughen up and get bigger machines (ahahahahaha, no)
- make the default configuration inline the hypervisors on the
overcloud with the control plane:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud all-in-one VMs (HA)
====
7 VMs with 2+G RAM each == 14GB

I think its important that we exercise features like HA and live
migration regularly by developers, so I'm quite keen to have a fairly
solid systematic answer that will let us catch things like bad firewall
rules on the control node preventing network tunnelling etc... e.g. we
benefit the more things are split out like scale deployments are. OTOH
testing the micro-cloud that folk may start with is also a really good
idea....

The script should be able to determine the local memory available, so can we not create 2-3 configurations and choose the optimum for the developer's workstation? This keeps a low bar for entry, but allows for more complete testing by those who have the resources.
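
A rough sketch of what that memory-based selection could look like (the
topology names and thresholds here are purely illustrative, not existing
devtest settings):

def available_memory_gb():
    """Return MemTotal from /proc/meminfo (Linux) in GB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) / (1024.0 * 1024.0)  # kB -> GB
    raise RuntimeError("MemTotal not found in /proc/meminfo")

def pick_topology(mem_gb):
    # Hypothetical tiers: full HA when there is plenty of RAM,
    # smaller layouts otherwise, so the entry bar stays low.
    if mem_gb >= 24:
        return "full-ha"        # 9 VMs: HA undercloud + HA overcloud + 2 hypervisors
    if mem_gb >= 16:
        return "ha-all-in-one"  # 7 VMs: HA undercloud + 3 all-in-one overcloud nodes
    return "minimal"            # 5 VMs: seed, undercloud, control + 2 hypervisors

if __name__ == "__main__":
    mem = available_memory_gb()
    print("Detected ~%.0f GB RAM; suggested topology: %s" % (mem, pick_topology(mem)))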

Any of the compromises reduces the effectiveness of the testing, so where no compromise is necessary it would be good to see none.

-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud



Thanks,
Jon-Paul Sullivan - Cloud Services - @hpcloud


responded Mar 13, 2014 by Sullivan,_Jon_Paul

On Thu, Mar 13, 2014 at 2:51 AM, Robert Collins
wrote:
So we already have pretty high requirements - its basically a 16G
workstation as minimum.

Specifically to test the full story:
- a seed VM
- an undercloud VM (bm deploy infra)
- 1 overcloud control VM
- 2 overcloud hypervisor VMs
====
5 VMs with 2+G RAM each.

To test the overcloud alone against the seed we save 1 VM, to skip the
overcloud we save 3.

However, as HA matures we're about to add 4 more VMs: we need a HA
control plane for both the under and overclouds:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud control VMs (HA)
- 2 overcloud hypervisor VMs
====
9 VMs with 2+G RAM each == 18GB

What should we do about this?

A few thoughts to kick start discussion:
- use Ironic to test across multiple machines (involves tunnelling
brbm across machines, fairly easy)
- shrink the VM sizes (causes thrashing)
- tell folk to toughen up and get bigger machines (ahahahahaha, no)
- make the default configuration inline the hypervisors on the
overcloud with the control plane:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud all-in-one VMs (HA)
====
7 VMs with 2+G RAM each == 14GB

I think its important that we exercise features like HA and live
migration regularly by developers, so I'm quite keen to have a fairly
solid systematic answer that will let us catch things like bad
firewall rules on the control node preventing network tunnelling
etc... e.g. we benefit the more things are split out like scale
deployments are. OTOH testing the micro-cloud that folk may start with
is also a really good idea....

The idea I was thinking of was to make a testenv host available to
TripleO ATCs. Or, perhaps make it a bit more locked down and only
available to a new group of TripleO folk, existing somewhere between
the privileges of TripleO ATCs and tripleo-cd-admins. We could
document how you use the cloud (Red Hat's or HP's) rack to start up an
instance to run devtest on one of the compute hosts, request and lock
yourself a testenv environment on one of the testenv hosts, etc.
Basically, how our CI works. Although I think we'd want different
testenv hosts for development vs. what runs the CI, and we would need to
make sure everything was locked down appropriately security-wise.

Some other ideas:

  • Allow an option to get rid of the seed VM, or make it so that you
    can shut it down after the Undercloud is up. This only really gets rid
    of 1 VM though, so it doesn't buy you much nor solve any long term
    problem.

  • Make it easier to see how you'd use virsh against any libvirt host
    you might have lying around. We already have the setting exposed, but
    make it a bit more public and call it out more in the docs. I've
    actually never tried it myself, but have been meaning to.

  • I'm really reaching now, and this may be entirely unrealistic :),
    but....somehow use the fake baremetal driver and expose a mechanism to
    let the developer specify the already setup undercloud/overcloud
    environment ahead of time.
    For example:

  • Build your undercloud images with the vm element since you won't be
    PXE booting it
  • Upload your images to a public cloud, and boot instances for them.
  • Use this new mechanism when you run devtest (presumably running from
    another instance in the same cloud) to say "I'm using the fake
    baremetal driver, and here are the IPs of the undercloud instances".
  • Repeat steps for the overcloud (e.g., configure undercloud to use
    fake baremetal driver, etc).
  • Maybe it's not the fake baremetal driver, and instead a new driver
    that is a noop for the pxe stuff, and the power_on implementation
    powers on the cloud instances.
  • Obviously if your aim is to test the pxe and disk deploy process
    itself, this wouldn't work for you.
  • Presumably said public cloud is OpenStack, so we've also achieved
    another layer of "On OpenStack".
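
On the second idea above (pointing devtest at a libvirt host you already
have lying around), the underlying mechanism is just a remote libvirt URI;
a minimal sketch with the libvirt Python bindings, assuming SSH access to
that host:

import libvirt

# Equivalent to: virsh -c qemu+ssh://user@otherhost/system list --all
conn = libvirt.open("qemu+ssh://user@otherhost/system")

for dom in conn.listAllDomains():
    state, _reason = dom.state()
    running = "running" if state == libvirt.VIR_DOMAIN_RUNNING else "shut off"
    print("%s: %s" % (dom.name(), running))

conn.close()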

--
-- James Slagle
--

responded Mar 13, 2014 by James_Slagle

On 2014-03-13 11:12, James Slagle wrote:
On Thu, Mar 13, 2014 at 2:51 AM, Robert Collins
wrote:

So we already have pretty high requirements - its basically a 16G
workstation as minimum.

Specifically to test the full story:
- a seed VM
- an undercloud VM (bm deploy infra)
- 1 overcloud control VM
- 2 overcloud hypervisor VMs
====
5 VMs with 2+G RAM each.

To test the overcloud alone against the seed we save 1 VM, to skip the
overcloud we save 3.

However, as HA matures we're about to add 4 more VMs: we need a HA
control plane for both the under and overclouds:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud control VMs (HA)
- 2 overcloud hypervisor VMs
====
9 VMs with 2+G RAM each == 18GB

What should we do about this?

A few thoughts to kick start discussion:
- use Ironic to test across multiple machines (involves tunnelling
brbm across machines, fairly easy)
- shrink the VM sizes (causes thrashing)
- tell folk to toughen up and get bigger machines (ahahahahaha, no)
- make the default configuration inline the hypervisors on the
overcloud with the control plane:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud all-in-one VMs (HA)
====
7 VMs with 2+G RAM each == 14GB

I think its important that we exercise features like HA and live
migration regularly by developers, so I'm quite keen to have a fairly
solid systematic answer that will let us catch things like bad
firewall rules on the control node preventing network tunnelling
etc... e.g. we benefit the more things are split out like scale
deployments are. OTOH testing the micro-cloud that folk may start with
is also a really good idea....

The idea I was thinking was to make a testenv host available to
tripleo atc's. Or, perhaps make it a bit more locked down and only
available to a new group of tripleo folk, existing somewhere between
the privileges of tripleo atc's and tripleo-cd-admins. We could
document how you use the cloud (Red Hat's or HP's) rack to start up a
instance to run devtest on one of the compute hosts, request and lock
yourself a testenv environment on one of the testenv hosts, etc.
Basically, how our CI works. Although I think we'd want different
testenv hosts for development vs what runs the CI, and would need to
make sure everything was locked down appropriately security-wise.

Some other ideas:

  • Allow an option to get rid of the seed VM, or make it so that you
    can shut it down after the Undercloud is up. This only really gets rid
    of 1 VM though, so it doesn't buy you much nor solve any long term
    problem.

  • Make it easier to see how you'd use virsh against any libvirt host
    you might have lying around. We already have the setting exposed, but
    make it a bit more public and call it out more in the docs. I've
    actually never tried it myself, but have been meaning to.

  • I'm really reaching now, and this may be entirely unrealistic :),
    but....somehow use the fake baremetal driver and expose a mechanism to
    let the developer specify the already setup undercloud/overcloud
    environment ahead of time.
    For example:

  • Build your undercloud images with the vm element since you won't be
    PXE booting it
  • Upload your images to a public cloud, and boot instances for them.
  • Use this new mechanism when you run devtest (presumably running from
    another instance in the same cloud) to say "I'm using the fake
    baremetal driver, and here are the IP's of the undercloud instances".
  • Repeat steps for the overcloud (e.g., configure undercloud to use
    fake baremetal driver, etc).
  • Maybe it's not the fake baremetal driver, and instead a new driver
    that is a noop for the pxe stuff, and the power_on implementation
    powers on the cloud instances.
  • Obviously if your aim is to test the pxe and disk deploy process
    itself, this wouldn't work for you.
  • Presumably said public cloud is OpenStack, so we've also achieved
    another layer of "On OpenStack".

I actually spent quite a while looking into something like this last
option when I first started on TripleO, because I had only one big
server locally and it was running my OpenStack installation. I was
hoping to use it for my TripleO instances, and even went so far as to
add support for OpenStack to the virtual power driver in baremetal. I
was never completely successful, but I did work through a number of
problems:

  1. Neutron didn't like allowing the DHCP/PXE traffic that lets my seed
    serve the undercloud. I was able to get around this by using flat
    networking with a local bridge on the OpenStack system, but I'm not sure
    if that's going to be possible on most public cloud providers. There
    may very well be a less invasive way to configure Neutron to allow that,
    but I don't know how to do it.

  2. Last time I checked, Nova doesn't support PXE booting instances so I
    had to use iPXE images to do the booting. This doesn't work since we
    PXE boot every time an instance reboots and the iPXE image gets
    overwritten by the image deploy, so the instance doesn't boot properly
    after deployment. This is where I stopped my investigation because I
    didn't want to start hacking up my OpenStack installation to get around
    the problem, but if we decided to go in this direction I don't think it
    would be terribly difficult to get support for this into Nova. I know
    there have been proposed patches for it before, but I don't think there
    was ever much push behind them because it isn't something most people
    need to do.

So if we can work through a couple of problems then in theory it should
be possible to use OpenStack instances for TripleO development, which
would let us do the cloudy thing and have someone else worry about the
hardware. The idea certainly has some appeal to me.

-Ben

responded Mar 14, 2014 by Ben_Nemec

On 13/03/14 09:51, Robert Collins wrote:
So we already have pretty high requirements - its basically a 16G
workstation as minimum.

Specifically to test the full story:
- a seed VM
- an undercloud VM (bm deploy infra)
- 1 overcloud control VM
- 2 overcloud hypervisor VMs
====
5 VMs with 2+G RAM each.

To test the overcloud alone against the seed we save 1 VM, to skip the
overcloud we save 3.

However, as HA matures we're about to add 4 more VMs: we need a HA
control plane for both the under and overclouds:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud control VMs (HA)
- 2 overcloud hypervisor VMs
====
9 VMs with 2+G RAM each == 18GB

What should we do about this?

A few thoughts to kick start discussion:
- use Ironic to test across multiple machines (involves tunnelling
brbm across machines, fairly easy)
- shrink the VM sizes (causes thrashing)
- tell folk to toughen up and get bigger machines (ahahahahaha, no)
- make the default configuration inline the hypervisors on the
overcloud with the control plane:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud all-in-one VMs (HA)
====
7 VMs with 2+G RAM each == 14GB

I think its important that we exercise features like HA and live
migration regularly by developers, so I'm quite keen to have a fairly
solid systematic answer that will let us catch things like bad
firewall rules on the control node preventing network tunnelling
etc... e.g. we benefit the more things are split out like scale
deployments are. OTOH testing the micro-cloud that folk may start with
is also a really good idea....

I'd vote for an optional (non-default) inline cloud setup like what you
mention above, and maybe a non-HA setup as well. This would allow a lower
entry bar for people who only want to worry about a specific component.
We would then need to cover all supported setups in CI (adding to
capacity needs), and of course we then wouldn't have everybody
exercising HA, but it may be necessary to encourage uptake.

-Rob

responded Mar 21, 2014 by Derek_Higgins

On 13/03/14 16:12, James Slagle wrote:
On Thu, Mar 13, 2014 at 2:51 AM, Robert Collins
wrote:

So we already have pretty high requirements - its basically a 16G
workstation as minimum.

Specifically to test the full story:
- a seed VM
- an undercloud VM (bm deploy infra)
- 1 overcloud control VM
- 2 overcloud hypervisor VMs
====
5 VMs with 2+G RAM each.

To test the overcloud alone against the seed we save 1 VM, to skip the
overcloud we save 3.

However, as HA matures we're about to add 4 more VMs: we need a HA
control plane for both the under and overclouds:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud control VMs (HA)
- 2 overcloud hypervisor VMs
====
9 VMs with 2+G RAM each == 18GB

What should we do about this?

A few thoughts to kick start discussion:
- use Ironic to test across multiple machines (involves tunnelling
brbm across machines, fairly easy)
- shrink the VM sizes (causes thrashing)
- tell folk to toughen up and get bigger machines (ahahahahaha, no)
- make the default configuration inline the hypervisors on the
overcloud with the control plane:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud all-in-one VMs (HA)
====
7 VMs with 2+G RAM each == 14GB

I think its important that we exercise features like HA and live
migration regularly by developers, so I'm quite keen to have a fairly
solid systematic answer that will let us catch things like bad
firewall rules on the control node preventing network tunnelling
etc... e.g. we benefit the more things are split out like scale
deployments are. OTOH testing the micro-cloud that folk may start with
is also a really good idea....

The idea I was thinking was to make a testenv host available to
tripleo atc's. Or, perhaps make it a bit more locked down and only
available to a new group of tripleo folk, existing somewhere between
the privileges of tripleo atc's and tripleo-cd-admins. We could
document how you use the cloud (Red Hat's or HP's) rack to start up a
instance to run devtest on one of the compute hosts, request and lock
yourself a testenv environment on one of the testenv hosts, etc.
Basically, how our CI works. Although I think we'd want different
testenv hosts for development vs what runs the CI, and would need to
make sure everything was locked down appropriately security-wise.

I like this idea and I think it could work; my only concern is the extra
capacity we would need to pull it off. At the moment we are probably
falling short on capacity to do what we want for CI, so adding to this
would make the situation worse (how much worse I don't know). So unless
we get to the point where we have spare hardware doing nothing, I think
it's a non-runner.

Some other ideas:

  • Allow an option to get rid of the seed VM, or make it so that you
    can shut it down after the Undercloud is up. This only really gets rid
    of 1 VM though, so it doesn't buy you much nor solve any long term
    problem.

  • Make it easier to see how you'd use virsh against any libvirt host
    you might have lying around. We already have the setting exposed, but
    make it a bit more public and call it out more in the docs. I've
    actually never tried it myself, but have been meaning to.

This could work as an option.

  • I'm really reaching now, and this may be entirely unrealistic :),
    but....somehow use the fake baremetal driver and expose a mechanism to
    let the developer specify the already setup undercloud/overcloud
    environment ahead of time.
    For example:
  • Build your undercloud images with the vm element since you won't be
    PXE booting it
  • Upload your images to a public cloud, and boot instances for them.
  • Use this new mechanism when you run devtest (presumably running from
    another instance in the same cloud) to say "I'm using the fake
    baremetal driver, and here are the IP's of the undercloud instances".
  • Repeat steps for the overcloud (e.g., configure undercloud to use
    fake baremetal driver, etc).
  • Maybe it's not the fake baremetal driver, and instead a new driver
    that is a noop for the pxe stuff, and the power_on implementation
    powers on the cloud instances.
  • Obviously if your aim is to test the pxe and disk deploy process
    itself, this wouldn't work for you.
  • Presumably said public cloud is OpenStack, so we've also achieved
    another layer of "On OpenStack".
responded Mar 21, 2014 by Derek_Higgins

On 14/03/14 20:16, Ben Nemec wrote:
On 2014-03-13 11:12, James Slagle wrote:

On Thu, Mar 13, 2014 at 2:51 AM, Robert Collins
wrote:

So we already have pretty high requirements - its basically a 16G
workstation as minimum.

Specifically to test the full story:
- a seed VM
- an undercloud VM (bm deploy infra)
- 1 overcloud control VM
- 2 overcloud hypervisor VMs
====
5 VMs with 2+G RAM each.

To test the overcloud alone against the seed we save 1 VM, to skip the
overcloud we save 3.

However, as HA matures we're about to add 4 more VMs: we need a HA
control plane for both the under and overclouds:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud control VMs (HA)
- 2 overcloud hypervisor VMs
====
9 VMs with 2+G RAM each == 18GB

What should we do about this?

A few thoughts to kick start discussion:
- use Ironic to test across multiple machines (involves tunnelling
brbm across machines, fairly easy)
- shrink the VM sizes (causes thrashing)
- tell folk to toughen up and get bigger machines (ahahahahaha, no)
- make the default configuration inline the hypervisors on the
overcloud with the control plane:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud all-in-one VMs (HA)
====
7 VMs with 2+G RAM each == 14GB

I think its important that we exercise features like HA and live
migration regularly by developers, so I'm quite keen to have a fairly
solid systematic answer that will let us catch things like bad
firewall rules on the control node preventing network tunnelling
etc... e.g. we benefit the more things are split out like scale
deployments are. OTOH testing the micro-cloud that folk may start with
is also a really good idea....

The idea I was thinking was to make a testenv host available to
tripleo atc's. Or, perhaps make it a bit more locked down and only
available to a new group of tripleo folk, existing somewhere between
the privileges of tripleo atc's and tripleo-cd-admins. We could
document how you use the cloud (Red Hat's or HP's) rack to start up a
instance to run devtest on one of the compute hosts, request and lock
yourself a testenv environment on one of the testenv hosts, etc.
Basically, how our CI works. Although I think we'd want different
testenv hosts for development vs what runs the CI, and would need to
make sure everything was locked down appropriately security-wise.

Some other ideas:

  • Allow an option to get rid of the seed VM, or make it so that you
    can shut it down after the Undercloud is up. This only really gets rid
    of 1 VM though, so it doesn't buy you much nor solve any long term
    problem.

  • Make it easier to see how you'd use virsh against any libvirt host
    you might have lying around. We already have the setting exposed, but
    make it a bit more public and call it out more in the docs. I've
    actually never tried it myself, but have been meaning to.

  • I'm really reaching now, and this may be entirely unrealistic :),
    but....somehow use the fake baremetal driver and expose a mechanism to
    let the developer specify the already setup undercloud/overcloud
    environment ahead of time.
    For example:

  • Build your undercloud images with the vm element since you won't be
    PXE booting it
  • Upload your images to a public cloud, and boot instances for them.
  • Use this new mechanism when you run devtest (presumably running from
    another instance in the same cloud) to say "I'm using the fake
    baremetal driver, and here are the IP's of the undercloud instances".
  • Repeat steps for the overcloud (e.g., configure undercloud to use
    fake baremetal driver, etc).
  • Maybe it's not the fake baremetal driver, and instead a new driver
    that is a noop for the pxe stuff, and the power_on implementation
    powers on the cloud instances.
  • Obviously if your aim is to test the pxe and disk deploy process
    itself, this wouldn't work for you.
  • Presumably said public cloud is OpenStack, so we've also achieved
    another layer of "On OpenStack".

I actually spent quite a while looking into something like this last
option when I first started on TripleO, because I had only one big
server locally and it was running my OpenStack installation. I was
hoping to use it for my TripleO instances, and even went so far as to
add support for OpenStack to the virtual power driver in baremetal. I
was never completely successful, but I did work through a number of
problems:

  1. Neutron didn't like allowing the DHCP/PXE traffic to let my seed
    serve to the undercloud. I was able to get around this by using flat
    networking with a local bridge on the OpenStack system, but I'm not sure
    if that's going to be possible on most public cloud providers. There
    may very well be a less invasive way to configure Neutron to allow that,
    but I don't know how to do it.

  2. Last time I checked, Nova doesn't support PXE booting instances so I
    had to use iPXE images to do the booting. This doesn't work since we
    PXE boot every time an instance reboots and the iPXE image gets
    overwritten by the image deploy, so the instance doesn't boot properly
    after deployment. This is where I stopped my investigation because I
    didn't want to start hacking up my OpenStack installation to get around
    the problem, but if we decided to go in this direction I don't think it
    would be terribly difficult to get support for this into Nova. I know
    there have been proposed patches for it before, but I don't think there
    was ever much push behind them because it isn't something most people
    need to do.

So if we can work through a couple of problems then in theory it should
be possible to use OpenStack instances for TripleO development, which
would let us do the cloudy thing and have someone else worry about the
hardware. The idea certainly has some appeal to me.

I'm tempted to say that if we could pull this off it would be great, but I'm
worried it would differ too much from our target deployment method. We
would be spreading ourselves too thin trying to support this for
developers along with our traditional deployment method. Also, if it is
adopted by too many people, the only thing exercising our target
deployment method would be CI. But I'm interested in what other people
think.

-Ben


responded Mar 21, 2014 by Derek_Higgins

----- Original Message -----
From: "Robert Collins"
To: "OpenStack Development Mailing List"
Sent: Thursday, March 13, 2014 5:51:30 AM
Subject: [openstack-dev] [TripleO] test environment requirements

So we already have pretty high requirements - its basically a 16G
workstation as minimum.

Specifically to test the full story:
- a seed VM
- an undercloud VM (bm deploy infra)
- 1 overcloud control VM
- 2 overcloud hypervisor VMs
====
5 VMs with 2+G RAM each.

To test the overcloud alone against the seed we save 1 VM, to skip the
overcloud we save 3.

However, as HA matures we're about to add 4 more VMs: we need a HA
control plane for both the under and overclouds:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud control VMs (HA)
- 2 overcloud hypervisor VMs
====
9 VMs with 2+G RAM each == 18GB

What should we do about this?

A few thoughts to kick start discussion:
- use Ironic to test across multiple machines (involves tunnelling
brbm across machines, fairly easy)
- shrink the VM sizes (causes thrashing)
- tell folk to toughen up and get bigger machines (ahahahahaha, no)
- make the default configuration inline the hypervisors on the
overcloud with the control plane:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud all-in-one VMs (HA)
====
7 VMs with 2+G RAM each == 14GB

I think its important that we exercise features like HA and live
migration regularly by developers, so I'm quite keen to have a fairly
solid systematic answer that will let us catch things like bad
firewall rules on the control node preventing network tunnelling
etc...

I'm all for supporting HA development and testing within devtest. I'm against forcing it on all users as a default.

I can imagine wanting to cut corners and have configurations flexible on both ends (undercloud and overcloud). I may, for example, deploy a single all-in-one undercloud when I'm testing overcloud HA. Or vice versa.

I think I'm one of the few (if not the only) developers who use almost exclusively baremetal (besides the seed VM) when testing/developing TripleO. Forcing users who want to do this to have 6-7 real machines is a bit much, I think. Arguably wasteful, even. By requiring more machines to run through devtest you actually make it harder for people to test it on real hardware, which is usually harder to come by. Given that deployment on real bare metal is sort of the point of TripleO, I'd very much like to see more developers using it rather than fewer.

So by all means let's support HA... but let's do it in a way that is configurable (i.e. not forcing people to be wasteful).

Dan

e.g. we benefit the more things are split out like scale
deployments are. OTOH testing the micro-cloud that folk may start with
is also a really good idea....

-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud


responded Mar 21, 2014 by Dan_Prince

On 2014-03-21 10:57, Derek Higgins wrote:
On 14/03/14 20:16, Ben Nemec wrote:

On 2014-03-13 11:12, James Slagle wrote:

On Thu, Mar 13, 2014 at 2:51 AM, Robert Collins
wrote:

So we already have pretty high requirements - its basically a 16G
workstation as minimum.

Specifically to test the full story:
- a seed VM
- an undercloud VM (bm deploy infra)
- 1 overcloud control VM
- 2 overcloud hypervisor VMs
====
5 VMs with 2+G RAM each.

To test the overcloud alone against the seed we save 1 VM, to skip the
overcloud we save 3.

However, as HA matures we're about to add 4 more VMs: we need a HA
control plane for both the under and overclouds:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud control VMs (HA)
- 2 overcloud hypervisor VMs
====
9 VMs with 2+G RAM each == 18GB

What should we do about this?

A few thoughts to kick start discussion:
- use Ironic to test across multiple machines (involves tunnelling
brbm across machines, fairly easy)
- shrink the VM sizes (causes thrashing)
- tell folk to toughen up and get bigger machines (ahahahahaha, no)
- make the default configuration inline the hypervisors on the
overcloud with the control plane:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud all-in-one VMs (HA)
====
7 VMs with 2+G RAM each == 14GB

I think its important that we exercise features like HA and live
migration regularly by developers, so I'm quite keen to have a fairly
solid systematic answer that will let us catch things like bad
firewall rules on the control node preventing network tunnelling
etc... e.g. we benefit the more things are split out like scale
deployments are. OTOH testing the micro-cloud that folk may start with
is also a really good idea....

The idea I was thinking was to make a testenv host available to
tripleo atc's. Or, perhaps make it a bit more locked down and only
available to a new group of tripleo folk, existing somewhere between
the privileges of tripleo atc's and tripleo-cd-admins. We could
document how you use the cloud (Red Hat's or HP's) rack to start up a
instance to run devtest on one of the compute hosts, request and lock
yourself a testenv environment on one of the testenv hosts, etc.
Basically, how our CI works. Although I think we'd want different
testenv hosts for development vs what runs the CI, and would need to
make sure everything was locked down appropriately security-wise.

Some other ideas:

  • Allow an option to get rid of the seed VM, or make it so that you
    can shut it down after the Undercloud is up. This only really gets rid
    of 1 VM though, so it doesn't buy you much nor solve any long term
    problem.

  • Make it easier to see how you'd use virsh against any libvirt host
    you might have lying around. We already have the setting exposed, but
    make it a bit more public and call it out more in the docs. I've
    actually never tried it myself, but have been meaning to.

  • I'm really reaching now, and this may be entirely unrealistic :),
    but....somehow use the fake baremetal driver and expose a mechanism to
    let the developer specify the already setup undercloud/overcloud
    environment ahead of time.
    For example:

  • Build your undercloud images with the vm element since you won't be
    PXE booting it
  • Upload your images to a public cloud, and boot instances for them.
  • Use this new mechanism when you run devtest (presumably running from
    another instance in the same cloud) to say "I'm using the fake
    baremetal driver, and here are the IP's of the undercloud instances".
  • Repeat steps for the overcloud (e.g., configure undercloud to use
    fake baremetal driver, etc).
  • Maybe it's not the fake baremetal driver, and instead a new driver
    that is a noop for the pxe stuff, and the power_on implementation
    powers on the cloud instances.
  • Obviously if your aim is to test the pxe and disk deploy process
    itself, this wouldn't work for you.
  • Presumably said public cloud is OpenStack, so we've also achieved
    another layer of "On OpenStack".

I actually spent quite a while looking into something like this last
option when I first started on TripleO, because I had only one big
server locally and it was running my OpenStack installation. I was
hoping to use it for my TripleO instances, and even went so far as to
add support for OpenStack to the virtual power driver in baremetal. I
was never completely successful, but I did work through a number of
problems:

  1. Neutron didn't like allowing the DHCP/PXE traffic to let my seed
    serve to the undercloud. I was able to get around this by using flat
    networking with a local bridge on the OpenStack system, but I'm not sure
    if that's going to be possible on most public cloud providers. There
    may very well be a less invasive way to configure Neutron to allow that,
    but I don't know how to do it.

  2. Last time I checked, Nova doesn't support PXE booting instances so I
    had to use iPXE images to do the booting. This doesn't work since we
    PXE boot every time an instance reboots and the iPXE image gets
    overwritten by the image deploy, so the instance doesn't boot properly
    after deployment. This is where I stopped my investigation because I
    didn't want to start hacking up my OpenStack installation to get around
    the problem, but if we decided to go in this direction I don't think it
    would be terribly difficult to get support for this into Nova. I know
    there have been proposed patches for it before, but I don't think there
    was ever much push behind them because it isn't something most people
    need to do.

So if we can work through a couple of problems then in theory it should
be possible to use OpenStack instances for TripleO development, which
would let us do the cloudy thing and have someone else worry about the
hardware. The idea certainly has some appeal to me.

I'm tempted to say If we could pull this off it would be great but I'm
worried it would differ too much from our target deployment method. We
would be spreading ourselves too thin trying to support this for
developers along with our traditional deployment method. Also if using
it is adopted by too many people the only thing exercising our target
deployment method would be CI. But I'm interested in what other people
think.

To clarify my suggestion, I want to take James's idea a step further and
do the full PXE deploy to Nova instances. So the only difference from
our workflow now would be that instead of configuring your VMs with
virsh, you would do it with Nova and Neutron.
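
To give a feel for the substitution, a hedged sketch of what "configure the
nodes with Nova and Neutron instead of virsh" might look like from Python
(client constructors and auth details vary by client version; the IDs and
endpoints are placeholders):

from novaclient import client as nova_client
from neutronclient.v2_0 import client as neutron_client

nova = nova_client.Client("2", "user", "password", "project",
                          "http://keystone.example.com:5000/v2.0")
neutron = neutron_client.Client(username="user", password="password",
                                tenant_name="project",
                                auth_url="http://keystone.example.com:5000/v2.0")

# Create a port on the provisioning network, then boot an instance on it;
# roughly the analogue of defining a VM on the brbm bridge with virsh.
port = neutron.create_port({"port": {"network_id": "PROVISIONING_NET_ID",
                                     "name": "baremetal_0"}})["port"]
server = nova.servers.create(name="baremetal_0", image="IMAGE_ID",
                             flavor="FLAVOR_ID",
                             nics=[{"port-id": port["id"]}])
print("%s %s" % (server.id, port["fixed_ips"]))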

-Ben

responded Mar 21, 2014 by Ben_Nemec

Excerpts from Ben Nemec's message of 2014-03-21 09:38:00 -0700:

On 2014-03-21 10:57, Derek Higgins wrote:

On 14/03/14 20:16, Ben Nemec wrote:

On 2014-03-13 11:12, James Slagle wrote:

On Thu, Mar 13, 2014 at 2:51 AM, Robert Collins
wrote:

So we already have pretty high requirements - its basically a 16G
workstation as minimum.

Specifically to test the full story:
- a seed VM
- an undercloud VM (bm deploy infra)
- 1 overcloud control VM
- 2 overcloud hypervisor VMs
====
5 VMs with 2+G RAM each.

To test the overcloud alone against the seed we save 1 VM, to skip the
overcloud we save 3.

However, as HA matures we're about to add 4 more VMs: we need a HA
control plane for both the under and overclouds:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud control VMs (HA)
- 2 overcloud hypervisor VMs
====
9 VMs with 2+G RAM each == 18GB

What should we do about this?

A few thoughts to kick start discussion:
- use Ironic to test across multiple machines (involves tunnelling
brbm across machines, fairly easy)
- shrink the VM sizes (causes thrashing)
- tell folk to toughen up and get bigger machines (ahahahahaha, no)
- make the default configuration inline the hypervisors on the
overcloud with the control plane:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud all-in-one VMs (HA)
====
7 VMs with 2+G RAM each == 14GB

I think its important that we exercise features like HA and live
migration regularly by developers, so I'm quite keen to have a fairly
solid systematic answer that will let us catch things like bad
firewall rules on the control node preventing network tunnelling
etc... e.g. we benefit the more things are split out like scale
deployments are. OTOH testing the micro-cloud that folk may start with
is also a really good idea....

The idea I was thinking was to make a testenv host available to
tripleo atc's. Or, perhaps make it a bit more locked down and only
available to a new group of tripleo folk, existing somewhere between
the privileges of tripleo atc's and tripleo-cd-admins. We could
document how you use the cloud (Red Hat's or HP's) rack to start up a
instance to run devtest on one of the compute hosts, request and lock
yourself a testenv environment on one of the testenv hosts, etc.
Basically, how our CI works. Although I think we'd want different
testenv hosts for development vs what runs the CI, and would need to
make sure everything was locked down appropriately security-wise.

Some other ideas:

  • Allow an option to get rid of the seed VM, or make it so that you
    can shut it down after the Undercloud is up. This only really gets rid
    of 1 VM though, so it doesn't buy you much nor solve any long term
    problem.

  • Make it easier to see how you'd use virsh against any libvirt host
    you might have lying around. We already have the setting exposed, but
    make it a bit more public and call it out more in the docs. I've
    actually never tried it myself, but have been meaning to.

  • I'm really reaching now, and this may be entirely unrealistic :),
    but....somehow use the fake baremetal driver and expose a mechanism to
    let the developer specify the already setup undercloud/overcloud
    environment ahead of time.
    For example:

  • Build your undercloud images with the vm element since you won't be
    PXE booting it
  • Upload your images to a public cloud, and boot instances for them.
  • Use this new mechanism when you run devtest (presumably running from
    another instance in the same cloud) to say "I'm using the fake
    baremetal driver, and here are the IP's of the undercloud instances".
  • Repeat steps for the overcloud (e.g., configure undercloud to use
    fake baremetal driver, etc).
  • Maybe it's not the fake baremetal driver, and instead a new driver
    that is a noop for the pxe stuff, and the power_on implementation
    powers on the cloud instances.
  • Obviously if your aim is to test the pxe and disk deploy process
    itself, this wouldn't work for you.
  • Presumably said public cloud is OpenStack, so we've also achieved
    another layer of "On OpenStack".

I actually spent quite a while looking into something like this last
option when I first started on TripleO, because I had only one big
server locally and it was running my OpenStack installation. I was
hoping to use it for my TripleO instances, and even went so far as to
add support for OpenStack to the virtual power driver in baremetal. I
was never completely successful, but I did work through a number of
problems:

  1. Neutron didn't like allowing the DHCP/PXE traffic to let my seed
    serve to the undercloud. I was able to get around this by using flat
    networking with a local bridge on the OpenStack system, but I'm not sure
    if that's going to be possible on most public cloud providers. There
    may very well be a less invasive way to configure Neutron to allow that,
    but I don't know how to do it.

  2. Last time I checked, Nova doesn't support PXE booting instances so I
    had to use iPXE images to do the booting. This doesn't work since we
    PXE boot every time an instance reboots and the iPXE image gets
    overwritten by the image deploy, so the instance doesn't boot properly
    after deployment. This is where I stopped my investigation because I
    didn't want to start hacking up my OpenStack installation to get around
    the problem, but if we decided to go in this direction I don't think it
    would be terribly difficult to get support for this into Nova. I know
    there have been proposed patches for it before, but I don't think there
    was ever much push behind them because it isn't something most people
    need to do.

So if we can work through a couple of problems then in theory it should
be possible to use OpenStack instances for TripleO development, which
would let us do the cloudy thing and have someone else worry about the
hardware. The idea certainly has some appeal to me.

I'm tempted to say If we could pull this off it would be great but I'm
worried it would differ too much from our target deployment method. We
would be spreading ourselves too thin trying to support this for
developers along with our traditional deployment method. Also if using
it is adopted by too many people the only thing exercising our target
deployment method would be CI. But I'm interested in what other people
think.

To clarify my suggestion, I want to take James's idea a step further and
do the full PXE deploy to Nova instances. So the only difference from
our workflow now would be that instead of configuring your VM's with
virsh, you would do it with Nova and Neutron.

Seems like this would be the most scalable long-term plan. Would it be
as easy as enabling a PXE BIOS for instances and allowing users to set
the appropriate DHCP options for the TFTP server on these ports?
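
If the cloud exposes Neutron's extra-DHCP-options extension, the per-port
piece might look roughly like this (a sketch; the option names follow
dnsmasq conventions and the IDs/addresses are placeholders):

from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(username="user", password="password",
                                tenant_name="project",
                                auth_url="http://keystone.example.com:5000/v2.0")

# Point the port's DHCP response at a TFTP server and boot file, which is
# roughly what a PXE deploy onto a Nova instance would need.
neutron.update_port("PORT_ID", {
    "port": {
        "extra_dhcp_opts": [
            {"opt_name": "tftp-server", "opt_value": "192.0.2.10"},
            {"opt_name": "bootfile-name", "opt_value": "pxelinux.0"},
        ]
    }
})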

responded Mar 24, 2014 by Clint_Byrum

Excerpts from Dan Prince's message of 2014-03-21 09:25:42 -0700:

----- Original Message -----

From: "Robert Collins"
To: "OpenStack Development Mailing List"
Sent: Thursday, March 13, 2014 5:51:30 AM
Subject: [openstack-dev] [TripleO] test environment requirements

So we already have pretty high requirements - its basically a 16G
workstation as minimum.

Specifically to test the full story:
- a seed VM
- an undercloud VM (bm deploy infra)
- 1 overcloud control VM
- 2 overcloud hypervisor VMs
====
5 VMs with 2+G RAM each.

To test the overcloud alone against the seed we save 1 VM, to skip the
overcloud we save 3.

However, as HA matures we're about to add 4 more VMs: we need a HA
control plane for both the under and overclouds:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud control VMs (HA)
- 2 overcloud hypervisor VMs
====
9 VMs with 2+G RAM each == 18GB

What should we do about this?

A few thoughts to kick start discussion:
- use Ironic to test across multiple machines (involves tunnelling
brbm across machines, fairly easy)
- shrink the VM sizes (causes thrashing)
- tell folk to toughen up and get bigger machines (ahahahahaha, no)
- make the default configuration inline the hypervisors on the
overcloud with the control plane:
- a seed VM
- 3 undercloud VMs (HA bm deploy infra)
- 3 overcloud all-in-one VMs (HA)
====
7 VMs with 2+G RAM each == 14GB

I think its important that we exercise features like HA and live
migration regularly by developers, so I'm quite keen to have a fairly
solid systematic answer that will let us catch things like bad
firewall rules on the control node preventing network tunnelling
etc...

I'm all for supporting HA development and testing within devtest. I'm against forcing it on all users as a default.

I can imaging wanting to cut corners and have configurations flexible on both ends (undercloud and overcloud). I may for example deploy a single all-in-one undercloud when I'm testing overcloud HA. Or vice versa.

I think I'm one of the few (if not the only) developer who uses almost exclusive baremetal (besides seed VM) when test/developing TripleO. Forcing users who want to do this to have 6-7 real machines is a bit much I think. Arguably wasteful even. By requiring more machines to run through devtest you actually make it harder for people to test it on real hardware which is usually harder to come by. Given deployment on real bare metal is sort of the point or TripleO I'd very much like to see more developers using it rather than less.

So by all means lets support HA... but lets do it in a way that is configurable (i.e. not forcing people to be wasters)

Dan

I don't think anybody wants to force it on users. But a predominance
of end users will require HA, and thus we need our developers to be able
to develop with HA.

This is for the benefit of developers. I imagine we've all been in
situations where our dev environment is almost nothing like CI, and then
when CI runs you find that you have missed huge problems... and now to
test for those problems you either have to re-do your dev environment,
or wait... a lot... for CI.

I don't have any really clever answers to this problem. We're testing an
end-to-end cloud deployment. If we can't run a small, accurate simulation
of such an environment as developers, then we will end up going very slowly.
The problem is that this small simulation is still massive compared
to the usual development paradigm, which involves at most two distinct
virtual machines.

responded Mar 24, 2014 by Clint_Byrum
...