
[openstack-dev] [tc][nova][ironic][mogan] Evaluate Mogan project


Hi folks,

First of all, thanks to everyone who attended the Mogan project update in the
TC room during the Denver PTG. We would like to gather more suggestions here
before we apply for inclusion.

Speaking only for myself, I find the current direction of one API+scheduler
for VMs/bare metal/containers unfortunate. After container management moved
out into a separate project, Zun, bare metal with Nova and Ironic continues to
be a pain point.

. API

Only part of the Nova APIs and parameters apply to bare metal instances, and
to stay interoperable with the virtual drivers, bare metal specific APIs such
as deploy-time RAID or advanced partitioning cannot be included. It's true
that Nova can support various compute drivers, but the reality is that the
support for each hypervisor is not equal, especially for bare metal in a
virtualization-oriented world. I understand the reason for this, though: Nova
was designed to provide compute resources (virtual machines), not bare metal.

. Scheduler

Bare metal doesn't fit the 1:1 nova-compute-to-resource model, as
nova-compute processes can't run on the inventory nodes themselves. That means
host aggregates, availability zones and other concepts based on the compute
service (host) can't be applied to bare metal resources. For grouping such as
anti-affinity, the granularity also differs from virtual machines: bare metal
users may want their HA instances spread across failure domains rather than
merely across nodes. In short, we can only get a rigid, resource-class-only
scheduling model for bare metal.
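
To make "resource class only" concrete, here is a rough Python sketch (the
class names are made up) of the only kind of question such a scheduler can
answer: each node advertises exactly one unit of one custom resource class,
and a request can only ask for one node of a given class, with no CPU, RAM or
disk granularity:

    # Sketch: resource-class-only scheduling. Each bare metal node is an
    # all-or-nothing inventory of a single custom resource class.
    node_inventories = {
        "node-1": {"CUSTOM_BAREMETAL_GOLD": 1},
        "node-2": {"CUSTOM_BAREMETAL_GOLD": 1},
        "node-3": {"CUSTOM_BAREMETAL_SMALL": 1},
    }

    request = {"CUSTOM_BAREMETAL_GOLD": 1}  # "one gold node", nothing finer

    candidates = [
        node for node, inventory in node_inventories.items()
        if all(inventory.get(rc, 0) >= amount for rc, amount in request.items())
    ]
    print(candidates)  # ['node-1', 'node-2']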

Most cloud providers on the market offer virtual machines and bare metal as
separate resources, but unfortunately it's hard to achieve this with one
compute service. I have heard of people deploying separate Nova installations
for virtual machines and bare metal, with many downstream hacks to the
bare-metal single-driver Nova, but since the changes to Nova would be massive
and potentially invasive to virtual machines, upstreaming them does not seem
practical.

So we created Mogan [1] about one year ago, which aims to offer bare metal as
a first-class resource to users, with a set of bare metal specific APIs and a
bare-metal-centric scheduler (backed by the Placement service). It started as
an experimental project, but the outcome makes us believe it's the right
direction. Mogan fully embraces Ironic for bare metal provisioning, and with
RSD servers [2] introduced to OpenStack it opens up a new world for bare
metal, as we can then compose hardware resources on the fly.

Also, I would like to clarify the overlap between Mogan and Nova. There will
certainly be users who want one API for compute resource management because
they don't care whether they get a virtual machine or a bare metal server; the
baremetal driver with Nova is still the right choice for such users to get
raw-performance compute resources. Mogan, by contrast, is for dedicated bare
metal users and cloud providers who want to offer bare metal as a separate
resource.

Thank you for your time!

[1] https://wiki.openstack.org/wiki/Mogan
[2]
https://www.intel.com/content/www/us/en/architecture-and-technology/rack-scale-design-overview.html

--
Best Regards,
Zhenguo Niu


asked Sep 28, 2017 in openstack-dev by Zhenguo_Niu

13 Responses


Hi!

Thanks for raising this. I have been interested in the project for some time,
but I never got a chance to wrap my head around it. I also have a few concerns
- please see inline.

On 09/25/2017 01:27 PM, Zhenguo Niu wrote:
Hi folks,

First of all, thanks for the audiences for Mogan project update in the TC room
during Denver PTG. Here we would like to get more suggestions before we apply
for inclusion.

Speaking only for myself, I find the current direction of one API+scheduler for
vm/baremetal/container unfortunate. After containers management moved out to be
a separated project Zun, baremetal with Nova and Ironic continues to be a pain
point.

. API

Only part of the Nova APIs and parameters can apply to baremetal instances,
meanwhile for interoperable with other virtual drivers, bare metal specific APIs
such as deploy time RAID, advanced partitions can not  be included. It's true
that we can support various compute drivers, but the reality is that the support
of each of hypervisor is not equal, especially for bare metals in a
virtualization world. But I understand the problems with that as Nova was
designed to provide compute resources(virtual machines) instead of bare metals.

A correction: any compute resources.

Nova works okay with bare metals. It's never going to work perfectly though,
because we always have to find a common subset of features between VM and BM.
RAID is a good example indeed. We have a solution for the future, but it's not
going to satisfy everyone.

Now I have a question: to what extent do you plan to maintain the "cloud"
nature of the API? Let's take RAID as an example. Ironic can apply a very
generic or a very specific configuration. You can request "just RAID-5" or you
can ask for specific disks to be combined in a specific way. I believe the
latter is not something we want to expose to cloud users, as it would no
longer be a cloud.
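
To make the difference concrete, an Ironic target_raid_config for the two
cases could look roughly like the following Python dicts (the exact values are
illustrative only):

    # Generic request: "just RAID-5", let the driver pick the disks.
    generic_raid = {
        "logical_disks": [
            {"size_gb": "MAX", "raid_level": "5", "is_root_volume": True},
        ]
    }

    # Specific request: pin particular physical disks -- the kind of detail
    # that arguably stops being "cloud" once exposed to end users.
    specific_raid = {
        "logical_disks": [
            {
                "size_gb": 500,
                "raid_level": "1",
                "is_root_volume": True,
                "physical_disks": ["5I:0:1", "5I:0:2"],  # illustrative IDs
            },
        ]
    }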

. Scheduler

Bare metal doesn't fit in to the model of 1:1 nova-compute to resource, as
nova-compute processes can't be run on the inventory nodes themselves. That is
to say host aggregates, availability zones and such things based on compute
service(host) can't be applied to bare metal resources. And for grouping like
anti-affinity, the granularity is also not same with virtual machines, bare
metal users may want their HA instances not on the same failure domain instead
of the node itself. Short saying, we can only get a rigid resource class only
scheduling for bare metals.

It's not rigid. Okay, it's rigid, but it's not as rigid as what we used to have.

If you're going back to the VCPUs-memory-disk triad, you're making it more
rigid. Of these three, only memory has ever made practical sense for
deployers. VCPUs is a bit subtle, as it depends on hyper-threading being
enabled or disabled, and I've never seen people use it very often.

But our local_gb thing is an outright lie. Of the 20 disks a machine can
easily have, which one do you report as local_gb? Well, in the best case
people used ironic root device hints with ironic-inspector to figure it out.
Which is great, but requires ironic-inspector. In the worst case people just
put a random number there to make scheduling work. This is horrible; please
make sure not to go back to it.
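
For reference, the root device hints I mean are set on the node so Ironic
knows which of those disks local_gb actually refers to; a minimal sketch,
with illustrative values:

    # Illustrative Ironic root device hints, typically stored as
    # node.properties["root_device"] and often filled in from
    # ironic-inspector data: pick the root disk by its properties
    # instead of reporting an arbitrary local_gb.
    root_device_hints = {
        "size": ">= 500",     # GiB; comparison operators are supported
        "rotational": False,  # i.e. prefer a non-spinning disk
    }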

What I would love to see from a bare metal scheduling project is scheduling
based on inventory. I was thinking of being able to express things like "give
me a node with 2 GPUs of at least 256 CUDA cores each". Do you plan on this
kind of thing? This would truly mean flexible scheduling.
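
Something along these lines -- a purely hypothetical sketch, nothing like it
exists today -- is the shape of request I mean:

    # Hypothetical inventory-based request: "give me a node with 2 GPUs of
    # at least 256 CUDA cores each". Not an existing API, just the kind of
    # question I would like to be able to ask.
    requested = {"gpu": {"count": 2, "min_cuda_cores_each": 256}}

    def node_matches(node_gpus, req=requested["gpu"]):
        """True if the node's GPU inventory satisfies the request."""
        suitable = [g for g in node_gpus
                    if g["cuda_cores"] >= req["min_cuda_cores_each"]]
        return len(suitable) >= req["count"]

    print(node_matches([{"cuda_cores": 384}, {"cuda_cores": 384}]))  # True
    print(node_matches([{"cuda_cores": 128}, {"cuda_cores": 384}]))  # False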

Which brings me to one of my biggest reservations about Mogan: I don't think
copying Nova's architecture is a good idea overall. In particular, I think you
have flavors, which IMO do not map onto the bare metal world at all.

And most of the cloud providers in the market offering virtual machines and bare
metals as separated resources, but unfortunately, it's hard to achieve this with
one compute service.

Do you have proof of the first statement? And do you mean public clouds? Our
customers deploy hybrid environments, to the best of my knowledge. Nobody I
know uses one compute service in the whole cloud anyway.

I heard people are deploying seperated Nova for virtual
machines and bare metals with many downstream hacks to the bare metal
single-driver Nova but as the changes to Nova would be massive and may invasive
to virtual machines, it seems not practical to be upstream.

I think you're overestimating the problem. In TripleO we deploy separate
virtual nova-compute nodes. If ironic is enabled, its nova-computes go to the
controllers. Then you can use host aggregates to split flavors between VM and
BM. With resource classes it's even more trivial: you get this split
naturally.
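
Roughly, the resource class split works by giving the bare metal flavor extra
specs along these lines (the custom class name is just an example):

    # Example flavor extra specs for the "natural split": the bare metal
    # flavor asks for one unit of a custom resource class and zeroes out the
    # standard VM resources, so it can never land on a virtual hypervisor.
    baremetal_flavor_extra_specs = {
        "resources:CUSTOM_BAREMETAL_GOLD": "1",
        "resources:VCPU": "0",
        "resources:MEMORY_MB": "0",
        "resources:DISK_GB": "0",
    }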

So we created Mogan [1] about one year ago, which aims to offer bare metals as
first class resources to users with a set of bare metal specific API and a
baremetal-centric scheduler(with Placement service). It was like an experimental
project at the beginning, but the outcome makes us believe it's the right way.
Mogan will fully embrace Ironic for bare metal provisioning and with RSD server
[2] introduced to OpenStack, it will be a new world for bare metals, as with
that we can compose hardware resources on the fly.

Good that you touched on this topic, because I have a question here :)

With ironic you request a node. With RSD and similar you create a node,
which is closer to VMs than to traditional BMs. This gives a similar problem
to what we have with nova now, namely exact vs non-exact filters. How do you
solve it? Assuming you plan on using flavors (which I think is a bad idea), do
you use exact or non-exact filters? How do you handle the difference between
the approaches?

responded Sep 25, 2017 by Dmitry_Tantsur

Thanks, Dmitry, for the feedback; please see my responses inline.

On Mon, Sep 25, 2017 at 8:35 PM, Dmitry Tantsur dtantsur@redhat.com wrote:

Hi!

Thanks for raising this. I was interested in the project for some time,
but I never got a chance to wrap my head around. I also have a few concerns
- please see inline.

On 09/25/2017 01:27 PM, Zhenguo Niu wrote:

Hi folks,

First of all, thanks for the audiences for Mogan project update in the TC
room during Denver PTG. Here we would like to get more suggestions before
we apply for inclusion.

Speaking only for myself, I find the current direction of one
API+scheduler for vm/baremetal/container unfortunate. After containers
management moved out to be a separated project Zun, baremetal with Nova and
Ironic continues to be a pain point.

. API

Only part of the Nova APIs and parameters can apply to baremetal
instances, meanwhile for interoperable with other virtual drivers, bare
metal specific APIs such as deploy time RAID, advanced partitions can not
be included. It's true that we can support various compute drivers, but
the reality is that the support of each of hypervisor is not equal,
especially for bare metals in a virtualization world. But I understand the
problems with that as Nova was designed to provide compute
resources(virtual machines) instead of bare metals.

A correction: any compute resources.

Nova works okay with bare metals. It's never going to work perfectly
though, because we always have to find a common subset of features between
VM and BM. RAID is a good example indeed. We have a solution for the
future, but it's not going to satisfy everyone.

Now I have a question: to which extend do you plan to maintain the "cloud"
nature of the API? Let's take RAID as an example. Ironic can apply a very
generic or a very specific configuration. You can request "just RAID-5" or
you can ask for specific disks to be combined in a specific combination. I
believe the latter is not something we want to expose to cloud users, as
it's not going to be a cloud any more.

In fact, we don't have a clear spec for RAID support yet, but the team leans
towards a generic configuration, for exactly the concerns you raised. But if
we can track disk information in Mogan or Placement (as a nested resource
provider under the node), it's also possible for users to specify disks with
hints like "SSD 500GB", and Mogan can then match the disks and pass a specific
configuration down to Ironic. Anyhow, we should discuss this fully with the
Ironic team once a spec is proposed.

Besides RAID configuration, we have already added partition support when
claiming a server with partition images. But it is limited to root, ephemeral
and swap, as advanced partitioning such as LVM is not ready on the Ironic
side. We are interested in working with the Ironic team to get that done this
cycle.
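
For illustration, a server claim with a partition image would carry something
like the following (the key names are indicative only; the API reference
linked later in this thread is the authoritative schema):

    # Illustrative Mogan server-create payload showing the partitions
    # support mentioned above (root/ephemeral/swap only for now).
    server_request = {
        "name": "bm-server-1",
        "image_uuid": "<partition image UUID>",
        "flavor_uuid": "<bare metal flavor UUID>",
        "networks": [{"net_id": "<network UUID>"}],
        "partitions": {
            "root_gb": 100,
            "ephemeral_gb": 400,
            "swap_mb": 4096,
        },
    }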

. Scheduler

Bare metal doesn't fit in to the model of 1:1 nova-compute to resource,
as nova-compute processes can't be run on the inventory nodes themselves.
That is to say host aggregates, availability zones and such things based on
compute service(host) can't be applied to bare metal resources. And for
grouping like anti-affinity, the granularity is also not same with virtual
machines, bare metal users may want their HA instances not on the same
failure domain instead of the node itself. Short saying, we can only get a
rigid resource class only scheduling for bare metals.

It's not rigid. Okay, it's rigid, but it's not as rigid as what we used to
have.

If you're going back to VCPUs-memory-disk triad, you're making it more
rigid. Of these three, only memory has ever made practical sense for
deployers. VCPUs is a bit subtle, as it depends on hyper-threading
enabled/disabled, and I've never seen people using it too often.

But our local_gb thing is an outright lie. Of 20 disks a machine can
easily have, which one do you report for local_gb? Well, in the best case
people used ironic root device hints with ironic-inspector to figure out.
Which is great, but requires ironic-inspector. In the worst case people
just put random number there to make scheduling work. This is horrible,
please make sure to not get back to it.

I don't mean to go back to the original VCPUs-memory-disk scheduling here.
Currently we just follow the "rigid" resource class scheduling that Nova does,
but with node aggregates and affinity/anti-affinity grouping support.

What I would love to see of a bare metal scheduling project is a
scheduling based on inventory. I was thinking of being able to express
things like "give me a node with 2 GPU of at least 256 CUDA cores each". Do
you plan on this kind of things? This would truly mean flexible scheduling.

Which brings me to one of my biggest reservations about Mogan: I don't
think copying Nova's architecture is a good idea overall. Particularly, I
think you have flavors, which do not map at all into bare metal world IMO.

Yes, totally agree. Mogan is a relatively new project, and we are open to all
suggestions from the community, especially from the Ironic team, as you know
bare metal better. What truly flexible scheduling means, and whether we need
flavors to map onto the bare metal world, are things we can work out together;
that's why Mogan was created and why we would like to apply for inclusion.

And most of the cloud providers in the market offering virtual machines
and bare metals as separated resources, but unfortunately, it's hard to
achieve this with one compute service.

Do you have proofs for the first statement? And do you imply public
clouds? Our customers deploy hybrid environments, to my best knowledge.
Nobody I know uses one compute service in the whole cloud anyway.

Yes, public clouds, please check the links below.

http://www.hwclouds.com/en-us/product/bms.html
https://www.ibm.com/cloud-computing/bluemix/bare-metal-servers
https://cloud.tencent.com/product/cpm

I heard people are deploying seperated Nova for virtual machines and bare

metals with many downstream hacks to the bare metal single-driver Nova but
as the changes to Nova would be massive and may invasive to virtual
machines, it seems not practical to be upstream.

I think you're overestimated the problem. In TripleO we deploy separate
virtual nova compute nodes. If ironic is enabled, its nova computes go to
controllers. Then you can use host aggregates to split flavors between VM
and BM. With resources classes it's even more trivial: you get this split
naturally.

I also mean the public cloud scenario, where we offer bare metal as a
first-class resource instead of a generic compute resource. Yes, it's true
that you can use host aggregates with flavors and resource classes to get the
VM and BM split naturally. But it's impossible to manage quotas separately,
and even worse, there is no filter to list BMs and VMs separately, as they are
just the same kind of resource.

So we created Mogan [1] about one year ago, which aims to offer bare
metals as first class resources to users with a set of bare metal specific
API and a baremetal-centric scheduler(with Placement service). It was like
an experimental project at the beginning, but the outcome makes us believe
it's the right way. Mogan will fully embrace Ironic for bare metal
provisioning and with RSD server [2] introduced to OpenStack, it will be a
new world for bare metals, as with that we can compose hardware resources
on the fly.

Good that you touched this topic, because I have a question here :)

With ironic you request a node. With RSD and similar you create a
node, which is closer to VMs than to traditional BMs. This gives a similar
problem to what we have with nova now. Namely, exact vs non-exact filters.
How do you solve it? Assuming you plan on using flavors on (which I think
is a bad idea), do you use exact or non-exact filters? How do you handle
the difference between approaches?

Mogan will talk to the RSD Pod Manager to compose hardware instead of doing
the scheduling itself, then enroll the node/ports in Ironic and do
provisioning with the redfish driver. So the filter problem mentioned above
doesn't arise, as we don't do scheduling for such servers at all. A detailed
spec will be proposed soon.

https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/pod-manager-api-specification.pdf
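
A very rough sketch of that flow (this is not the actual Mogan code; the
endpoints, payloads and client calls are placeholders for illustration):

    import requests

    PODM = "https://podm.example.com/redfish/v1"  # Pod Manager endpoint

    def compose_node(allocation_spec):
        """Ask the RSD Pod Manager to compose a node matching the spec."""
        resp = requests.post(PODM + "/Nodes/Actions/Allocate",
                             json=allocation_spec)
        resp.raise_for_status()
        return resp.headers["Location"]  # URI of the composed node

    def enroll_in_ironic(ironic, composed_node_uri, mac_addresses):
        """Enroll the composed node and its ports in Ironic (redfish driver)."""
        node = ironic.node.create(
            driver="redfish",
            driver_info={"redfish_address": composed_node_uri},
        )
        for mac in mac_addresses:
            ironic.port.create(node_uuid=node.uuid, address=mac)
        return node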


--
Best Regards,
Zhenguo Niu


responded Sep 26, 2017 by Zhenguo_Niu

On 9/25/2017 6:27 AM, Zhenguo Niu wrote:
[...]

Cross-posting to the operators list since they are the community that
you'll likely need to convince the most about Mogan and whether or not
they want to start experimenting with it.

--

Thanks,

Matt


responded Sep 27, 2017 by mriedemos_at_gmail.c

My main question here would be this: If you feel there are deficiencies in
Ironic, why not contribute to improving Ironic rather than spawning a whole
new project?

I am happy to take a look at it, and I'm by no means trying to contradict
your assumptions here. I just get concerned with the overhead and confusion
that comes with competing projects.

Also, if you'd like to discuss this in detail with a room full of bodies, I
suggest proposing a session for the Forum in Sydney. If some of the
contributors will be there, it would be a good opportunity for you to get
feedback.

Cheers,
Erik

On Sep 26, 2017 8:41 PM, "Matt Riedemann" mriedemos@gmail.com wrote:

[...]


responded Sep 27, 2017 by Erik_McCormick

Thanks, Erik, for the response!

I don't mean there are deficiencies in Ironic. Ironic itself is cool; it works
well with TripleO, Nova, Kolla, etc. Mogan just wants to be another client
that schedules workloads on Ironic and provides bare metal specific APIs for
users who seek a way to offer virtual machines and bare metal separately, or
just a bare metal cloud without interoperating with other compute resources
under Nova.

On Wed, Sep 27, 2017 at 8:53 AM, Erik McCormick emccormick@cirrusseven.com
wrote:

[...]

--
Best Regards,
Zhenguo Niu


responded Sep 27, 2017 by Zhenguo_Niu

On 2017-09-27 09:15:21 +0800 (+0800), Zhenguo Niu wrote:
[...]
I don't mean there are deficiencies in Ironic. Ironic itself is cool, it
works well with TripleO, Nova, Kolla, etc. Mogan just want to be another
client to schedule workloards on Ironic and provide bare metal specific
APIs for users who seeks a way to provider virtual machines and bare metals
separately, or just bare metal cloud without interoperble with other
compute resources under Nova.
[...]

The short explanation which clicked for me (granted it's probably an
oversimplification, but still) was this: Ironic provides an admin
API for managing bare metal resources, while Mogan gives you a user
API (suitable for public cloud use cases) to your Ironic backend. I
suppose it could have been implemented in Ironic, but implementing
it separately allows Ironic to be agnostic to multiple user
frontends and also frees the Ironic team up from having to take on
yet more work directly.
--
Jeremy Stanley



responded Sep 27, 2017 by Jeremy_Stanley

On 27/09/17 01:59 +0000, Jeremy Stanley wrote:
On 2017-09-27 09:15:21 +0800 (+0800), Zhenguo Niu wrote:
[...]

I don't mean there are deficiencies in Ironic. Ironic itself is cool, it
works well with TripleO, Nova, Kolla, etc. Mogan just want to be another
client to schedule workloards on Ironic and provide bare metal specific
APIs for users who seeks a way to provider virtual machines and bare metals
separately, or just bare metal cloud without interoperble with other
compute resources under Nova.
[...]

The short explanation which clicked for me (granted it's probably an
oversimplification, but still) was this: Ironic provides an admin
API for managing bare metal resources, while Mogan gives you a user
API (suitable for public cloud use cases) to your Ironic backend. I
suppose it could have been implemented in Ironic, but implementing
it separately allows Ironic to be agnostic to multiple user
frontends and also frees the Ironic team up from having to take on
yet more work directly.

ditto!

I had a similar question at the PTG, and this was the answer that convinced me
it may be worth the effort.

Flavio

--
@flaper87
Flavio Percoco



responded Sep 27, 2017 by Flavio_Percoco

[...]

The short explanation which clicked for me (granted it's probably an
oversimplification, but still) was this: Ironic provides an admin
API for managing bare metal resources, while Mogan gives you a user
API (suitable for public cloud use cases) to your Ironic backend. I
suppose it could have been implemented in Ironic, but implementing
it separately allows Ironic to be agnostic to multiple user
frontends and also frees the Ironic team up from having to take on
yet more work directly.

ditto!

I had a similar question at the PTG and this was the answer that convinced
be
may be worth the effort.

Flavio

For Ironic, the question did come up at the PTG of tenant-aware scheduling of
owned hardware, as in: Customers A and B are managed by the same Ironic, and
only Customer A's users should be able to schedule onto Customer A's hardware,
with API access control restrictions such that a specific customer can only
take action on their own hardware.

If we go down the path of supporting such views/logic, it could become
a massive undertaking for Ironic, so there is absolutely a plus to
something doing much of that for Ironic. Personally, I think Mogan is
a good direction to continue to explore. That being said, we should
improve our communication of plans/directions/perceptions between the
teams so we don't adversely impact each other and see where we can
help each other moving forward.

-Julia


responded Sep 27, 2017 by Julia_Kreger

On 09/27/2017 09:31 AM, Julia Kreger wrote:
[...]

The short explanation which clicked for me (granted it's probably an
oversimplification, but still) was this: Ironic provides an admin
API for managing bare metal resources, while Mogan gives you a user
API (suitable for public cloud use cases) to your Ironic backend. I
suppose it could have been implemented in Ironic, but implementing
it separately allows Ironic to be agnostic to multiple user
frontends and also frees the Ironic team up from having to take on
yet more work directly.

ditto!

I had a similar question at the PTG and this was the answer that convinced
be
may be worth the effort.

Flavio

For Ironic, the question did come at the PTG up of tenant aware
scheduling of owned hardware, as in Customer A and B are managed by
the same ironic, only customer A's users should be able to schedule on
to Customer A's hardware, with API access control restrictions such
that specific customer can take action on their own hardware.

If we go down the path of supporting such views/logic, it could become
a massive undertaking for Ironic, so there is absolutely a plus to
something doing much of that for Ironic. Personally, I think Mogan is
a good direction to continue to explore. That being said, we should
improve our communication of plans/directions/perceptions between the
teams so we don't adversely impact each other and see where we can
help each other moving forward.

My biggest concern with Mogan is that it forks Nova, then starts
changing interfaces. Nova's got 2 really big API surfaces.

1) The user-facing API, which is reasonably well documented and under tight
control. Mogan has taken key things at 95% similarity and changed bits, so the
servers resource includes things like a partitions parameter.
https://github.com/openstack/mogan/blob/master/api-ref/source/v1/servers.inc#request-4

This being nearly the same but slightly different ends up being really weird,
especially as Nova evolves its code with microversions for things like
embedded flavor info.

2) The guest facing API of metadata/config drive. This is far less
documented or tested, and while we try to be strict about adding in
information here in a versioned way, it's never seen the same attention
as the user API on either documentation or version rigor.

That's presumably getting changed and is going to drift as well, which means
users end up discovering multiple implementations that are nearly, but not
exactly, the same.

The point of licensing things under an Apache 2 license was to enable folks
to do all kinds of experiments like this. And experiments are good. But part
of the point of experiments is to learn lessons to bring back into the fold.
Digging out of the multi-year hole of "close but not exactly the same" API
differences between nova-net and neutron really makes me want to make sure we
never intentionally inflict that confusion on folks again.

-Sean

--
Sean Dague
http://dague.net


responded Sep 27, 2017 by Sean_Dague

Thanks, Sean, for raising these concerns. We don't really fork Nova, only
some parts of its "ABI". For the two API surfaces we have different
strategies; please see the explanations below:

On Wed, Sep 27, 2017 at 10:34 PM, Sean Dague sean@dague.net wrote:

On 09/27/2017 09:31 AM, Julia Kreger wrote:

[...]

The short explanation which clicked for me (granted it's probably an
oversimplification, but still) was this: Ironic provides an admin
API for managing bare metal resources, while Mogan gives you a user
API (suitable for public cloud use cases) to your Ironic backend. I
suppose it could have been implemented in Ironic, but implementing
it separately allows Ironic to be agnostic to multiple user
frontends and also frees the Ironic team up from having to take on
yet more work directly.

ditto!

I had a similar question at the PTG and this was the answer that
convinced
be
may be worth the effort.

Flavio

For Ironic, the question did come at the PTG up of tenant aware
scheduling of owned hardware, as in Customer A and B are managed by
the same ironic, only customer A's users should be able to schedule on
to Customer A's hardware, with API access control restrictions such
that specific customer can take action on their own hardware.

If we go down the path of supporting such views/logic, it could become
a massive undertaking for Ironic, so there is absolutely a plus to
something doing much of that for Ironic. Personally, I think Mogan is
a good direction to continue to explore. That being said, we should
improve our communication of plans/directions/perceptions between the
teams so we don't adversely impact each other and see where we can
help each other moving forward.

My biggest concern with Mogan is that it forks Nova, then starts
changing interfaces. Nova's got 2 really big API surfaces.

1) The user facing API, which is reasonably well documented, and under
tight control. Mogan has taken key things at 95% similarity and changed
bits. So servers includes things like a partitions parameter.
https://github.com/openstack/mogan/blob/master/api-ref/
source/v1/servers.inc#request-4

This being nearly the same but slightly different ends up being really
weird. Especially as Nova evolves it's code with microversions for
things like embedded flavor info.

For the user-facing API, we defined a new set of APIs instead of following
Nova, which is more specific to bare metal. The key things are similar because
the key attributes of virtual machines and bare metal servers are naturally
similar. Mogan is a relatively new project; as more features are introduced,
things will diverge further in the future.

2) The guest facing API of metadata/config drive. This is far less
documented or tested, and while we try to be strict about adding in
information here in a versioned way, it's never seen the same attention
as the user API on either documentation or version rigor.

That's presumably getting changed, going to drift as well, which means
discovering multiple implementations that are nearly, but not exactly
the same that drift.

Regarding the guest-facing API, we only support config drive now, which is
copied from Nova, and we don't want to diverge from it. For this part we will
try to sync with Nova periodically, or perhaps refactoring these files into a
shared library is the best way; we will try to figure that out.

The point of licensing things under and Apache 2 license was to enable
folks to do all kind of experiments like this. And experiments are good.
But part of the point of experiments is to learn lessons to bring back
into the fold. Digging out of the multi year hole of "close but not
exactly the same" API differences between nova-net and neutron really
makes me want to make sure we never intentionally inflict that confusion
on folks again.

    -Sean

--
Sean Dague
http://dague.net



--
Best Regards,
Zhenguo Niu


responded Sep 28, 2017 by Zhenguo_Niu
...