Thanks for raising this. I was interested in the project for some time, but I
never got a chance to wrap my head around it. I also have a few concerns -
please see my comments inline.
On 09/25/2017 01:27 PM, Zhenguo Niu wrote:
> First of all, thanks to the audience for the Mogan project update in the TC
> room during the Denver PTG. Here we would like to get more suggestions
> before we apply to become an official project.
> Speaking only for myself, I find the current direction of one API +
> scheduler for VM/baremetal/container unfortunate. After container
> management moved out to become a separate project, Zun, bare metal with
> Nova and Ironic continues to be a pain. Only part of the Nova APIs and
> parameters can apply to bare metal instances; meanwhile, to stay
> interoperable with the virtual drivers, bare-metal-specific APIs such as
> deploy-time RAID or advanced partitioning cannot be included. It's true
> that we can support various compute drivers, but the reality is that the
> support for each hypervisor is not equal, especially for bare metal in a
> virtualization world. But I understand the problems with that, as Nova was
> designed to provide compute resources (virtual machines) instead of bare
> metal.
A correction: any compute resources.
Nova works okay with bare metal. It's never going to work perfectly, though,
because we always have to find a common subset of features between VMs and
BMs. RAID is a good example indeed. We have a solution for the future, but
it's not going to satisfy everyone.
Now I have a question: to what extent do you plan to maintain the "cloud"
nature of the API? Let's take RAID as an example. Ironic can apply a very
generic or a very specific configuration. You can request "just RAID-5", or
you can ask for specific disks to be combined in a specific way. I believe
the latter is not something we want to expose to cloud users, as it's not
going to be a cloud any more.
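To illustrate the two extremes, here is a rough sketch of both styles using
Ironic's documented target_raid_config "logical_disks" schema (the disk and
controller names are made up):

```python
# Generic, cloud-style request: "just RAID-5", let Ironic pick the disks.
generic = {
    "logical_disks": [
        {"size_gb": "MAX", "raid_level": "5", "is_root_volume": True},
    ]
}

# Specific, operator-style request: pin the exact physical disks and
# controller. Exposing this to cloud users leaks hardware details.
specific = {
    "logical_disks": [
        {
            "size_gb": 100,
            "raid_level": "5",
            "is_root_volume": True,
            # controller-specific disk names, made up for the example
            "physical_disks": ["5I:0:1", "5I:0:2", "5I:0:3"],
            "controller": "Smart Array P822 in Slot 2",
        },
    ]
}
```

The first form keeps the abstraction; the second is really a datacenter
management operation, not a cloud API call.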
> Bare metal doesn't fit into the model of 1:1 nova-compute to resource, as
> nova-compute processes can't run on the inventory nodes themselves. That is
> to say, host aggregates, availability zones and other things based on the
> compute service (host) can't be applied to bare metal resources. And for
> grouping like anti-affinity, the granularity is also not the same as with
> virtual machines: bare metal users may want their HA instances not in the
> same failure domain, rather than just not on the same node. In short, we
> can only get rigid resource-class-only scheduling for bare metal.
It's not rigid. Okay, it's rigid, but it's not as rigid as what we used to
have. If you're going back to the VCPUs-memory-disk triad, you're making it
more rigid. Of these three, only memory has ever made practical sense for
deployers. VCPUs is a bit subtle, as it depends on whether hyper-threading
is enabled or disabled, and I've rarely seen people rely on it.

But our localgb thing is an outright lie. Of the 20 disks a machine can
easily have, which one do you report as localgb? Well, in the best case
people used ironic root device hints with ironic-inspector to figure it out.
Which is great, but requires ironic-inspector. In the worst case people just
put a random number there to make scheduling work. This is horrible; please
make sure not to go back to it.
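For the archives, here is a minimal sketch of why root device hints make
localgb honest (the matching logic is a simplification of what Ironic does,
and the disk data is made up):

```python
disks = [
    {"name": "/dev/sda", "size_gb": 20,   "rotational": False},  # small SSD
    {"name": "/dev/sdb", "size_gb": 2000, "rotational": True},
    {"name": "/dev/sdc", "size_gb": 2000, "rotational": True},
]

def pick_root_disk(disks, hints):
    """Return the first disk matching every hint, or None."""
    for disk in disks:
        if all(disk.get(key) == value for key, value in hints.items()):
            return disk
    return None

# With a hint like {"rotational": False}, the deploy target is unambiguous,
# so localgb can be reported truthfully instead of being a guess.
root = pick_root_disk(disks, {"rotational": False})
localgb = root["size_gb"]  # 20, the disk actually used for deployment
```

Without the hint there is no principled answer to "which of the 20 disks is
localgb", which is exactly the lie described above.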
What I would love to see from a bare metal scheduling project is scheduling
based on inventory. I was thinking of being able to express things like
"give me a node with 2 GPUs of at least 256 CUDA cores each". Do you plan on
this kind of thing? That would truly mean flexible scheduling.
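To make the idea concrete, a toy sketch of predicate-over-inventory
scheduling (the node data and request shape are invented for illustration):

```python
nodes = {
    "node-1": {"gpus": [{"cuda_cores": 384}, {"cuda_cores": 384}]},
    "node-2": {"gpus": [{"cuda_cores": 128}]},
    "node-3": {"gpus": [{"cuda_cores": 512}, {"cuda_cores": 256},
                        {"cuda_cores": 256}]},
}

def matches(inventory, min_gpus, min_cuda_cores):
    """'Give me a node with N GPUs of at least M CUDA cores each.'"""
    good = [g for g in inventory["gpus"] if g["cuda_cores"] >= min_cuda_cores]
    return len(good) >= min_gpus

candidates = sorted(name for name, inv in nodes.items()
                    if matches(inv, min_gpus=2, min_cuda_cores=256))
# candidates == ["node-1", "node-3"]
```

No flavor triad could express this request; it falls naturally out of
scheduling against the actual hardware inventory.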
Which brings me to one of my biggest reservations about Mogan: I don't think
copying Nova's architecture is a good idea overall. In particular, I think
you have flavors, which IMO do not map into the bare metal world at all.
> And most of the cloud providers in the market offer virtual machines and
> bare metal as separate resources, but unfortunately it's hard to achieve
> this with one compute service.
Do you have proof of the first statement? And do you mean public clouds? Our
customers deploy hybrid environments, to the best of my knowledge. Nobody I
know uses one compute service in the whole cloud anyway.
> I heard people are deploying a separate Nova for virtual machines and a
> single-driver Nova for bare metal with many downstream hacks, but as the
> changes to Nova would be massive and possibly invasive to virtual
> machines, it doesn't seem practical to get them upstream.
I think you're overestimating the problem. In TripleO we deploy separate
virtual nova-compute nodes. If ironic is enabled, its nova-computes go to
the controllers. Then you can use host aggregates to split flavors between
VMs and BMs. With resource classes it's even more trivial: you get this
split naturally.
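To spell out the "natural" split: a bare metal flavor requests one unit of
the node's custom resource class and zeroes out the VM-style resources, so
it can never land on a virtual hypervisor. A sketch following the
resource-class extra-spec convention from Nova's ironic flavor
documentation (the class name is an example):

```python
# A VM flavor implicitly requests resources:VCPU, MEMORY_MB and DISK_GB
# from Placement.
vm_flavor = {"vcpus": 4, "ram_mb": 8192, "disk_gb": 80}

# A bare metal flavor requests the node's resource_class instead.
baremetal_flavor_extra_specs = {
    "resources:CUSTOM_BAREMETAL_GOLD": "1",  # matches node.resource_class
    "resources:VCPU": "0",       # zeroed out: not scheduled by the triad
    "resources:MEMORY_MB": "0",
    "resources:DISK_GB": "0",
}
```

Only nodes exposing CUSTOM_BAREMETAL_GOLD inventory can satisfy the second
flavor, so the VM/BM split needs no aggregates at all.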
> So we created Mogan about one year ago, which aims to offer bare metal as
> a first-class resource to users, with a set of bare-metal-specific APIs
> and a baremetal-centric scheduler (based on the Placement service). It was
> like an experimental project at the beginning, but the outcome makes us
> believe it's the right way. Mogan will fully embrace Ironic for bare metal
> provisioning, and with RSD servers introduced to OpenStack it will be a
> new world for bare metal, as with that we can compose hardware resources
> on the fly.
Good that you touched on this topic, because I have a question here :)

With ironic you request a node. With RSD and similar you create a node,
which is closer to VMs than to traditional BMs. This gives a problem similar
to what we have with nova now, namely exact vs non-exact filters. How do you
solve it? Assuming you plan on using flavors (which I think is a bad idea),
do you use exact or non-exact filters? How do you handle the difference
between the two approaches?
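For anyone not familiar with the problem, a tiny sketch (node data made up).
Non-exact, VM-style matching accepts any host with at least the requested
RAM; for bare metal the whole node is consumed, so landing an 8 GB request
on a 128 GB machine wastes the rest, which is why Nova grew Exact* filters
such as ExactRamFilter for ironic:

```python
nodes = {"small": 8192, "big": 131072}   # name -> RAM in MiB
requested = 8192

# VM-style: >= is fine, the remainder can host other instances.
non_exact = sorted(n for n, ram in nodes.items() if ram >= requested)

# BM-style: the node is all-or-nothing, so only an exact fit avoids waste.
exact = sorted(n for n, ram in nodes.items() if ram == requested)
```

With composable (RSD-style) hardware the node is created to fit, so
non-exact semantics come back, and a single flavor model has to somehow
serve both worlds.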