On 25/01/2017 05:10, Matt Riedemann wrote:
On 1/24/2017 2:57 PM, Matt Riedemann wrote:
On 1/24/2017 2:38 PM, Sylvain Bauza wrote:
It's literally 2 days before FeatureFreeze and we're asking operators to
change their cloud right now? That looks difficult to me and, like I said
in multiple places by email, we have a ton of assertions saying it's
acceptable not to have all the filters.
I'm not sure why feature freeze in two days is going to make a huge
amount of difference here. Most large production clouds are probably
nowhere near trunk (I'm assuming most are on Mitaka or older at this
point just because of how deployments seem to tail the oldest supported
stable branch). Or are you mainly worried about deployment tooling
projects, like TripleO, needing to deal with this now?
Anyone upgrading to Ocata is going to have to read the release notes and
assess the upgrade impacts regardless of when we make this change, be
that Ocata or Pike.
Sylvain, are you suggesting that for Ocata if, for example, the
CoreFilter isn't in the list of enabled scheduler filters, we don't make
the request for VCPU when filtering resource providers, but we also log
a big fat warning in the n-sch logs saying we're going to switch over in
Pike and that cpu_allocation_ratio needs to be configured because the
CoreFilter is going to be deprecated in Ocata and removed in Pike?
To recap the discussion we had in IRC today, we're moving forward with
the original plan of the filter scheduler always requesting VCPU,
MEMORY_MB and DISK_GB* regardless of the enabled filters. The main
reason being there isn't a clear path forward on straddling releases to
deprecate or make decisions based on the enabled filters and provide a
warning that makes sense.
For example, we can't deprecate the filters (at least yet) because the
caching scheduler is still using them (it's not using placement yet).
And if we logged a warning if you don't have the CoreFilter in
CONF.filter_scheduler.enabled_filters, for example, but we don't want
you to have it in that list, then what are you supposed to do? i.e. the
goal is to not have the legacy primitive resource filters enabled for
the filter scheduler in Pike, so you get into this weird situation of
whether or not you have them enabled before Pike, and in what
cases you log a warning that makes sense. So we agreed at this point
it's just simpler to say that if you don't enable these filters today,
you're going to need to configure the appropriate allocation ratio
configuration option prior to upgrading to Ocata. That will be in the
upgrade section of the release notes and we can probably also work it
into the placement devref as a deployment note. We can also work this
into the nova-status upgrade check CLI.
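To make the upgrade note concrete: for operators not running the legacy
resource filters today, the preparation boils down to setting the
allocation ratios explicitly in nova.conf before upgrading to Ocata. A
sketch of the relevant options (the values below are examples only, not
recommendations — set them to whatever ratios your deployment was
effectively enforcing):

```ini
[DEFAULT]
# Example values only. These must reflect the oversubscription your
# deployment actually allows, since placement will enforce them.
cpu_allocation_ratio = 16.0
ram_allocation_ratio = 1.5
disk_allocation_ratio = 1.0
```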
*DISK_GB is special since we might have a flavor that's not specifying
any disk or a resource provider with no DISK_GB allocations if the
instances are all booted from volumes.
Update on that agreement: I made the necessary modification in the
proposal for not verifying the filters. We now send a request to the
Placement API by introspecting the flavor, and we get a list of
potential destinations.
When I began doing that modification, I knew there was a functional test
about server groups that needed modifications to match our agreement. I
consequently made that change, located in a separate patch, as a
prerequisite.
I then spotted a problem that we didn't identify when discussing:
when checking a destination, the legacy filters for CPU, RAM and disk
don't verify the maximum capacity of the host, they only multiply the
total size by the allocation ratio, so our proposal works for them.
Now, when using the placement service, it fails because somewhere in the
DB call needed for returning the destinations, we also verify a specific
field named max_unit.
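A minimal sketch of that discrepancy (illustrative only, not the real
filter or placement code — the function names and max_unit default are
assumptions for the example):

```python
# Illustrative comparison of the two capacity checks described above.

def legacy_filter_passes(total, used, requested, allocation_ratio):
    # Legacy CoreFilter/RamFilter/DiskFilter style: the usable capacity
    # is simply total * allocation_ratio; no per-request cap.
    limit = total * allocation_ratio
    return used + requested <= limit

def placement_passes(total, used, requested, allocation_ratio, max_unit):
    # Placement applies the same ratio-based limit, but additionally
    # rejects any single request larger than the inventory's max_unit.
    if requested > max_unit:
        return False
    limit = total * allocation_ratio
    return used + requested <= limit

# Example: a 16-vCPU request against an 8-core host with ratio 16.0.
# The legacy filter passes (8 * 16.0 = 128 >= 16), but placement fails
# if max_unit is the host's total of 8 cores, hence NoValidHost.
```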
Consequently, the proposal we agreed on does not have feature parity
between Newton and Ocata. If you follow our instructions, you will still
get different results from a placement perspective between what was in
Newton and what will be in Ocata.
Technically speaking, the functional test is a canary in the coal mine,
telling you that you get NoValidHost while it was working previously.
After that I'm stuck. We could be discussing for a while whether all of
that is sane or not, but the fact is, there is a discrepancy.
Honestly, I don't know what to do unless we consider that we're now so
close to the Feature Freeze that it's becoming an all-or-nothing
situation.
The only silver bullet I still have would be to consider a placement
failure as non-blocking and fall back to calling for the full list of
nodes for Ocata. I know that sucks, but I don't see how to unblock us in
time for getting this landed before tomorrow.
-Sylvain (exhausted, tired and nervous).