
[openstack-dev] [nova] Looking for feedback on a spec to limit max_count in multi-create requests


I've been chasing something weird I was seeing in devstack when creating
hundreds of instances in a single request: at some limit, things blew up
in an unexpected way during scheduling and all instances were put into
ERROR state. Given the environment I was running in, this shouldn't have
been happening, and today we figured out what was actually happening. To
summarize, we retry scheduling requests on RPC timeout, so you can have
scheduler_max_attempts greenthreads running concurrently, each trying to
schedule 1000 instances, and melt your scheduler.
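
The retry-on-timeout pile-up summarized above can be illustrated with a
small toy simulation. This is only a sketch under stated assumptions, not
nova code: plain threads stand in for greenthreads, time.sleep() stands in
for the scheduling work, and the names simulate, max_attempts, rpc_timeout
and work_time are hypothetical. The point it shows is that a caller that
gives up waiting and retries does not stop the work it already started.

```python
import threading
import time


def simulate(max_attempts, rpc_timeout, work_time):
    """Toy model: retry a slow call on timeout, report peak concurrency.

    Returns (succeeded, peak), where peak is the largest number of
    scheduling calls that were running at the same time.
    """
    lock = threading.Lock()
    running = 0
    peak = 0
    workers = []

    def select_destinations(done):
        # Hypothetical stand-in for the scheduler-side work.
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(work_time)  # pretend to schedule many instances
        with lock:
            running -= 1
        done.set()

    succeeded = False
    for _ in range(max_attempts):
        done = threading.Event()
        worker = threading.Thread(target=select_destinations, args=(done,))
        worker.start()
        workers.append(worker)
        # The caller waits only rpc_timeout seconds for a reply.
        if done.wait(rpc_timeout):
            succeeded = True
            break
        # Timed out: the caller retries, but the earlier worker keeps going.

    for worker in workers:
        worker.join()
    return succeeded, peak
```

With a timeout much shorter than the work (for example
simulate(3, 0.05, 0.5)), every attempt times out and all three calls end up
running at once. With a timeout longer than the work (for example
simulate(3, 0.5, 0.05)), the first attempt succeeds and nothing piles up.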

I've started a spec which goes into the details of the actual issue:

https://review.openstack.org/#/c/510235/

It also proposes a solution, but I don't feel it's the greatest
solution, so there are also some alternatives in there.

I'm really interested in operator feedback on this because I assume that
people are dealing with stuff like this in production already, and have
had to come up with ways to solve it.

--

Thanks,

Matt


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Oct 12, 2017 in openstack-dev by mriedemos_at_gmail.c

1 Response


On 10/12/2017 4:09 AM, Saverio Proto wrote:
> Hello Matt,
>
> starting 1000 instances in production works for me already. We are on
> OpenStack Newton.
> I described my configuration here:
> https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/
>
> If things blow up for you with hundreds, probably there is a
> regression somewhere.
>
> Thanks
>
> Saverio

Thanks for posting that article, I've left some comments and questions
in there for you. :)

Once I realized that conductor was timing out waiting for the response
from the scheduler select_destinations() call, bumping the
rpc_response_timeout config made things work.
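
For reference, a sketch of what such a bump might look like in nova.conf.
rpc_response_timeout is an oslo.messaging option set in the [DEFAULT]
section, with a default of 60 seconds. The value 180 below is an arbitrary
illustration, not the value used in the test described here.

```ini
[DEFAULT]
# Seconds to wait for a reply to an RPC call (default: 60).
# 180 is an assumed example value, not a recommendation.
rpc_response_timeout = 180
```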

Note that in my single-node fake driver devstack setup here I'm not
creating real guests nor am I setting up networking. For the purpose of
my test I was just interested in the conductor/scheduler/placement
service interaction, not so much with the actual guest creation that
happens in the compute service.

--

Thanks,

Matt


responded Oct 12, 2017 by mriedemos_at_gmail.c