[Openstack-operators] Request for feedback on DHCP IP usage

Hi operators,

I wanted to ask for feedback on a design issue regarding the DHCP agent and IP
usage per agent.

So a short introduction first - I want to propose a spec for a distributed
DHCP agent that can run directly on the compute node and service only the VMs
running locally on it.
This will help balance the DHCP load across the cloud, and each node will
only get the information it requires (no more MB-sized messages that get the
queue stuck).
It will also limit the scope of failure of the DHCP agent and/or service to
that compute node alone.

Now, regarding the IP consumption there are two possible alternatives:
1. Use a single IP per serviced subnet for all the DHCP servers (similar to DVR).
2. Use an IP per DHCP server, per subnet, per host where VMs are serviced.

So in a theoretical cloud with 100 running VMs on 10 subnets and 10 compute
nodes, per subnet the 1st approach will take only 1 IP while the second will
take a minimum of 1 IP and a maximum of 10 (limited by the number of compute nodes).
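
As a rough sketch of the arithmetic above (the function names are made up for illustration, and it assumes every subnet has at least one VM and that the 100 VMs are spread evenly, 10 per subnet):

```python
def dhcp_ips_option1(num_subnets):
    """Option 1: one shared DHCP IP per serviced subnet."""
    return num_subnets


def dhcp_ips_option2_bounds(num_subnets, num_compute_nodes, vms_per_subnet):
    """Option 2: one DHCP IP per subnet per host that runs a VM on that subnet.

    Returns (minimum, maximum) DHCP IPs across the whole cloud.
    Assumption: every subnet has at least one VM.
    """
    # Best case: all VMs of a subnet share one host, so 1 IP per subnet.
    per_subnet_min = 1
    # Worst case: each VM of a subnet sits on a different host, capped by
    # the number of compute nodes.
    per_subnet_max = min(num_compute_nodes, vms_per_subnet)
    return num_subnets * per_subnet_min, num_subnets * per_subnet_max


# 100 VMs on 10 subnets (assumed 10 VMs per subnet) and 10 compute nodes.
option1_total = dhcp_ips_option1(10)
option2_min, option2_max = dhcp_ips_option2_bounds(10, 10, 10)
print(option1_total, option2_min, option2_max)
```

So cloud-wide the first approach uses 10 addresses, while the second uses anywhere between 10 and 100 depending on VM placement.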

Now, I know the 1st solution seems very appealing but thinking of it further
reveals very serious limitations:
* No HA for DHCP agents is possible (more prone to certain race conditions).
* DHCP IP can't be reached from outside the cloud.
* You will just see a single port per subnet in Neutron, without granularity of
the host binding (but perhaps it's not that bad).
* This solution will be tied initially only to OVS mechanism driver, each other
driver or 3rd party plugin will have to support it individually in some way.

So basically my question is - which solution would you prefer as a cloud op?

Is it that bad to consume more than 1 IP, given that we're talking about private
isolated networks?

Regards,
Mike

asked Oct 6, 2014 in openstack-operators by Mike_Kolesnik (620 points)   1

5 Responses

On 10/06/2014 04:09 AM, Mike Kolesnik wrote:

> Now, I know the 1st solution seems very appealing but thinking of it further
> reveals very serious limitations:
> * No HA for DHCP agents is possible (more prone to certain race conditions).

Eventually they will be just bugs, and bugs can be fixed.

> * DHCP IP can't be reached from outside the cloud.

That's a feature :)

> * You will just see a single port per subnet in Neutron, without granularity of
> the host binding (but perhaps it's not that bad).

This may be an issue for monitoring: I will have more ports deployed than
registered in my DB.
I don't know if it is really an issue, but it still does not sound good.

> * This solution will be tied initially only to OVS mechanism driver, each other
> driver or 3rd party plugin will have to support it individually in some way.
>
> So basically my question is - which solution would you prefer as a cloud op?

Option 2 is a no-go for me; I can't waste that many IPs.

> Is it that bad to consume more than 1 IP, given that we're talking about private
> isolated networks?

Not always. All the VMs we deploy in the prod environment have public IPs;
they speak freely to the internet. No NAT, no LBaaS.

None of the limitations you mention for the first solution sounds
problematic to me (if individual DHCP servers can be managed individually).


responded Oct 6, 2014 by gustavo_panizzo_(gfa (3,080 points)   2 2

Hi Gustavo,

Thanks for the prompt reply, comments inline.

Regards,
Mike

----- Original Message -----

>> Is it that bad to consume more than 1 IP, given that we're talking about
>> private isolated networks?
>
> not always, all the vm we deploy in the prod environment have public ip,
> they speak freely to the internet. no nat, no lbaas.

So basically the DHCP server is also consuming a public IP?

Also since you're always using the public network, does distributing the
DHCP agents/servers sound interesting?

> none of the limitations you mention for the first solution sounds
> problematic to me (if individual dhcp servers can be managed individually)

Well, the idea is that the servers are managed automatically by Neutron, i.e.
it will decide which DHCP server serves which IP.

What plugin are you deploying, ML2+OVS?


responded Oct 6, 2014 by Mike_Kolesnik (620 points)   1

On 10/06/2014 06:11 AM, Mike Kolesnik wrote:

>> not always, all the vm we deploy in the prod environment have public ip,
>> they speak freely to the internet. no nat, no lbaas.
>
> So basically the DHCP server is also consuming a public IP?

Yes. And, for the record, this is how nova-network in multi-host mode works.

> Also since you're always using the public network, does distributing the
> DHCP agents/servers sound interesting?

Yes, because it spreads out the failure domain. As you point out, with
multi-host nova-network mode, each compute node has a DHCP server that
services the VMs on that particular compute node only. This means that the
failure of a single DHCP server doesn't bring down IP assignment services
across a large swath of the deployment, the way the failure of a centralized
DHCP agent would, which is a huge plus (and what DVR is aiming for, IIRC).
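
A toy illustration of the failure-domain point, not a model of any real deployment: the helper names and the VM placement below are made up, and the centralized case assumes one agent serves every VM.

```python
from collections import Counter


def affected_by_central_agent_failure(vm_to_host):
    """A single centralized DHCP agent serves every VM, so all are affected."""
    return len(vm_to_host)


def affected_by_node_dhcp_failure(vm_to_host, failed_host):
    """With one DHCP server per compute node, only local VMs are affected."""
    return Counter(vm_to_host.values())[failed_host]


# Hypothetical placement: 100 VMs spread evenly over 10 compute nodes.
placement = {f"vm-{i}": f"compute-{i % 10}" for i in range(100)}

central = affected_by_central_agent_failure(placement)
distributed = affected_by_node_dhcp_failure(placement, "compute-3")
print(central, distributed)
```

In this made-up placement a centralized failure touches all 100 VMs, while a per-node failure touches only the 10 VMs on that node.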

Best,
-jay

responded Oct 6, 2014 by Jay_Pipes (59,760 points)   3 10 14

> Hi operators,
>
> I wanted to ask for feedback on a design issue regarding DHCP agent and IP per
> agent.

Very happy about devs coming here for input :)

> Now, regarding the IP consumption there are two possible alternatives:
> 1. Use single IP per serviced subnet for all the servers. (similar to DVR)
> 2. Use IP per server per subnet per host where VMs are serviced.
>
> So in a theoretical cloud with 100 running VMs for 10 subnets and 10 compute
> nodes, per subnet the 1st approach will take only 1 IP while the second will
> take a minimum of 1 IP and a maximum of 10 (limited by amount of compute nodes).

If I understand correctly, taking an IP (potentially) per hypervisor can
quickly grow to insane proportions.
A one-to-one ratio would not be so far-fetched for a cloud with a significant
number of hypervisors.
For us the current "standard" /24 would become smallish...
Also, when live-migrating machines to a different hypervisor you could run out
of IPs for the DHCP servers...
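
To put rough numbers on the /24 concern, here is a small sketch using Python's standard `ipaddress` module. The function name, the example prefix and the single reserved gateway address are assumptions for illustration only.

```python
import ipaddress


def addresses_left_for_vms(cidr, dhcp_servers, reserved=1):
    """Addresses left for VMs in an IPv4 subnet after DHCP servers take theirs.

    `reserved` covers other fixed addresses such as the gateway (assumed: 1).
    """
    net = ipaddress.ip_network(cidr)
    # Subtract the network and broadcast addresses (valid for prefixes <= /30).
    usable = net.num_addresses - 2
    return usable - reserved - dhcp_servers


# Option 1: one shared DHCP IP for the subnet.
shared = addresses_left_for_vms("10.0.0.0/24", dhcp_servers=1)
# Option 2, if VMs on this subnet end up on 100 different hypervisors.
per_host = addresses_left_for_vms("10.0.0.0/24", dhcp_servers=100)
print(shared, per_host)
```

Under these assumptions a /24 leaves 252 addresses for VMs with a shared DHCP IP, but only 153 once 100 hypervisors each hold a DHCP address on the subnet.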

> Now, I know the 1st solution seems very appealing but thinking of it further
> reveals very serious limitations:
> * No HA for DHCP agents is possible (more prone to certain race conditions).
> * DHCP IP can't be reached from outside the cloud.
> * You will just see a single port per subnet in Neutron, without granularity of
> the host binding (but perhaps it's not that bad).
> * This solution will be tied initially only to OVS mechanism driver, each other
> driver or 3rd party plugin will have to support it individually in some way.

The thing that worries me the most is that the implementation is tied to the
OVS mechanism driver.
It needs to be very well documented that you might not get feature parity with
different drivers.

> So basically my question is - which solution would you prefer as a cloud op?

Like others before me, I also lean toward option one.

Cheers,
Robert van Leeuwen

responded Oct 7, 2014 by Robert_van_Leeuwen (2,000 points)   1 4

A single IP per DHCP is nice.

And moving the DHCP agent away from the network node gives an important
benefit: you can create isolated tenant networks without the headache of
DHCP agent scheduling. Neutron does not currently support AZs from Nova, and
DHCP is just one more thing that can mess up isolation.

Some ideas:

  1. You should create separate table(s) for DHCP with persistent rules
    (not 'disk-persistent', but 'not expiring') to help debug problems.
    Rules in that table will have counters (bytes, packets and an inactive-time
    counter) - this really helps to see what is happening.
  2. Some kind of watching utility (tail -f style) to watch for instance
    requests.
  3. Refusing to start/stop an instance (configure a port) if the DHCP agent
    on the host is not available. Getting an "ERROR" with a meaningful trace is
    much better than 'oh, the instance does not reply to pings after reboot and
    we don't know why'.
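
Idea 3 could look roughly like the sketch below. This is not Neutron code: `DhcpAgentUnavailable`, `configure_port` and the set of live hosts are hypothetical names used only to show the fail-fast behaviour.

```python
class DhcpAgentUnavailable(Exception):
    """Raised when a port is configured on a host with no live DHCP agent."""


def configure_port(port_id, host, live_dhcp_hosts):
    """Refuse to configure a port unless the host's DHCP agent is alive.

    `live_dhcp_hosts` is a set of host names whose local DHCP agent is
    currently reporting in (how that set is obtained is out of scope here).
    """
    if host not in live_dhcp_hosts:
        # Fail loudly up front instead of booting an instance with no lease.
        raise DhcpAgentUnavailable(
            f"cannot configure port {port_id}: no live DHCP agent on host {host}"
        )
    return {"port_id": port_id, "host": host, "status": "ACTIVE"}


live = {"compute-1"}
print(configure_port("port-1", "compute-1", live))
try:
    configure_port("port-2", "compute-2", live)
except DhcpAgentUnavailable as exc:
    print(f"ERROR: {exc}")
```

The operator then sees an explicit error naming the port and host, rather than an instance that silently never gets an address.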

responded Oct 7, 2014 by George_Shuklin (4,720 points)   1 7 12
...