[openstack-dev] [nova][neutron] How do you use the instance IP filter?

Nova has had this long-standing known performance issue if you're
filtering a large number of instances by IP. The instance IPs are stored
in a JSON blob in the database so we don't do filtering in SQL. We pull
the instances out of the database, deserialize the JSON and then apply a
regex filter match in the nova-api python code.

At the Queens PTG we talked about possible ways to fix this and came up
with this nova spec:

https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/improve-filter-instances-by-ip-performance.html

The idea is to have nova get ports from neutron and apply the IP filter
in neutron to whittle down the ports, then from that list of ports get
the instances to pull out of the nova database.
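
To make the proposed flow concrete, here is a rough sketch of the two-step lookup using in-memory stand-ins. PORTS, INSTANCES, list_ports_by_ip() and instances_for_ip() are all made up for illustration; they are not nova or neutron calls, and the port dicts only mimic the device_id and fixed_ips fields of a neutron port.

```python
# Hypothetical in-memory stand-ins for the neutron port list and the nova
# instance table. None of these names are real nova or neutron APIs.
PORTS = [
    {"id": "port-1", "device_id": "vm-a",
     "fixed_ips": [{"ip_address": "192.168.0.11"}]},
    {"id": "port-2", "device_id": "vm-b",
     "fixed_ips": [{"ip_address": "10.0.0.5"}]},
    {"id": "port-3", "device_id": "vm-a",
     "fixed_ips": [{"ip_address": "10.0.0.11"}]},
]

INSTANCES = {
    "vm-a": {"uuid": "vm-a", "name": "web"},
    "vm-b": {"uuid": "vm-b", "name": "db"},
}


def list_ports_by_ip(ip_substring):
    """Step 1: the IP filter is applied on the port side."""
    return [
        port for port in PORTS
        if any(ip_substring in fixed_ip["ip_address"]
               for fixed_ip in port["fixed_ips"])
    ]


def instances_for_ip(ip_substring):
    """Step 2: only the instances that own a matching port are loaded."""
    device_ids = {port["device_id"] for port in list_ports_by_ip(ip_substring)}
    return [INSTANCES[d] for d in sorted(device_ids) if d in INSTANCES]


if __name__ == "__main__":
    print(instances_for_ip("10.0.0"))
```

The point of the sketch is only that the instance lookup is driven by a short list of device IDs instead of deserializing every instance's network info.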

One issue that has come up with this is neutron does not currently
support regex filters when listing ports. There is an RFE for adding that:

https://bugs.launchpad.net/neutron/+bug/1718605

The proposed neutron implementation is to just do SQL LIKE substring
matching in the database.
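
As a rough illustration of what LIKE substring matching looks like, here is a self-contained sketch using Python's sqlite3 module. The port_ips table, its columns, and the helper names are assumptions made up for the example; this is not the neutron schema or the proposed neutron patch.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE port_ips (port_id TEXT, ip_address TEXT)")
    conn.executemany(
        "INSERT INTO port_ips VALUES (?, ?)",
        [
            ("port-a", "192.168.0.11"),
            ("port-b", "192.16.1.5"),
            ("port-c", "10.0.0.5"),
        ],
    )
    return conn


def escape_like(value, escape="\\"):
    """Escape the LIKE wildcards so user text is matched literally."""
    return (value.replace(escape, escape + escape)
                 .replace("%", escape + "%")
                 .replace("_", escape + "_"))


def find_ports(conn, ip_substring):
    """Return port IDs whose address contains ip_substring."""
    pattern = "%" + escape_like(ip_substring) + "%"
    rows = conn.execute(
        "SELECT port_id FROM port_ips "
        "WHERE ip_address LIKE ? ESCAPE '\\' ORDER BY port_id",
        (pattern,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_ports(conn, "192.16"))
    print(find_ports(conn, "192.16.*[1-5]$"))
```

A plain fragment such as "192.16" finds both 192.* ports, while a regex-style filter is treated as literal text and finds nothing.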

However, one issue that has come up is that the compute API accepts a
python regex filter and uses re.match():

https://github.com/openstack/nova/blob/16.0.0/nova/compute/api.py#L2469

At least one good thing about that is that match() only matches from the
beginning of the string, unlike search().

So for example I can filter on "192.16.*[1-5]$" if I wanted to, but
that's not going to work with just a LIKE substring filter in SQL.
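
For anyone who wants to see the difference, here is a small sketch comparing re.match(), re.search() and a plain substring test on the example filter. The address list and helper names are made up for illustration.

```python
import re

# Sample addresses, invented for the example.
ADDRESSES = ["192.168.0.11", "192.16.4.25", "10.192.16.3", "192.168.0.17"]
PATTERN = "192.16.*[1-5]$"


def filter_match(pattern, addresses):
    """re.match() only succeeds if the pattern matches at position 0."""
    regex = re.compile(pattern)
    return [addr for addr in addresses if regex.match(addr)]


def filter_search(pattern, addresses):
    """re.search() succeeds if the pattern matches anywhere in the string."""
    regex = re.compile(pattern)
    return [addr for addr in addresses if regex.search(addr)]


def filter_substring(text, addresses):
    """Literal substring test, roughly what an unanchored LIKE gives you."""
    return [addr for addr in addresses if text in addr]


if __name__ == "__main__":
    print(filter_match(PATTERN, ADDRESSES))
    print(filter_search(PATTERN, ADDRESSES))
    print(filter_substring(PATTERN, ADDRESSES))
    print(filter_substring("192.16", ADDRESSES))
```

With match(), 10.192.16.3 is excluded because the pattern has to start at the beginning of the address; with search() it is included. The literal substring test finds nothing for the regex-style filter, and the plain fragment "192.16" matches all four addresses, including the one that only contains it in the middle.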

The question is, does anyone actually do more than basic substring
matching with the IP filter today? Because if we started using neutron,
that behavior would be broken. We've never actually documented the match
restrictions on the IP filter, but that's not a good reason to break it.

One option is to make this configurable such that deployments which rely
on the complicated pattern matching can just use the existing nova code
despite performance issues. However, that's not interoperable, I hate
config-driven API behavior, and it would mean maintaining two code paths
in nova, which is also terrible.

I was trying to think of a way to determine if the IP filter passed to
nova is basic or a complicated pattern match and let us decide that way,
but I'm not sure if there are good ways to detect that - maybe by simply
looking for special characters like (, ), - and $? But then there is []
and we have an IPv6 filter, so that gets messy too...
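
One possible way to sketch that detection is to flip it around and allow-list the characters that can appear in a plain IPv4 or IPv6 fragment, treating everything else as a pattern. This is only an assumption about how it could be done; is_plain_ip_fragment() is a hypothetical helper, not something that exists in nova.

```python
import re

# Characters that can appear in a literal IPv4 or IPv6 fragment:
# hex digits, dots and colons. Anything else is treated as regex syntax.
_PLAIN_IP_FRAGMENT = re.compile(r"[0-9A-Fa-f.:]+")


def is_plain_ip_fragment(value):
    """Return True if value looks like a literal piece of an IP address."""
    return _PLAIN_IP_FRAGMENT.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("192.168.0.1", "fe80::1", "192.16.*[1-5]$", "10.0.(1|2)"):
        print(candidate, is_plain_ip_fragment(candidate))
```

The catch is that "." is itself a regex metacharacter, so even a "plain" filter like 10.0.0.1 means something slightly different under re.match() than under a literal substring match. The heuristic can only tell obvious patterns apart from obvious literals.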

For now I'd just like to know if people rely on the regex match or not.
Other ideas on how to handle this are appreciated.

--

Thanks,

Matt


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Nov 8, 2017 in openstack-operators by mriedemos_at_gmail.c (15,720 points)   2 4 5

6 Responses

On Thu, Oct 26, 2017 at 10:23 PM, Matt Riedemann mriedemos@gmail.com
wrote:

The question is, does anyone actually do more than basic substring
matching with the IP filter today? Because if we started using neutron,
that behavior would be broken. We've never actually documented the match
restrictions on the IP filter, but that's not a good reason to break it.

The use case for us is that it helps us easily find the VMs for which
we get abuse reports (or for which we see malicious traffic going
to/from). We usually search for an exact match of the IP address, as we
are simply trying to look up the instance ID based on the IP
address. Regex matching isn't important in our case.


OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
responded Oct 27, 2017 by Mohammed_Naser (3,860 points)   1 3
Just the paranoid person in me, but is it safe to say that the filter
that you are showing here does not come from user text?

I.e., these two lines don't come directly from user input (without going
through some filter), do they?

https://github.com/openstack/nova/blob/16.0.0/nova/compute/api.py#L2458-L2459

From reading it, it seems like perhaps they do come at least partially from
a user, so I am hoping that it's not possible for a user to present an
'ip' that is really a complicated regex that takes a long time to
compile (and therefore can DoS the nova-api component); but I don't know
the surrounding code so I might be wrong...

Just wondering :-/

-Josh

responded Oct 27, 2017 by harlowja_at_fastmail (16,200 points)   2 5 7
Further things that someone may want to read/try (if what I described in
my previous message is true):

https://en.wikipedia.org/wiki/ReDoS

responded Oct 27, 2017 by harlowja_at_fastmail (16,200 points)   2 5 7
We have schema validation on the ip filter but it's just checking that
it can actually compile it:

https://github.com/openstack/nova/blob/16.0.0/nova/api/validation/validators.py#L35

So yeah, probably a potential problem like you pointed out.
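
To illustrate the gap, here is a minimal sketch of a compile-only check. compiles() is a stand-in written for this example, not the nova validator itself.

```python
import re


def compiles(pattern):
    """Accept any pattern that re.compile() does not reject."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    # A syntactically broken pattern is rejected...
    print(compiles("192.16.["))
    # ...but a nested-quantifier pattern, the textbook ReDoS shape, is not.
    print(compiles("(a+)+$"))
```

Compiling "(a+)+$" is cheap; the cost only shows up at match time, when a long run of "a" followed by a non-matching character forces exponential backtracking. A check that only compiles the pattern says nothing about that.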

--

Thanks,

Matt


responded Oct 28, 2017 by mriedemos_at_gmail.c (15,720 points)   2 4 5
Ya, would seem so, especially if large user strings can get compiled.

Just a reference/useful tidbit, but the re.py module keeps a cache of
the last 512 patterns compiled (surprise! I don't think a lot of
people know about it, ha). So assuming that users can present arbitrary
(and/or pretty big) input to the REST API of nova, that cache could
get pretty large (depending on the allowable max request size) and/or could
also be thrashed pretty quickly (also note that regex compiling jumps
into C code afaik, so that probably locks up other greenthreads).

The cache layer fyi:

https://github.com/python/cpython/blob/3.6/Lib/re.py#L281-L312

Just a thought, but it might be a good idea to remove this validator
and never again accept user-provided regex patterns/input and such in
general (to avoid cache thrashing and various other ReDoS or ReDoS-like
problems).

-Josh


responded Oct 28, 2017 by harlowja_at_fastmail (16,200 points)   2 5 7
To paraphrase the nova Queens roadmap and checkpoint session at the
summit [1]: we agreed to just do SQL LIKE filtering in neutron.
The operators in the room (and from this thread on the ML) have said
they are doing exact IP filter matches, not regex. LIKE filtering in
SQL still gives some pattern-matching flexibility, but not the
crazy Python re thing nova allows today (which is potentially an attack
vector).

So we'll move forward with those changes.

[1] https://etherpad.openstack.org/p/SYD-forum-nova-queens-update

--

Thanks,

Matt


responded Nov 8, 2017 by mriedemos_at_gmail.c (15,720 points)   2 4 5
...