settingsLogin | Registersettings

[openstack-dev] [nova] [placement] [api] cache headers in placement service

0 votes

(I put [api] in the subject tags because this might be of interest
to a wider audience that cares about APIs.)

Long ago and far away I made this bug:

 https://bugs.launchpad.net/nova/+bug/1632852

"placement api responses should not be cacehable"

Today I've pushed up a WIP that demonstrates a way to address this:

 https://review.openstack.org/#/c/495380/

Before I get too far along with it, I'd like us to decide whether we
think it is worth doing and consider some of the questions it will
raise.

I think it is worth doing not just because it would be correct but
because without it, we cannot be assured that proxies or user agents
will not cache resource providers and other entities, and that would
lead to bizarre results.

At the same time, any entity you put on the web, according to the
RFCs[1], should have a last-modified header if it "can be reasonably
and consistently determined".

So my change above adds 'last-modified' and 'cache-control:
no-cache' to GET of /resourceproviders and
/resource
providers/{uuid} and proposes we do it for everything
else.

Should we?

If we do, some things to think about:

  • The related OVO will need the updatedat and createdat
    fields exposed. This is pretty easy to do with the
    NovaTimestampObject mixin. This doesn't need to require a object
    version bump because we don't do RPC with them.

  • Adding a response header violates interop guidelines, so this
    would mean a microversion bump that impacts all GET requests. In
    systems that currently use placement (the report client in nova,
    mostly) no attention is being paid to either of the headers being
    added, so there would be no need for motion on that side.

  • The current implementation of getting the last modified time for a
    collection of resources is intentionally naive and decoupled from
    other stuff. For very large result sets[3] this might be annoying,
    but since we are already doing plenty of traversing of long lists,
    it may not be a big deal. If it is we can incorporate getting the
    last modified time in the loop that serializes objects to JSON
    output.

I think we should. Generally speaking I think it is good form to
fulfil the expectations of HTTP. It helps make sure the HTTP APIs
work with the unknown client. Working with the unknown client is one
of the best reasons to have an HTTP API.k

[1] https://tools.ietf.org/html/rfc7232#section-2.2

[2] An argument could be made that this change is fixing a protocol
level bug, but I reckon that argument wouldn't fly with most people.

--
Chris Dent (⊙_⊙') https://anticdent.org/
freenode: cdent tw: @anticdent__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

asked Aug 21, 2017 in openstack-dev by cdent_plus_os_at_ant (12,800 points)   2 2 4

6 Responses

0 votes

Hi Chris, thanks for taking this on. Comments inline.

On 08/18/2017 01:23 PM, Chris Dent wrote:

(I put [api] in the subject tags because this might be of interest
to a wider audience that cares about APIs.)

Long ago and far away I made this bug:

 https://bugs.launchpad.net/nova/+bug/1632852

"placement api responses should not be cacehable"

Today I've pushed up a WIP that demonstrates a way to address this:

 https://review.openstack.org/#/c/495380/

Before I get too far along with it, I'd like us to decide whether we
think it is worth doing and consider some of the questions it will
raise.

I think it is worth doing not just because it would be correct but
because without it, we cannot be assured that proxies or user agents
will not cache resource providers and other entities, and that would
lead to bizarre results.

Yes, +1.

At the same time, any entity you put on the web, according to the
RFCs[1], should have a last-modified header if it "can be reasonably
and consistently determined".

So my change above adds 'last-modified' and 'cache-control:
no-cache' to GET of /resourceproviders and
/resource
providers/{uuid} and proposes we do it for everything
else.

Should we?

No. :) Not everything. In particular, I think both the GET
/resource_classes and GET /traits URIs are very cacheable and we
shouldn't disallow proxies from caching that content if they want to.

If we do, some things to think about:

  • The related OVO will need the updatedat and createdat
    fields exposed. This is pretty easy to do with the
    NovaTimestampObject mixin. This doesn't need to require a object
    version bump because we don't do RPC with them.

Technically, only the updated_at field would need to be exposed via the
OVO objects. But OK, sure. I'd even advocate a patch to OVO that would
bring in the NovaTimestampObject mixin. Just call it Timestamped or
something...

  • Adding a response header violates interop guidelines, so this
    would mean a microversion bump that impacts all GET requests. In
    systems that currently use placement (the report client in nova,
    mostly) no attention is being paid to either of the headers being
    added, so there would be no need for motion on that side.

Ack.

  • The current implementation of getting the last modified time for a
    collection of resources is intentionally naive and decoupled from
    other stuff. For very large result sets[3] this might be annoying,
    but since we are already doing plenty of traversing of long lists,
    it may not be a big deal. If it is we can incorporate getting the
    last modified time in the loop that serializes objects to JSON
    output.

I'm not sure what you're referring to above as "intentionally naive and
decoupled from other stuff"? Adding the updated_at field of the
underlying DB tables would be trivial -- maybe 10-15 lines total for
DB/object layer and REST API as well. Am I missing something?

Best,
-jay

I think we should. Generally speaking I think it is good form to
fulfil the expectations of HTTP. It helps make sure the HTTP APIs
work with the unknown client. Working with the unknown client is one
of the best reasons to have an HTTP API.k

[1] https://tools.ietf.org/html/rfc7232#section-2.2

[2] An argument could be made that this change is fixing a protocol
level bug, but I reckon that argument wouldn't fly with most people.


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Aug 20, 2017 by Jay_Pipes (59,760 points)   3 10 14
0 votes

On Sun, 20 Aug 2017, Jay Pipes wrote:
On 08/18/2017 01:23 PM, Chris Dent wrote:

So my change above adds 'last-modified' and 'cache-control:
no-cache' to GET of /resourceproviders and
/resource
providers/{uuid} and proposes we do it for everything
else.

Should we?

No. :) Not everything. In particular, I think both the GET /resource_classes
and GET /traits URIs are very cacheable and we shouldn't disallow proxies
from caching that content if they want to.

Except that unless we have cache validation handling on the server
side, which we don't, then the "very cacheable" dependent on use
setting a max-age and coming to agreement over what the right
max-age seems unlikely. The simpler solution is to not cache.

If we do, some things to think about:

  • The related OVO will need the updatedat and createdat
    fields exposed. This is pretty easy to do with the
    NovaTimestampObject mixin. This doesn't need to require a object
    version bump because we don't do RPC with them.

Technically, only the updated_at field would need to be exposed via the OVO
objects. But OK, sure. I'd even advocate a patch to OVO that would bring in
the NovaTimestampObject mixin. Just call it Timestamped or something...

The way the database tables are currently set up, when a entity is
first created, createdat is set, and updatedat is null. Therefore,
on new entities, failing over to createdat when updatedat is null
is necessary.

The work I've done thus far has tried to have the smallest impact on
the database tables and the queries used to get at them. They're
already complex enough.

The entity tables already have createdat and updatedat columns.
Exposing those columns on the objects is a matter of adding the
mixin.

I agree that making a change on OVO to have a Timestamped would be
useful.

  • The current implementation of getting the last modified time for a
    collection of resources is intentionally naive and decoupled from
    other stuff. For very large result sets[3] this might be annoying,
    but since we are already doing plenty of traversing of long lists,
    it may not be a big deal. If it is we can incorporate getting the
    last modified time in the loop that serializes objects to JSON
    output.

I'm not sure what you're referring to above as "intentionally naive and
decoupled from other stuff"? Adding the updated_at field of the underlying DB
tables would be trivial -- maybe 10-15 lines total for DB/object layer and
REST API as well. Am I missing something?

By "other stuff" I mean two things:

  • the code is nova/objects/resource_provider.py
  • the serialization (to JSON) code in placement/handlers/*.py

For those requests that return collections, we could adapt the
queries used to retrieve those resources to find us the max
updated_at time during the query.

Or we could also do the same while traversing the list of objects to
create the JSON output.

I've chosen not to do the DB/object side changes because that is a
maze of many twisting passages, composed in fun ways. For those
situations where a list of native (e.g. /resourceproviders) objects
it return it is simply easier to extract the info later in the
process. For those situations there the returned data is composed on
the fly (e.g. /allocation
candidates, /usages) we want the
last-modified to be now() anyway, so it doesn't matter.

So the concern/question is around whether people deem it a problem
to traverse the list of objects a second time after already
traversing them a first time to create the JSON output. If so, we
can make the serialization loop have two purposes.

--
Chris Dent (⊙_⊙') https://anticdent.org/
freenode: cdent tw: @anticdent__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

responded Aug 21, 2017 by cdent_plus_os_at_ant (12,800 points)   2 2 4
0 votes

On 08/21/2017 04:59 AM, Chris Dent wrote:
On Sun, 20 Aug 2017, Jay Pipes wrote:

On 08/18/2017 01:23 PM, Chris Dent wrote:

So my change above adds 'last-modified' and 'cache-control:
no-cache' to GET of /resourceproviders and
/resource
providers/{uuid} and proposes we do it for everything
else.

Should we?

No. :) Not everything. In particular, I think both the GET
/resource_classes and GET /traits URIs are very cacheable and we
shouldn't disallow proxies from caching that content if they want to.

Except that unless we have cache validation handling on the server
side, which we don't, then the "very cacheable" dependent on use
setting a max-age and coming to agreement over what the right
max-age seems unlikely. The simpler solution is to not cache.

We do have cache validation on the server side for resource classes. Any
time a resource class is added or deleted, we call RCCACHE.clear().
Couldn't we add a single attribute to the ResourceClassCache that
returns the last time the cache was reset?

But meh, you're right that the simpler solution is just to not do HTTP
caching.

If we do, some things to think about:

  • The related OVO will need the updatedat and createdat
    fields exposed. This is pretty easy to do with the
    NovaTimestampObject mixin. This doesn't need to require a object
    version bump because we don't do RPC with them.

Technically, only the updated_at field would need to be exposed via
the OVO objects. But OK, sure. I'd even advocate a patch to OVO that
would bring in the NovaTimestampObject mixin. Just call it Timestamped
or something...

The way the database tables are currently set up, when a entity is
first created, createdat is set, and updatedat is null. Therefore,
on new entities, failing over to createdat when updatedat is null
is necessary.

The work I've done thus far has tried to have the smallest impact on
the database tables and the queries used to get at them. They're
already complex enough.

The entity tables already have createdat and updatedat columns.
Exposing those columns on the objects is a matter of adding the
mixin.

Right.

I agree that making a change on OVO to have a Timestamped would be
useful.

  • The current implementation of getting the last modified time for a
    collection of resources is intentionally naive and decoupled from
    other stuff. For very large result sets[3] this might be annoying,
    but since we are already doing plenty of traversing of long lists,
    it may not be a big deal. If it is we can incorporate getting the
    last modified time in the loop that serializes objects to JSON
    output.

I'm not sure what you're referring to above as "intentionally naive
and decoupled from other stuff"? Adding the updated_at field of the
underlying DB tables would be trivial -- maybe 10-15 lines total for
DB/object layer and REST API as well. Am I missing something?

By "other stuff" I mean two things:

  • the code is nova/objects/resource_provider.py
  • the serialization (to JSON) code in placement/handlers/*.py

For those requests that return collections, we could adapt the
queries used to retrieve those resources to find us the max
updated_at time during the query.

No, I don't recommend that... just return the updatedat and createdat
fields.

Or we could also do the same while traversing the list of objects to
create the JSON output.

Yeah, that's fine.

I've chosen not to do the DB/object side changes because that is a
maze of many twisting passages, composed in fun ways. For those
situations where a list of native (e.g. /resourceproviders) objects
it return it is simply easier to extract the info later in the
process. For those situations there the returned data is composed on
the fly (e.g. /allocation
candidates, /usages) we want the
last-modified to be now() anyway, so it doesn't matter.

So the concern/question is around whether people deem it a problem
to traverse the list of objects a second time after already
traversing them a first time to create the JSON output. If so, we
can make the serialization loop have two purposes.

I have no problem calculating the last modified time in the
serialization loop.

But then again, if the default behaviour of HTTP is to never cache
anything unless some cache-related headers are present [1] and you
don't want proxies to cache any placement API information, why are we
changing anything at all anyway? If we left it alone (and continue not
sending Cache-Control headers for anything), the same exact result would
be achieved, no?

Best,
-jay

[1] https://tools.ietf.org/html/rfc7234#page-5


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Aug 21, 2017 by Jay_Pipes (59,760 points)   3 10 14
0 votes

On Mon, 21 Aug 2017, Jay Pipes wrote:
On 08/21/2017 04:59 AM, Chris Dent wrote:
We do have cache validation on the server side for resource classes. Any time
a resource class is added or deleted, we call RCCACHE.clear(). Couldn't we
add a single attribute to the ResourceClassCache that returns the last time
the cache was reset?

That's server side cache, of which the client side (or proxy side)
has no visibility. If we had etags, and were caching etag to resource
pairs when we sent out responses, we could then have a conditional
GET handler which checked etags, returning 304 on a cache hit.
At RCCACHE changes we could flush the etag cache.

But meh, you're right that the simpler solution is just to not do HTTP
caching.

'xactly

But then again, if the default behaviour of HTTP is to never cache anything
unless some cache-related headers are present [1] and you don't want
proxies to cache any placement API information, why are we changing anything
at all anyway? If we left it alone (and continue not sending Cache-Control
headers for anything), the same exact result would be achieved, no?

Essentially so we can put last-modified headers on things, which in
RFC speak we SHOULD do. And if we do that then we SHOULD make sure
no caching happens.

Also it seems like last-modified headers is a nice-to-have for that
"uknown client" I spoke up in the first message.

But as you correctly identify the immediate practical value to nova
is pretty small, which is one of the reasons I was looking for the
lightest-weight implementation.

--
Chris Dent (⊙_⊙') https://anticdent.org/
freenode: cdent tw: @anticdent__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

responded Aug 21, 2017 by cdent_plus_os_at_ant (12,800 points)   2 2 4
0 votes

-----Original Message-----
From: Chris Dent [mailto:cdent+os@anticdent.org]
Sent: Monday, August 21, 2017 10:44 AM
To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [nova] [placement] [api] cache headers in
placement service

On Mon, 21 Aug 2017, Jay Pipes wrote:

On 08/21/2017 04:59 AM, Chris Dent wrote:
We do have cache validation on the server side for resource classes.
Any time a resource class is added or deleted, we call
RCCACHE.clear(). Couldn't we add a single attribute to the
ResourceClassCache that returns the last time the cache was reset?

That's server side cache, of which the client side (or proxy side) has
no visibility. If we had etags, and were caching etag to resource pairs
when we sent out responses, we could then have a conditional GET
handler which checked etags, returning 304 on a cache hit.
At RCCACHE changes we could flush the etag cache.
[Mooney, Sean K] I agree this is likely needed if caching is used. One of the changes
Intel would like to make is to transition the attestation server integration for
Trusted boot with our cloud integrity technologies to use traits on the compute node
Instead of a custom filter to attest that the server is trusted. In that case we
We would want to ensure that if we add or remove a trait for resource provider that
The cache is invalidated. So we would have to invalidate the etag or updated everytime
We update the tratis.

But meh, you're right that the simpler solution is just to not do
HTTP
caching.

'xactly

But then again, if the default behaviour of HTTP is to never cache
anything unless some cache-related headers are present [1] and you
don't want proxies to cache any placement API information, why are
we changing anything at all anyway? If we left it alone (and continue
not sending Cache-Control headers for anything), the same exact
result would be achieved, no?

Essentially so we can put last-modified headers on things, which in RFC
speak we SHOULD do. And if we do that then we SHOULD make sure no
caching happens.

Also it seems like last-modified headers is a nice-to-have for that
"uknown client" I spoke up in the first message.

But as you correctly identify the immediate practical value to nova is
pretty small, which is one of the reasons I was looking for the
lightest-weight implementation.

--
Chris Dent (⊙_⊙') https://anticdent.org/
freenode: cdent tw: @anticdent


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Aug 21, 2017 by Mooney,_Sean_K (3,580 points)   3 9
0 votes

On Mon, 21 Aug 2017, Chris Dent wrote:

Essentially so we can put last-modified headers on things, which in
RFC speak we SHOULD do. And if we do that then we SHOULD make sure
no caching happens.

For sake of completeness, I've gone ahead and proposed a spec for
this:

 https://review.openstack.org/#/c/496853/

--
Chris Dent (⊙_⊙') https://anticdent.org/
freenode: cdent tw: @anticdent__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

responded Aug 23, 2017 by cdent_plus_os_at_ant (12,800 points)   2 2 4
...