
[openstack-dev] Is the pendulum swinging on PaaS layers?


I just wanted to blurt this out since it hit me a few times at the
summit, and see if I'm misreading the rooms.

For the last few years, Nova has pushed back on adding orchestration to
the compute API, and even defined a policy for it since it comes up so
much [1]. The stance is that the compute API should expose capabilities
that a higher-level orchestration service can stitch together for a more
fluid end user experience.

One simple example that comes up time and again is allowing a user to
pass volume type to the compute API when booting from volume such that
when nova creates the backing volume in Cinder, it passes through the
volume type. If you need a non-default volume type for boot from volume,
the way you do this today is first create the volume with said type in
Cinder and then provide that volume to the compute API when creating the
server. However, people claim that is bad UX or hard for users to
understand, something like that (at least from a command line, I assume
Horizon hides this, and basic users should probably be using Horizon
anyway right?).
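To make the current two-step workflow concrete, here is a minimal sketch against a shade/openstacksdk-style client. The `conn` object and the exact call names (`create_volume`, `create_server`, `boot_volume`) are illustrative assumptions for this sketch, not any specific client's API:

```python
def boot_from_typed_volume(conn, name, image_id, flavor_id, volume_type, size_gb):
    """Boot from volume with a non-default volume type: today this takes
    two calls, because the compute API does not accept a volume type."""
    # Step 1: create the boot volume in Cinder with the desired type.
    volume = conn.create_volume(
        size=size_gb, image=image_id, volume_type=volume_type, bootable=True)
    # Step 2: hand the pre-built volume to the compute API at server create.
    return conn.create_server(
        name=name, flavor=flavor_id, boot_volume=volume['id'])
```

The orchestration ask in the thread is essentially to collapse these two calls into one, with nova passing the volume type through to Cinder itself.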

While talking about claims in the scheduler and a top-level conductor
for cells v2 deployments, we've talked about the desire to eliminate
"up-calls" from the compute service to the top-level controller services
(nova-api, nova-conductor and nova-scheduler). Build retries is one such
up-call. CERN disables build retries, but others rely on them, because
of how racy claims in the computes are (that's another story and why
we're working on fixing it). While talking about this, we asked, "why
not just do away with build retries in nova altogether? If the scheduler
picks a host and the build fails, it fails, and you have to
retry/rebuild/delete/recreate from a top-level service."
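If build retries moved out of nova, every top-level consumer would carry a loop along these lines. This is a hedged sketch only; `create_server`, `wait_for_status`, and `delete_server` stand in for whatever client or top-level service would end up owning the retry:

```python
def create_with_retries(conn, request, max_attempts=3):
    """Client-side build retry: if the scheduler's pick fails the build,
    delete the errored server and re-issue the whole request."""
    for attempt in range(max_attempts):
        server = conn.create_server(**request)
        server = conn.wait_for_status(server, ('ACTIVE', 'ERROR'))
        if server['status'] == 'ACTIVE':
            return server
        # Build failed on the chosen host: clean up, then ask again from
        # the top so the scheduler can pick a different host.
        conn.delete_server(server['id'])
    raise RuntimeError('server build failed after %d attempts' % max_attempts)
```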

But during several different Forum sessions, like user API improvements
[2] but also the cells v2 and claims in the scheduler sessions, I was
hearing about how operators only wanted to expose the base IaaS services
and APIs and end API users wanted to only use those, which means any
improvements in those APIs would have to be in the base APIs (nova,
cinder, etc). To me, that generally means any orchestration would have
to be baked into the compute API if you're not using Heat or something
similar.

Am I missing the point, or is the pendulum really swinging away from
PaaS layer services which abstract the dirty details of the lower-level
IaaS APIs? Or was this always something people wanted and I've just
never made the connection until now?

[1] https://docs.openstack.org/developer/nova/project_scope.html#api-scope
[2]
https://etherpad.openstack.org/p/BOS-forum-openstack-user-api-improvements

--

Thanks,

Matt


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked May 26, 2017 in openstack-dev by mriedemos_at_gmail.c (15,720 points)

39 Responses


Matt Riedemann wrote:
[...]
Am I missing the point, or is the pendulum really swinging away from
PaaS layer services which abstract the dirty details of the lower-level
IaaS APIs? Or was this always something people wanted and I've just
never made the connection until now?

I feel like this is driven by a need for better UX at the IaaS API
layer (fewer calls, or more intuitive calls, as shown by the shade UI).
Even if that IaaS layer is mostly accessed programmatically, that's no
excuse for requiring 5 convoluted API calls and reading 5 pages of docs
for a basic action, when you could make it a single call.

So I'm not sure it's a recent change, or that it shows the demise of
PaaS layers, but that certainly shows that direct usage of IaaS APIs is
still a thing. If anything, the rise of application orchestration
frameworks like Kubernetes only separated the concerns -- provisioning
of application clusters might be done by someone else, but it still is
done by someone.

--
Thierry Carrez (ttx)


responded May 19, 2017 by Thierry_Carrez (57,480 points)

On 05/18/2017 08:19 PM, Matt Riedemann wrote:
[...]

Am I missing the point, or is the pendulum really swinging away from
PaaS layer services which abstract the dirty details of the lower-level
IaaS APIs? Or was this always something people wanted and I've just
never made the connection until now?

Lots of people just want IaaS. See the fact that Google and Microsoft
both didn't offer it at first in their public clouds, and got pretty
marginal uptake while AWS ate the world. They have both reversed course
there.

The predictability of whether an intent is going to be fulfilled, and
"POST /servers" is definitely pretty clear intent, is directly related
to how willing people are going to be to use the platform and build
tools for it. If it's much more complicated to build tooling on
OpenStack IaaS because that tooling needs to put everything in its own
retry work queue, lots of folks will just give up.

I do get the concerns of extra logic in Nova, but the decision to break
up the working compute with network and storage problem space across 3
services and APIs doesn't mean we shouldn't still make it easy to
express some pretty basic and common intents.

-Sean

--
Sean Dague
http://dague.net


responded May 19, 2017 by Sean_Dague (66,200 points)

On 19 May 2017 at 12:24, Sean Dague sean@dague.net wrote:

I do get the concerns of extra logic in Nova, but the decision to break
up the working compute with network and storage problem space across 3
services and APIs doesn't mean we shouldn't still make it easy to
express some pretty basic and common intents.

Given that we've similar needs for retries and race avoidance in and
between glance, nova, cinder and neutron, and a need to orchestrate
between at least these three (arguably other infrastructure projects
too, I'm not trying to get into specifics), maybe the answer is to put
that logic in a new service, that talks to those four, and provides a
nice simple API, while allowing the cinder, nova etc APIs to remove
things like internal retries?

--
Duncan Thomas


responded May 19, 2017 by Duncan_Thomas (16,160 points)

On Fri, 19 May 2017, Duncan Thomas wrote:

On 19 May 2017 at 12:24, Sean Dague sean@dague.net wrote:

I do get the concerns of extra logic in Nova, but the decision to break
up the working compute with network and storage problem space across 3
services and APIs doesn't mean we shouldn't still make it easy to
express some pretty basic and common intents.

Given that we've similar needs for retries and race avoidance in and
between glance, nova, cinder and neutron, and a need to orchestrate
between at least these three (arguably other infrastructure projects
too, I'm not trying to get into specifics), maybe the answer is to put
that logic in a new service, that talks to those four, and provides a
nice simple API, while allowing the cinder, nova etc APIs to remove
things like internal retries?

This is what enamel was going to be, but we got stalled out because
of lack of resources and the usual raft of other commitments:

 https://github.com/jaypipes/enamel

--
Chris Dent ┬──┬◡ノ(° -°ノ) https://anticdent.org/
freenode: cdent tw: @anticdent

responded May 19, 2017 by cdent_plus_os_at_ant (12,800 points)

On 05/19/2017 09:04 AM, Chris Dent wrote:
On Fri, 19 May 2017, Duncan Thomas wrote:

[...]

Given that we've similar needs for retries and race avoidance in and
between glance, nova, cinder and neutron, and a need to orchestrate
between at least these three (arguably other infrastructure projects
too, I'm not trying to get into specifics), maybe the answer is to put
that logic in a new service, that talks to those four, and provides a
nice simple API, while allowing the cinder, nova etc APIs to remove
things like internal retries?

This is what enamel was going to be, but we got stalled out because
of lack of resources and the usual raft of other commitments:

https://github.com/jaypipes/enamel

There was a conversation in the cells v2 discussion around Searchlight
that puts me more firmly in the anti-enamel camp. Because of some
complexities around server list, Nova was planning on using Searchlight
to provide an efficient backend.

Q: Who in this room is running ELK already in their environment?
A: 100% of operators in room

Q: Who would be ok with standing up Searchlight for this?
A: 0% of operators in the room

We've now got an ecosystem that understands how to talk to our APIs
(yay! -
https://docs.google.com/presentation/d/1WAWHrVw8-u6XC7AG9ANdre8-Su0a3fdI-scjny3QOnk/pub?slide=id.g1d9d78a72b_0_0)
so saying "you need to also run this other service to actually do the
thing you want, and redo all your applications, and 3rd party SDKs" is
just weird.

And, yes, this is definitely a slider, and no I don't want Instance HA
in Nova. But we felt that "get-me-a-network" was important enough a user
experience to bake that in and stop poking users with sticks. And trying
hard to complete an expressed intent "POST /server" seems like it falls
on the line. Especially if the user received a conditional success (202).

-Sean

--
Sean Dague
http://dague.net


responded May 19, 2017 by Sean_Dague (66,200 points)

On 18/05/17 20:19, Matt Riedemann wrote:
I just wanted to blurt this out since it hit me a few times at the
summit, and see if I'm misreading the rooms.

For the last few years, Nova has pushed back on adding orchestration to
the compute API, and even defined a policy for it since it comes up so
much [1]. The stance is that the compute API should expose capabilities
that a higher-level orchestration service can stitch together for a more
fluid end user experience.

I think this is a wise policy.

One simple example that comes up time and again is allowing a user to
pass volume type to the compute API when booting from volume such that
when nova creates the backing volume in Cinder, it passes through the
volume type. If you need a non-default volume type for boot from volume,
the way you do this today is first create the volume with said type in
Cinder and then provide that volume to the compute API when creating the
server. However, people claim that is bad UX or hard for users to
understand, something like that (at least from a command line, I assume
Horizon hides this, and basic users should probably be using Horizon
anyway right?).

As always, there's a trade-off between simplicity and flexibility. I can
certainly understand the logic in wanting to make the simple stuff
simple. But users also need to be able to progress from simple stuff to
more complex stuff without having to give up and start over. There's a
danger of leading them down the garden path.

While talking about claims in the scheduler and a top-level conductor
for cells v2 deployments, we've talked about the desire to eliminate
"up-calls" from the compute service to the top-level controller services
(nova-api, nova-conductor and nova-scheduler). Build retries is one such
up-call. CERN disables build retries, but others rely on them, because
of how racy claims in the computes are (that's another story and why
we're working on fixing it). While talking about this, we asked, "why
not just do away with build retries in nova altogether? If the scheduler
picks a host and the build fails, it fails, and you have to
retry/rebuild/delete/recreate from a top-level service."

(FWIW Heat does this for you already.)

But during several different Forum sessions, like user API improvements
[2] but also the cells v2 and claims in the scheduler sessions, I was
hearing about how operators only wanted to expose the base IaaS services
and APIs and end API users wanted to only use those, which means any
improvements in those APIs would have to be in the base APIs (nova,
cinder, etc). To me, that generally means any orchestration would have
to be baked into the compute API if you're not using Heat or something
similar.

The problem is that orchestration done inside APIs is very easy to do
badly in ways that cause lots of downstream pain for users and external
orchestrators. For example, Nova already does some orchestration: it
creates a Neutron port for a server if you don't specify one. (And then
promptly forgets that it has done so.) There is literally an entire
inner platform, an orchestrator within an orchestrator, inside Heat to
try to manage the fallout from this. And the inner platform shares none
of the elegance, such as it is, of Heat itself, but is rather a
collection of cobbled-together hacks to deal with the seemingly infinite
explosion of edge cases that we kept running into over a period of at
least 5 releases.

The get-me-a-network thing is... better, but there's no provision for
changes after the server is created, which means we have to copy-paste
the Nova implementation into Heat to deal with update.[1] Which sounds
like a maintenance nightmare in the making. That seems to be a common
mistake: to assume that once users create something they'll never need
to touch it again, except to delete it when they're done.

Don't even get me started on Neutron.[2]

Any orchestration that is done behind-the-scenes needs to be done
superbly well, provide transparency for external orchestration tools
that need to hook in to the data flow, and should be developed in
consultation with potential consumers like Shade and Heat.

Am I missing the point, or is the pendulum really swinging away from
PaaS layer services which abstract the dirty details of the lower-level
IaaS APIs? Or was this always something people wanted and I've just
never made the connection until now?

(Aside: can we stop using the term 'PaaS' to refer to "everything that
Nova doesn't do"? This habit is not helping us to communicate clearly.)

cheers,
Zane.

[1] https://review.openstack.org/#/c/407328/
[2]
http://lists.openstack.org/pipermail/openstack-dev/2014-April/032098.html


responded May 19, 2017 by Zane_Bitter (21,640 points)

On 05/19/2017 07:18 AM, Sean Dague wrote:

[...]

And, yes, this is definitely a slider, and no I don't want Instance HA
in Nova. But we felt that "get-me-a-network" was important enough a user
experience to bake that in and stop poking users with sticks. And trying
hard to complete an expressed intent "POST /server" seems like it falls
on the line. Especially if the user received a conditional success (202).

A while back I suggested adding the vif-model as an attribute on the network
during a nova boot request, and we were shot down because "that should be done
in neutron".

I have some sympathy for this argument, but it seems to me that the logical
extension of that is to expose simple orthogonal APIs where the nova boot
request should only take neutron port ids and cinder volume ids. The actual
setup of those ports/volumes would be done by neutron and cinder.

It seems somewhat arbitrary to say "for historical reasons this subset of simple
things can be done directly in a nova boot command, but for more complicated
stuff you have to go use these other commands". I think there's an argument to
be made that it would be better to be consistent even for the simple things.

Chris


responded May 19, 2017 by Chris_Friesen (20,420 points)

On Fri, May 19, 2017, at 05:59 AM, Duncan Thomas wrote:
On 19 May 2017 at 12:24, Sean Dague sean@dague.net wrote:

I do get the concerns of extra logic in Nova, but the decision to break
up the working compute with network and storage problem space across 3
services and APIs doesn't mean we shouldn't still make it easy to
express some pretty basic and common intents.

Given that we've similar needs for retries and race avoidance in and
between glance, nova, cinder and neutron, and a need to orchestrate
between at least these three (arguably other infrastructure projects
too, I'm not trying to get into specifics), maybe the answer is to put
that logic in a new service, that talks to those four, and provides a
nice simple API, while allowing the cinder, nova etc APIs to remove
things like internal retries?

The big issue with trying to solve the problem this way is that various
clouds won't deploy this service, and then your users are stuck with
the "base" APIs anyway, or with deploying the service themselves. This
is mostly ok until you realize that we rarely build services to run
"on" cloud rather than "in" cloud, so I as the user can't sanely deploy
a new service this way, and even if I can, I'm stuck deploying it for
the 6 clouds and 15 regions (numbers not exact) because even more
rarely do we write software that is multicloud/region-aware.

We need to be very careful if this is the path we take because it often
doesn't actually make the user experience better.

Clark


responded May 19, 2017 by Clark_Boylan (8,800 points)

On Fri, May 19, 2017 at 11:53 AM, Chris Friesen
chris.friesen@windriver.com wrote:
..., but it seems to me that the logical
extension of that is to expose simple orthogonal APIs where the nova boot
request should only take neutron port ids and cinder volume ids. The actual
setup of those ports/volumes would be done by neutron and cinder.

It seems somewhat arbitrary to say "for historical reasons this subset of
simple things can be done directly in a nova boot command, but for more
complicated stuff you have to go use these other commands". I think there's
an argument to be made that it would be better to be consistent even for the
simple things.

cdent mentioned enamel[0] above, and there is also oaktree[1], both of
which are wrapper/proxy services in front of existing OpenStack APIs.
I don't know enough about enamel yet, but one of the things I like
about oaktree is that it is not required to be deployed by the cloud
operator to be useful, I could set it up and proxy Rax and/or
CityCloud and/or mtreinish's closet cloud equally well.

The fact that these exist, and things like shade itself, are clues
that we are not meeting the needs of API consumers. I don't think
anyone disagrees with that; let me know if you do and I'll update my
thoughts.

First and foremost, we need to have the primitive operations that get
composed into the higher-level ones available. Just picking "POST
/server" as an example, we do not have that today. Chris mentions
above the low-level version should take IDs for all of the associated
resources and no magic happening behind the scenes. I think this
should be our top priority, everything else builds on top of that, via
either in-service APIs or proxies or library wrappers, whatever a) can
get implemented and b) makes sense for the use case.
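That split might look roughly like this: a primitive create that only accepts pre-built resource IDs, with the convenience behavior composed on top of it. All of the names here (`post_server`, `create_port`, and so on) are hypothetical, just to show the layering:

```python
def create_server_primitive(conn, name, flavor_id, port_ids, volume_ids):
    """Low-level intent: every associated resource is passed by ID,
    nothing is created behind the scenes."""
    return conn.compute.post_server(
        name=name, flavor=flavor_id, ports=port_ids, volumes=volume_ids)

def create_server_convenience(conn, name, flavor_id, image_id, network_id,
                              volume_type, size_gb):
    """Higher-level call composed from primitives: build the port and the
    boot volume first, then hand their IDs to the low-level create."""
    port = conn.network.create_port(network=network_id)
    volume = conn.volume.create_volume(
        size=size_gb, image=image_id, volume_type=volume_type)
    return create_server_primitive(
        conn, name, flavor_id, [port['id']], [volume['id']])
```

Whether the convenience layer lives in the service, a proxy, or an SDK then becomes a packaging question rather than an API-design question.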

dt

[BTW, I made this same type of proposal for the OpenStack SDK a few
years ago and it went unmerged, so at some level folks do not agree
this is necessary. I look now at what the shade folks are doing,
building a low-level REST layer that they then compose, and wish I had
been more persistent then.]

[0] https://github.com/jaypipes/enamel
[1] http://git.openstack.org/cgit/openstack/oaktree
--

Dean Troyer
dtroyer@gmail.com


responded May 19, 2017 by Dean_Troyer (13,100 points)

On 05/19/2017 01:38 PM, Dean Troyer wrote:
[...]

cdent mentioned enamel[0] above, and there is also oaktree[1], both of
which are wrapper/proxy services in front of existing OpenStack APIs.
I don't know enough about enamel yet, but one of the things I like
about oaktree is that it is not required to be deployed by the cloud
operator to be useful, I could set it up and proxy Rax and/or
CityCloud and/or mtreinish's closet cloud equally well.

The fact that these exist, and things like shade itself, are clues
that we are not meeting the needs of API consumers. I don't think
anyone disagrees with that; let me know if you do and I'll update my
thoughts.

It's fine to have other ways to consume things. I feel like "make
OpenStack easier to use by requiring you to install a client-side API
server for your own requests" misses the point of the easier part. It's
cool that you can do it as a power user. It's cool that things like
Heat exist for people who don't want to write API calls (and just do
templates). But it doesn't help with the number of pieces of complexity
you have to manage in OpenStack to have a workable cloud.

I consider those things duct tape, leading us to the eventually
consistent place where we actually do that work internally. Because,
having seen this with the ec2-api proxy, the moment you get beyond
trivial mapping, you end up with a complex state tracking system that's
going to need to be highly available, and to replicate a bunch of your
data to be performant, and then have inconsistency issues, because a
user-deployed API proxy can't have access to the notification bus,
and... boom.

You end up replicating the Ceilometer issue, where there was a
breakdown in getting needs expressed / implemented, and the result was
a service doing heavy polling of other APIs (because that's the only
way it could get the data it needed), literally increasing the load on
the API surfaces by a factor of 10
(http://superuser.openstack.org/articles/cern-cloud-architecture-update/,
last graph). That was an anti-pattern. We should have gotten to the
bottom of the mismatches and communication issues early on, because the
end state we all inflicted on users, to get a totally reasonable set of
features, was not good. Please let's not do this again.

These should be used as ways to experiment with the kinds of interfaces
we want cheaply, then take them back into services (which is a more
expensive process involving compatibility stories, deeper documentation,
performance implications, and the like), not an end game on their own.

First and foremost, we need to have the primitive operations that get
composed into the higher-level ones available. Just picking "POST
/server" as an example, we do not have that today. Chris mentions
above the low-level version should take IDs for all of the associated
resources and no magic happening behind the scenes. I think this
should be our top priority, everything else builds on top of that, via
either in-service APIs or proxies or library wrappers, whatever a) can
get implemented and b) makes sense for the use case.

You can get the behavior. It also has other behaviors. I'm not sure any
user has actually argued for "please make me do more REST calls to
create a server".

Anyway, this gets pretty meta pretty fast. I agree with Zane saying "I
want my server to build", or "I'd like Nova to build a volume for me"
are very odd things to call PaaS. I think of PaaS as "here is a ruby on
rails app, provision me a db for it, and make it go". Heroku style.

-Sean

--
Sean Dague
http://dague.net


responded May 19, 2017 by Sean_Dague (66,200 points)
...