Wading in a bit late as I've been off-list for a while, but I have thoughts here.
Excerpts from Jay Pipes's message of 2017-09-13 13:44:55 -0400:
On 09/12/2017 06:53 PM, Boris Pavlovic wrote:
Great initiative, unfortunately I wasn't able to attend it, however I
have some thoughts...
You can't simplify OpenStack just by fixing the few issues that are
mostly described in the etherpad.
TC should work on shrinking the OpenStack use cases and moving towards
the product (box) complete solution instead of pieces of bunch barely
OpenStack is not a product. It's a collection of projects that represent
a toolkit for various cloud-computing functionality.
I think Boris was suggesting that making it a product would simplify it.
I believe there is some effort under way to try this, but my brain
has ceased to remember what that effort is called or how it is being
implemented. Something about common use cases and the exact mix of
projects + configuration to get there, and testing it? Help?
*Simple things to improve: *
/This is going to allow the community to work together, and actually get
feedback in a standard way, and incrementally improve quality. /
1) There should be one and only one:
1.1) deployment/packaging(may be docker) upgrade mechanism used by
Good luck with that :) The likelihood of the deployer/packager community
agreeing on a single solution is zero.
I think Boris is suggesting that the OpenStack development community
pick one to use, not the packaging and deployer community. The
only common thing dev has in this area is devstack, and that
has allowed dev to largely ignore issues they create because
they're not feeling the pain of the average user who is using
puppet/chef/ansible/tripleo/kolla/in-house-magic to deploy.
1.2) monitoring/logging/tracing mechanism used by everybody
Also close to zero chance of agreeing on a single solution. Better to
focus instead on ensuring various service projects are monitorable and
I'm less enthused about this one as well. Monitoring, alerting, defining
business rules for what is broken and what isn't are very org-specific
I also don't think OpenStack fails at this and there is plenty exposed
in clear ways for monitors to be created.
1.3) way to configure all services (e.g. k8 etcd way)
Are you referring to the way to configure k8s services or the way to
configure/setup an application that is running on k8s? If the former,
then there is not a single way of configuring k8s services. If the
latter, there isn't a single way of configuring that either. In fact,
despite Helm being a popular new entrant to the k8s application package
format discussion, k8s itself is decidedly not opinionated about how
an application is configured. Use a CMDB, use Helm, use env variables,
use confd, use whatever. k8s doesn't care.
We do have one way to configure things. Well.. two.
*) Startup-time things are configured in config files.
*) Run-time changeable things are in databases fronted by admin APIs/tools.
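
To make the two-way split above concrete, here is a minimal sketch, not how any OpenStack service actually does it. The section and option names and the RuntimeSettings class are all made up for illustration; the class is an in-memory stand-in for a database table fronted by an admin API.

```python
import configparser

# Hypothetical startup-time config file contents; names are invented.
STARTUP_CONF = """
[api]
bind_port = 8774

[database]
connection = sqlite://
"""


def load_startup_config(text):
    """Parse startup-time options once, when the service starts."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


class RuntimeSettings:
    """In-memory stand-in for a DB table fronted by an admin API."""

    def __init__(self):
        self._rows = {}

    def set(self, key, value):
        # An admin API call would end up here; no restart needed.
        self._rows[key] = value

    def get(self, key, default=None):
        return self._rows.get(key, default)


conf = load_startup_config(STARTUP_CONF)
runtime = RuntimeSettings()
runtime.set("quota.instances", 10)

print(conf.getint("api", "bind_port"))
print(runtime.get("quota.instances"))
```

The point is only the split: the first kind of value is read once and needs a restart to change, the second can change while the service is running.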
2) Projects must have a standardized interface that allows these projects
to use them in the same way.
Give examples of services that communicate over non-standard
interfaces. I don't know of any.
Agreed here too. I'd like to see a clearer separation between nova,
neutron, and cinder on the hypervisor, but the way they're coupled now
3) Testing & R&D should be performed only against this standard deployment
Sorry, this is laughable. There will never be a standard deployment
because there are infinite use cases that infrastructure supports.
Your definition of what works for GoDaddy is decidedly different from
someone else's definition of what works for them.
If there were a few well defined product definitions, there could be. It's
not laughable at all to me. devstack and the configs it creates are useful
for lightweight testing, but they're not necessarily representative of
the standard makeup of real-world clouds.
*Hard things to improve: *
OpenStack projects were split in a far from ideal way, which leads to
a bunch of gaps that we have now:
1.1) Code & functional duplications: Quotas, Schedulers, Reservations,
Health checks, Logging, Tracing, ....
There is certainly code duplication in some areas, yes.
I feel like this de-duplication has been moving at the slow-but-consistent
pace anyone can hope for since it was noticed and oslo was created.
It's now at the things that are really hard to de-dupe like quotas and policy.
1.2) Non-optimal workflows (booting a VM takes 400 DB requests) because
data is stored in Cinder, Nova, Neutron....
Sorry, I call bullshit on this. It does not take 400 DB requests to boot
a VM. Also: the DB is not at all the bottleneck in the VM launch
process. You've been saying it is for years with no justification to
back you up. Pointing to a Rally scenario that doesn't reflect a
real-world usage of OpenStack services isn't useful.
Separation of concerns often beats performance anyway. I do think this
was just Boris's optimization muscle flexing a little too hard.
1.3) Lack of resources (as every project is doing the same work on the
same parts again and again)
Provide specific examples please.
Glance is constantly teetering on the brink of being unmaintained. There
are, in fact, hundreds of open bugs in Nova, with 47 marked as High
importance. Though IMO, that is just the way software works: if we had
enough people to fix everything, we'd think of more things to break first.
What we can do:
*1) Simplify internal communication *
1.1) Instead of AMQP for internal communication inside projects use just
HTTP, load balancing & retries.
Prove to me that this would solve a problem. First describe what the
problem is, then show me that using AMQP is the source of that problem,
then show me that using HTTP requests would solve that problem.
RabbitMQ is a bottleneck for all projects that use it. There aren't any
really well tested alternatives, and projects that need the scale are
turning to things like Cellsv2 to work around this problem.
Lately I've been wondering more why we don't just replace MySQL+RabbitMQ with
something like etcd3 or zookeeper. They notify you when things change and
offer enough scalability and resilience to failure that it just might work
without sharding being necessary below the thousands-of-hypervisors mark.
But, R&D time is short, so I accept our RabbitMQ overlord until such
time as I can plan a peaceful coup.
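
To show what I mean by "notify you when things change", here is a toy in-memory sketch of the watch pattern. It is not the etcd3 or ZooKeeper client API; WatchableStore and the key layout are made up purely for illustration.

```python
from collections import defaultdict


class WatchableStore:
    """Toy key-value store that calls watchers back on every write."""

    def __init__(self):
        self._data = {}
        self._watchers = defaultdict(list)

    def watch(self, prefix, callback):
        # Register interest in every key that starts with this prefix.
        self._watchers[prefix].append(callback)

    def put(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        # Push the change to watchers instead of making them poll.
        for prefix, callbacks in self._watchers.items():
            if key.startswith(prefix):
                for callback in callbacks:
                    callback(key, old, value)


events = []
store = WatchableStore()
store.watch("/instances/", lambda key, old, new: events.append((key, old, new)))

store.put("/instances/vm-1", "BUILD")
store.put("/instances/vm-1", "ACTIVE")
store.put("/volumes/vol-1", "available")  # no watcher on this prefix

print(events)
```

The state and the change notification live in the same place here, which is the job the database and the message bus split between them today.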
*2) Use API Gateway pattern *
3.1) Gives users of the high level API one IP address with one client
3.2) Allows significantly reducing load on Keystone because tokens are
checked only in API gateway
3.3) Simplifies communication between projects (they are now in trusted
network, no need to check token)
Why is this a problem for OpenStack projects to deal with? If you want a
single IP address for all APIs that your users consume, then simply
deploy all the public-facing services on a single set of web servers and
make each service's root endpoint be a subresource on the root IP/DNS name.
We effectively get this from a user perspective with the single auth_url +
catalog. That said, it might simplify things for users if we didn't need
the catalog part on the user end. Just answer requests to any API from
one place that finds the thing in the catalog for you.
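
A rough sketch of that "one place" idea, assuming the front door maps the first path segment to a service type and looks it up in the catalog on the user's behalf. The catalog contents, the URLs, and the resolve() helper are all hypothetical.

```python
# Hypothetical catalog: service type -> internal endpoint (made-up URLs).
CATALOG = {
    "compute": "http://nova.internal:8774/v2.1",
    "volume": "http://cinder.internal:8776/v3",
    "network": "http://neutron.internal:9696/v2.0",
}


def resolve(path, catalog=CATALOG):
    """Map /<service-type>/<rest> on the front door to a backend URL."""
    service, _, rest = path.lstrip("/").partition("/")
    if service not in catalog:
        raise LookupError(f"no such service in catalog: {service!r}")
    base = catalog[service].rstrip("/")
    return f"{base}/{rest}" if rest else base


print(resolve("/compute/servers/detail"))
print(resolve("/volume"))
```

The user only ever sees the one address; the catalog lookup still happens, just on the server side.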
*3) Fix the OpenStack split *
3.1) Move common functionality to separated internal services:
Scheduling, Logging, Monitoring, Tracing, Quotas, Reservations (it would
be even better if this thing would have more or less monolithic
Yes, let's definitely go the opposite direction of microservices and
loosely coupled domains, which have been the best practices of software
development over the last two decades. While we're at it, let's rewrite
OpenStack projects in COBOL.
Actually I think he argued for micro-services. "... separated internal
services." and then argued that a monolithic implementation would
I personally like separation of concerns, and thus, microservices.
3.2) Somehow deal with defragmentation of resources e.g. VM Volumes and
Networks data which is heavily connected.
How are these things connected?
4) Don't be afraid to break things
Maybe it's time for OpenStack 2:
- In any case most people provide an API on top of OpenStack for usage
- In any case there is no standard and easy way to upgrade
So basically we are not losing anything even if we make non-backward
compatible changes and completely rethink architecture and API.
Awesome news. I will keep this in mind when users (like GoDaddy) ask
Nova to never break anything ever and keep behaviour like scheduler
retries that represent giant technical debt.
Please don't break anything in OpenStack 1.
Please let's break everything when we start OpenStack 2, but
provide compatibility layers and legacy services for those who are