
[openstack-dev] [tc] revised Postgresql deprecation patch for governance

0 votes

We had a forum session in Boston on Postgresql and out of that agreed to
the following steps forward:

  1. explicitly warn in operator facing documentation that Postgresql is
    less supported than MySQL. This was deemed better than just removing
    documentation, because when people see Postgresql files in tree they'll
    make assumptions (at least one set of operators did).

  2. Suse is in the process of investigating migration from PG to Galera for
    future versions of their OpenStack product. They'll make their findings
    and tooling open to help determine how burdensome this kind of
    transition would be for folks.

After those findings, we can come back with any next steps (or just
leave it as good enough there).

The TC governance patch is updated here -
https://review.openstack.org/#/c/427880/ - or if there are other
discussion questions feel free to respond to this thread.

-Sean

--
Sean Dague
http://dague.net


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked May 23, 2017 in openstack-dev by Sean_Dague (66,200 points)   4 8 14

14 Responses

0 votes

On 05/15/2017 07:16 AM, Sean Dague wrote:
We had a forum session in Boston on Postgresql and out of that agreed to
the following steps forward:

  1. explicitly warn in operator facing documentation that Postgresql is
    less supported than MySQL. This was deemed better than just removing
    documentation, because when people see Postgresql files in tree they'll
    make assumptions (at least one set of operators did).

  2. Suse is in the process of investigating migration from PG to Galera for
    future versions of their OpenStack product. They'll make their findings
    and tooling open to help determine how burdensome this kind of
    transition would be for folks.

After those findings, we can come back with any next steps (or just
leave it as good enough there).

The TC governance patch is updated here -
https://review.openstack.org/#/c/427880/ - or if there are other
discussion questions feel free to respond to this thread.

In the interest of building summaries of progress, as there has been a
bunch of lively discussion on #openstack-dev today, there is a new
revision out there - https://review.openstack.org/#/c/427880/.

Some of the concerns/feedback has been "please describe things that are
harder by this being an abstraction", so examples are provided.

A statement around support was also put in there, because support only
meant QA jobs, or only developers for some folks. I think it's important
to ensure we paint the whole picture with how people get support in an
Open Source project.

There seems to be general agreement that we need to be more honest with
users, and that we've effectively been lying to them.

I feel like the current sticking points come down to whether:

  • it's important that the operator community largely is already in one
    camp or not
  • future items listed that are harder are important enough to justify a
    strict trade off here
  • it's ok to have the proposal have a firm lean in tone, even though
    its set of concrete actions is pretty reversible and doesn't commit to
    future removal of postgresql

Also, as I stated on IRC, if some set of individuals came through and
solved all the future problems on the list for us as a community, my
care about how many DBs we support would drastically decrease. My key
concern is the fact that this is costing us the ability to solve real
problems that we want to solve (by making them too complex for anyone to
take on). For folks asking the question about what they could do to make
pg a first class citizen, that's a pretty good starting point.

-Sean

--
Sean Dague
http://dague.net


responded May 17, 2017 by Sean_Dague (66,200 points)   4 8 14
0 votes

On 05/17/2017 02:38 PM, Sean Dague wrote:

Some of the concerns/feedback has been "please describe things that are
harder by this being an abstraction", so examples are provided.

so let's go through this list:

  • OpenStack services taking a more active role in managing the DBMS

mmmm, "managing" is vague to me, are we referring to the database
service itself, e.g. starting / stopping / configuring? installers
like tripleo do this now, pacemaker is standard in HA for control of
services, I think I need some background here as to what the more active
role would look like.

  • The ability to have zero down time upgrade for services such as
    Keystone.

So "zero down time upgrades" seems to have broken into:

  • "expand / contract with the code carefully dancing around the
    existence of two schema concepts simultaneously", e.g. nova, neutron.
    AFAIK there is no particular issue supporting multiple backends on this
    because we use alembic or sqlalchemy-migrate to abstract away basic
    ALTER TABLE types of feature.

  • "expand / contract using server side triggers to reconcile the two
    schema concepts", e.g. keystone. This is more difficult because there
    is currently no "trigger" abstraction layer. Triggers represent more
    of an imperative programming model vs. typical SQL, which is why I've
    not taken on trying to build a one-size-fits-all abstraction for this in
    upstream Alembic or SQLAlchemy. However, it is feasible to build a
    "one-size-that-fits-openstack-online-upgrades" abstraction. I was
    trying to gauge interest in helping to create this back in the
    "triggers" thread, in my note at
    http://lists.openstack.org/pipermail/openstack-dev/2016-August/102345.html,
    which also referred to some very raw initial code examples. However, it
    received strong pushback from a wide range of openstack veterans, which
    led me to believe this was not a thing that was happening. Apparently
    Keystone has gone ahead and used triggers anyway, however I was not
    pulled into that process. But if triggers are to be "blessed" by at
    least some projects, I can likely work on this problem for MySQL /
    Postgresql agnosticism. If keystone is using triggers right now for
    online upgrades, I would ask, are they currently working on Postgresql
    as well with PG-specific triggers, or does Postgresql degrade into a
    "non-online" migration scenario if you're running Keystone?

  • Consistent UTF8 4 & 5 byte support in our APIs

"5 byte support" appears to refer to utf-8's ability to be...well a
total of 6 bytes. But in practice, unicode itself only needs 4 bytes
and that is as far as any database supports right now since they target
unicode (see https://en.wikipedia.org/wiki/UTF-8#Description). That's
all any database we're talking about supports at most. So... let's assume
this means four bytes.

From the perspective of database-agnosticism with regards to database
and driver support for non-ascii characters, this problem has been
solved by SQLAlchemy well before Python 3 existed when many DBAPIs would
literally crash if they received a u'' string, and the rest of them
would churn out garbage; SQLAlchemy implemented a full encode/decode
layer on top of the Python DBAPI to fix this. The situation is vastly
improved now that all DBAPIs support unicode natively.

However, on the MySQL side there is this complexity that their utf-8
support is a 3-byte only storage model, and you have to use utf8mb4 if
you want the four byte model. I'm not sure right now what projects are
specifically hitting issues related to this.

Postgresql doesn't have such a limitation. If your Postgresql server
or specific database is set up for utf-8 (which should be the case),
then you get full utf-8 character set support.

So I don't see the problem of "consistent utf8 support" having much to
do with whether or not we support Postgresql - you of course need your
"CREATE DATABASE" to include the utf8 charset like we do on MySQL, but
that's it.
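
Concretely, the difference is about this much (placeholder names and
credentials; a sketch, not oslo.db provisioning code):

    from sqlalchemy import create_engine, text

    # MySQL: both the database charset and the connection charset must be
    # utf8mb4 to get 4-byte characters; plain "utf8" caps storage at 3 bytes.
    mysql = create_engine('mysql+pymysql://root:secret@localhost/?charset=utf8mb4')
    with mysql.connect() as conn:
        conn.execute(text(
            "CREATE DATABASE nova CHARACTER SET utf8mb4 "
            "COLLATE utf8mb4_general_ci"))

    # Postgresql: a database created with UTF8 encoding (the usual default)
    # already stores 4-byte characters, so there is nothing extra to set.
    pg_ddl = "CREATE DATABASE nova WITH ENCODING 'UTF8'"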

  • The requirement that Postgresql libraries are compiled for new users
    trying to just run unit tests (no equiv is true for mysql because of
    the pure python driver).

I would suggest that new developers for whom the presence of things like
postgresql client libraries is a challenge (but somehow they are running
a MySQL server for their pure python driver to talk to?) don't actually
have to worry about running the tests against Postgresql, this is how
the "opportunistic" testing model in oslo.db has always worked; it only
runs for the backends that you have set up.
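
Roughly, the shape of that model is the following (a simplified sketch, not
oslo.db's actual test machinery; the openstack_citest credentials are the
usual convention):

    # Sketch of the "opportunistic" model: a backend-specific test silently
    # skips itself when that backend isn't reachable locally.
    import unittest

    import sqlalchemy
    from sqlalchemy import exc

    POSTGRES_URL = ('postgresql+psycopg2://openstack_citest:openstack_citest'
                    '@localhost/openstack_citest')


    def backend_available(url):
        """Return True only if we can actually connect to the given URL."""
        try:
            engine = sqlalchemy.create_engine(url)
            with engine.connect():
                return True
        except (exc.OperationalError, ImportError):
            return False


    class PostgresqlOpportunisticTest(unittest.TestCase):
        def setUp(self):
            if not backend_available(POSTGRES_URL):
                self.skipTest('postgresql is not set up locally')
            self.engine = sqlalchemy.create_engine(POSTGRES_URL)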

Also, openstack got all the way through Kilo approximately using the
native python-MySQL driver which required a compiled client library as
well as the MySQL dependencies be installed. The psycopg2 driver has a
ton of whl's up on pypi (https://pypi.python.org/pypi/psycopg2) and all
linux distros supply it as a package in any case, so an actual "compile"
should not be needed. Also, this is Openstack... its basic existence
is a kind of vastly enormous glue between thousands of native
(ultimately C-compiled) libraries and packages, and it runs only
on... linux. So this is just a weird point to bring up. Seems like a
bit of a red herring to me.

  • Consistency around case sensitivity collation defaults that lead to
    strange bugs around searching/updating names in resources.

Finally, a real issue-ish thing that requires a resolution. So the
good news here is that while MySQL is defaulting to case-insensitive
collations (which I assume we like) and Postgresql has almost no support
for case-insensitive collations (unless you use this odd CITEXT
datatype), it is possible to make a case-sensitive collation style
become case-insensitive at SQL query time much more easily than it would
be to go the other way.

SQLAlchemy already has some case-insensitive operators, most notably
"ilike()", which is a case-insensitive "LIKE" that is backend agnostic.
If these search queries are just using LIKE then they only need use
ilike() from SQLAlchemy instead of like().
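
For illustration (the table and search term are made up):

    from sqlalchemy import MetaData, Table, Column, Integer, String, select

    metadata = MetaData()
    servers = Table('servers', metadata,
                    Column('id', Integer, primary_key=True),
                    Column('name', String(255)))

    # Collation-dependent: matches 'WEB-1' under MySQL's default ci collation,
    # misses it on Postgresql.
    by_like = select([servers]).where(servers.c.name.like('%web%'))

    # Backend-agnostic case-insensitive match: SQLAlchemy renders ILIKE on
    # Postgresql and lower(...) LIKE lower(...) on backends without ILIKE.
    by_ilike = select([servers]).where(servers.c.name.ilike('%web%'))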

If we are talking about the full range of operators like ==, !=,
etc., and/or if we are also concerned that developers may use like()
when they really need to use ilike(), the issue can be addressed at the
typing level as well. Using SQLAlchemy, a String datatype that
guarantees case-insensitive comparisons is straightforward to construct.
This would be a Python side replacement for the String type, and
possibly Text, Unicode, etc. as needed. It does not imply any
database schema migrations. The hypothetical CaseInsensitiveString
would override all the comparison operators to ensure that on the
Postgresql (or other case-sensitive) backend, both sides of the
expression are embedded within the SQL LOWER() function, so that these
comparisons act case insensitively. The trick then is to get the
downstream projects to use this type (which would be in oslo.db) in
their models, which is the usual herding cats job. But this is a pretty
solvable problem.
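
A minimal sketch of that idea, assuming SQLAlchemy's comparator_factory
hook; the CaseInsensitiveString name and details are hypothetical rather
than an existing oslo.db type:

    from sqlalchemy import String, func
    from sqlalchemy.types import TypeDecorator


    class CaseInsensitiveString(TypeDecorator):
        """A String whose SQL comparisons lower-case both sides."""

        impl = String

        class comparator_factory(TypeDecorator.Comparator):
            def operate(self, op, *other, **kwargs):
                # Wrap the column and every operand in LOWER() so ==, !=,
                # LIKE, IN, etc. behave case-insensitively on case-sensitive
                # backends such as Postgresql.
                lowered = [func.lower(o) for o in other]
                return op(func.lower(self.expr), *lowered, **kwargs)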

so to sum up:

  1. "more active management role" - not totally sure what that refers to
    beyond what is already present

  2. "zero downtime upgrades" - to the extent that projects use server
    side constructs like triggers, effort is needed to produce
    implementations for both backends. I can help with this effort but the
    downstream projects need to care and be open to reviewing / merging
    gerrits and things. Not sure what the state of trigger-based online
    upgrades is for Postgresql right now, or if there is a degradation mode
    present in Keystone w/ Postgresql.

  3. utf-8 - no problem

  4. native clients - red herring ?

  5. case-insensitive - both query-level and model-level solutions exist
    that work within our existing frameworks and patterns

A statement around support was also put in there, because support only
meant QA jobs, or only developers for some folks. I think it's important
to ensure we paint the whole picture with how people get support in an
Open Source project.

There seems to be general agreement that we need to be more honest with
users, and that we've effectively been lying to them.

I feel like the current sticking points come down to whether:

  • it's important that the operator community largely is already in one
    camp or not
  • future items listed that are harder are important enough to justify a
    strict trade off here
  • it's ok to have the proposal have a firm lean in tone, even though
    its set of concrete actions is pretty reversible and doesn't commit to
    future removal of postgresql

Also, as I stated on IRC, if some set of individuals came through and
solved all the future problems on the list for us as a community, my
care about how many DBs we support would drastically decrease. My key
concern is the fact that this is costing us the ability to solve real
problems that we want to solve (by making them too complex for anyone to
take on). For folks asking the question about what they could do to make
pg a first class citizen, that's a pretty good starting point.

-Sean


responded May 18, 2017 by Mike_Bayer (15,260 points)   1 5 6
0 votes

On 05/18/2017 01:02 PM, Mike Bayer wrote:

On 05/17/2017 02:38 PM, Sean Dague wrote:

Some of the concerns/feedback has been "please describe things that are
harder by this being an abstraction", so examples are provided.

so let's go through this list:

  • OpenStack services taking a more active role in managing the DBMS

mmmm, "managing" is vague to me, are we referring to the database
service itself, e.g. starting / stopping / configuring? installers
like tripleo do this now, pacemaker is standard in HA for control of
services, I think I need some background here as to what the more active
role would look like.

I will leave that one for mordred, it was his concern.

  • The ability to have zero down time upgrade for services such as
    Keystone.

So "zero down time upgrades" seems to have broken into:

  • "expand / contract with the code carefully dancing around the
    existence of two schema concepts simultaneously", e.g. nova, neutron.
    AFAIK there is no particular issue supporting multiple backends on this
    because we use alembic or sqlalchemy-migrate to abstract away basic
    ALTER TABLE types of feature.

  • "expand / contract using server side triggers to reconcile the two
    schema concepts", e.g. keystone. This is more difficult because there
    is currently no "trigger" abstraction layer. Triggers represent more
    of an imperative programming model vs. typical SQL, which is why I've
    not taken on trying to build a one-size-fits-all abstraction for this in
    upstream Alembic or SQLAlchemy. However, it is feasible to build a
    "one-size-that-fits-openstack-online-upgrades" abstraction. I was
    trying to gauge interest in helping to create this back in the
    "triggers" thread, in my note at
    http://lists.openstack.org/pipermail/openstack-dev/2016-August/102345.html,
    which also referred to some very raw initial code examples. However, it
    received strong pushback from a wide range of openstack veterans, which
    led me to believe this was not a thing that was happening. Apparently
    Keystone has gone ahead and used triggers anyway, however I was not
    pulled into that process. But if triggers are to be "blessed" by at
    least some projects, I can likely work on this problem for MySQL /
    Postgresql agnosticism. If keystone is using triggers right now for
    online upgrades, I would ask, are they currently working on Postgresql
    as well with PG-specific triggers, or does Postgresql degrade into a
    "non-online" migration scenario if you're running Keystone?

This is the triggers conversation which, while I have issues with it, is
the only path forward now if you are doing keystone behind a load balancer
and need to retain HA through the process.

No one is looking at pg here. And yes, everything not mysql would just
have to take the minimal expand / contract downtime. Data services like
Keystone / Glance whose data is their REST API definitely have different
concerns than Nova dropping its control plane for 30s to recycle code
and apply db schema tweaks.

  • Consistent UTF8 4 & 5 byte support in our APIs

"5 byte support" appears to refer to utf-8's ability to be...well a
total of 6 bytes. But in practice, unicode itself only needs 4 bytes
and that is as far as any database supports right now since they target
unicode (see https://en.wikipedia.org/wiki/UTF-8#Description). That's
all any database we're talking about supports at most. So... let's assume
this means four bytes.

The 5 byte statement came in via a bug to Nova, it might have been
confused, and I might have been confused in interpreting it. Let's
assume it's invalid now and move to 4 bytes.

From the perspective of database-agnosticism with regards to database
and driver support for non-ascii characters, this problem has been
solved by SQLAlchemy well before Python 3 existed when many DBAPIs would
literally crash if they received a u'' string, and the rest of them
would churn out garbage; SQLAlchemy implemented a full encode/decode
layer on top of the Python DBAPI to fix this. The situation is vastly
improved now that all DBAPIs support unicode natively.

However, on the MySQL side there is this complexity that their utf-8
support is a 3-byte only storage model, and you have to use utf8mb4 if
you want the four byte model. I'm not sure right now what projects are
specifically hitting issues related to this.

Postgresql doesn't have such a limitation. If your Postgresql server
or specific database is set up for utf-8 (which should be the case),
then you get full utf-8 character set support.

So I don't see the problem of "consistent utf8 support" having much to
do with whether or not we support Postgresql - you of course need your
"CREATE DATABASE" to include the utf8 charset like we do on MySQL, but
that's it.

That's where we stand which means that we're doing 3 byte UTF8 on MySQL,
and 4 byte on PG. That's actually an API facing difference today. It's
work to dig out of on the MySQL side, maybe the PG one is just all
super cool and done. But it's still a consideration point.

  • The requirement that Postgresql libraries are compiled for new users
    trying to just run unit tests (no equiv is true for mysql because of
    the pure python driver).

I would suggest that new developers for whom the presence of things like
postgresql client libraries is a challenge (but somehow they are running
a MySQL server for their pure python driver to talk to?) don't actually
have to worry about running the tests against Postgresql, this is how
the "opportunistic" testing model in oslo.db has always worked; it only
runs for the backends that you have set up.

Also, openstack got all the way through Kilo approximately using the
native python-MySQL driver which required a compiled client library as
well as the MySQL dependencies be installed. The psycopg2 driver has a
ton of whl's up on pypi (https://pypi.python.org/pypi/psycopg2) and all
linux distros supply it as a package in any case, so an actual "compile"
should not be needed. Also, this is Openstack... its basic existence
is a kind of vastly enormous glue between thousands of native
(ultimately C-compiled) libraries and packages, and it runs only
on... linux. So this is just a weird point to bring up. Seems like a
bit of a red herring to me.

They aren't running a database. This is one of those areas where pypi
wheel building got a lot better since the last time I looked, and one of
the reasons we had this whole bindep system in OpenStack. Because of how
test-requirements are installed, the drivers are installed in tox
whether or not they are used, because there is no good way to late
install them in runs. Whether or not they are used is based on whether a
db is set up.

But, cool, pypi wheels are good enough that we can delete the need for
all these headers for end users, very cool.

  • Consistency around case sensitivity collation defaults that lead to
    strange bugs around searching/updating names in resources.

Finally, a real issue-ish thing that requires a resolution. So the
good news here is that while MySQL is defaulting to case-insensitive
collations (which I assume we like) and Postgresql has almost no support
for case-insensitive collations (unless you use this odd CITEXT
datatype), it is possible to make a case-sensitive collation style
become case-insensitive at SQL query time much more easily than it would
be to go the other way.

SQLAlchemy already has some case-insensitive operators, most notably
"ilike()", which is a case-insensitive "LIKE" that is backend agnostic.
If these search queries are just using LIKE then they only need use
ilike() from SQLAlchemy instead of like().

If we are talking about the full range of operators like ==, !=,
etc., and/or if we are also concerned that developers may use like()
when they really need to use ilike(), the issue can be addressed at the
typing level as well. Using SQLAlchemy, a String datatype that
guarantees case-insensitive comparisons is straightforward to construct.
This would be a Python side replacement for the String type, and
possibly Text, Unicode, etc. as needed. It does not imply any
database schema migrations. The hypothetical CaseInsensitiveString
would override all the comparison operators to ensure that on the
Postgresql (or other case-sensitive) backend, both sides of the
expression are embedded within the SQL LOWER() function, so that these
comparisons act case insensitively. The trick then is to get the
downstream projects to use this type (which would be in oslo.db) in
their models, which is the usual herding cats job. But this is a pretty
solvable problem.

Sure, it's work. But that's fine. The point of that list was that there
is stuff that is work because SQLA is a leaky abstraction. Which is fine
if there are people taking that work off the table.

-Sean

--
Sean Dague
http://dague.net


responded May 18, 2017 by Sean_Dague (66,200 points)   4 8 14
0 votes

On 05/18/2017 02:49 PM, Sean Dague wrote:
On 05/18/2017 01:02 PM, Mike Bayer wrote:

On 05/17/2017 02:38 PM, Sean Dague wrote:

Some of the concerns/feedback has been "please describe things that are
harder by this being an abstraction", so examples are provided.

so let's go through this list:

  • OpenStack services taking a more active role in managing the DBMS

mmmm, "managing" is vague to me, are we referring to the database
service itself, e.g. starting / stopping / configuring? installers
like tripleo do this now, pacemaker is standard in HA for control of
services, I think I need some background here as to what the more active
role would look like.

I will leave that one for mordred, it was his concern.

I have written a novel on this topic just now in a thread titled

"[tc] Active or passive role with our database layer"

  • The ability to have zero down time upgrade for services such as
    Keystone.

So "zero down time upgrades" seems to have broken into:

  • "expand / contract with the code carefully dancing around the
    existence of two schema concepts simultaneously", e.g. nova, neutron.
    AFAIK there is no particular issue supporting multiple backends on this
    because we use alembic or sqlalchemy-migrate to abstract away basic
    ALTER TABLE types of feature.

Agree. But there are still issues with designing the schema upgrades
themselves to be compatible with replication streams or other online
schema update constraints.

  • "expand / contract using server side triggers to reconcile the two
    schema concepts", e.g. keystone. This is more difficult because there
    is currently no "trigger" abstraction layer. Triggers represent more
    of an imperative programming model vs. typical SQL, which is why I've
    not taken on trying to build a one-size-fits-all abstraction for this in
    upstream Alembic or SQLAlchemy. However, it is feasible to build a
    "one-size-that-fits-openstack-online-upgrades" abstraction. I was
    trying to gauge interest in helping to create this back in the
    "triggers" thread, in my note at
    http://lists.openstack.org/pipermail/openstack-dev/2016-August/102345.html,
    which also referred to some very raw initial code examples. However, it
    received strong pushback from a wide range of openstack veterans, which
    led me to believe this was not a thing that was happening. Apparently
    Keystone has gone ahead and used triggers anyway, however I was not
    pulled into that process. But if triggers are to be "blessed" by at
    least some projects, I can likely work on this problem for MySQL /
    Postgresql agnosticism. If keystone is using triggers right now for
    online upgrades, I would ask, are they currently working on Postgresql
    as well with PG-specific triggers, or does Postgresql degrade into a
    "non-online" migration scenario if you're running Keystone?

This is the triggers conversation which, while I have issues with it, is
the only path forward now if you are doing keystone behind a load balancer
and need to retain HA through the process.

I also have issues with this- and I continue to reject categorically the
assertion that it's the only path forward.

It's not a normal or suggested way to deal with this. There ARE
best-practice suggested ways to deal with this ... but to the point of
the other email, they require being more intimate with the HA architecture.

No one is looking at pg here. And yes, everything not mysql would just
have to take the minimal expand / contract downtime. Data services like
Keystone / Glance whose data is their REST API definitely have different
concerns than Nova dropping its control plane for 30s to recycle code
and apply db schema tweaks.

Depending on the app, nova's control plane is just as much of a concern.
I agree- there are certainly plenty of workloads out there where it's
not - but there is an issue at hand that needs to be solved and needs to
be solved one time and then always work.

  • Consistent UTF8 4 & 5 byte support in our APIs

"5 byte support" appears to refer to utf-8's ability to be...well a
total of 6 bytes. But in practice, unicode itself only needs 4 bytes
and that is as far as any database supports right now since they target
unicode (see https://en.wikipedia.org/wiki/UTF-8#Description). That's
all any database we're talking about supports at most. So... let's assume
this means four bytes.

The 5 byte statement came in via a bug to Nova, it might have been
confused, and I might have been confused in interpreting it. Let's
assume it's invalid now and move to 4 bytes.

Yes.

From the perspective of database-agnosticism with regards to database
and driver support for non-ascii characters, this problem has been
solved by SQLAlchemy well before Python 3 existed when many DBAPIs would
literally crash if they received a u'' string, and the rest of them
would churn out garbage; SQLAlchemy implemented a full encode/decode
layer on top of the Python DBAPI to fix this. The situation is vastly
improved now that all DBAPIs support unicode natively.

However, on the MySQL side there is this complexity that their utf-8
support is a 3-byte only storage model, and you have to use utf8mb4 if
you want the four byte model. I'm not sure right now what projects are
specifically hitting issues related to this.

Postgresql doesn't have such a limitation. If your Postgresql server
or specific database is set up for utf-8 (which should be the case),
then you get full utf-8 character set support.

So I don't see the problem of "consistent utf8 support" having much to
do with whether or not we support Postgresql - you of course need your
"CREATE DATABASE" to include the utf8 charset like we do on MySQL, but
that's it.

That's where we stand which means that we're doing 3 byte UTF8 on MySQL,
and 4 byte on PG. That's actually an API facing difference today. It's
work to dig out of on the MySQL side, maybe the PG one is just all
super cool and done. But it's still a consideration point.

The biggest concern for me is that we're letting API behavior be
dictated by database backend and/or database config choices. The API
should behave like the API behaves.

  • The requirement that Postgresql libraries are compiled for new users
    trying to just run unit tests (no equiv is true for mysql because of
    the pure python driver).

I would suggest that new developers for whom the presence of things like
postgresql client libraries is a challenge (but somehow they are running
a MySQL server for their pure python driver to talk to?) don't actually
have to worry about running the tests against Postgresql, this is how
the "opportunistic" testing model in oslo.db has always worked; it only
runs for the backends that you have set up.

Also, openstack got all the way through Kilo approximately using the
native python-MySQL driver which required a compiled client library as
well as the MySQL dependencies be installed. The psycopg2 driver has a
ton of whl's up on pypi (https://pypi.python.org/pypi/psycopg2) and all
linux distros supply it as a package in any case, so an actual "compile"
should not be needed. Also, this is Openstack... its basic existence
is a kind of vastly enormous glue between thousands of native
(ultimately C-compiled) libraries and packages, and it runs only
on... linux. So this is just a weird point to bring up. Seems like a
bit of a red herring to me.

They aren't running a database. This is one of those areas where pypi
wheel building got a lot better since the last time I looked, and one of
the reasons we had this whole bindep system in OpenStack. Because of how
test-requirements are installed, the drivers are installed in tox
whether or not they are used, because there is no good way to late
install them in runs. Whether or not they are used is based on whether a
db is set up.

But, cool, pypi wheels are good enough that we can delete the need for
all these headers for end users, very cool.

  • Consistency around case sensitivity collation defaults that lead to
    strange bugs around searching/updating names in resources.

Finally, a real issue-ish thing that requires a resolution. So the
good news here is that while MySQL is defaulting to case-insensitive
collations (which I assume we like) and Postgresql has almost no support
for case-insensitive collations (unless you use this odd CITEXT
datatype), it is possible to make a case-sensitive collation style
become case-insensitive at SQL query time much more easily than it would
be to go the other way.

SQLAlchemy already has some case-insensitive operators, most notably
"ilike()", which is a case-insensitive "LIKE" that is backend agnostic.
If these search queries are just using LIKE then they only need use
ilike() from SQLAlchemy instead of like().

If we are talking about the full range of operators like ==, !=,
etc., and/or if we are also concerned that developers may use like()
when they really need to use ilike(), the issue can be addressed at the
typing level as well. Using SQLAlchemy, a String datatype that
guarantees case-insensitive comparisons is straightforward to construct.
This would be a Python side replacement for the String type, and
possibly Text, Unicode, etc. as needed. It does not imply any
database schema migrations. The hypothetical CaseInsensitiveString
would override all the comparison operators to ensure that on the
Postgresql (or other case-sensitive) backend, both sides of the
expression are embedded within the SQL LOWER() function, so that these
comparisons act case insensitively. The trick then is to get the
downstream projects to use this type (which would be in oslo.db) in
their models, which is the usual herding cats job. But this is a pretty
solvable problem.

Sure, it's work. But that's fine. The point of that list was that there
is stuff that is work because SQLA is a leaky abstraction. Which is fine
if there are people taking that work off the table.

I would not characterize this as SQLA being a leaky abstraction.

I'd say that at some point we didn't make a decision as to what we
wanted to do with text input and how it would be stored or not stored
and how it would be searched and sorted. Case sensitive collations have
been available to us the entire time, but we never decided whether our
API was case sensitive or case insensitive. OR - we DID decide that
our API is case insensitive, and the fact that it isn't on some deployments
is a bug. I'm putting money on the 'nobody made a decision' answer.


responded May 21, 2017 by Monty_Taylor (22,780 points)   2 4 7
0 votes

On 05/21/2017 03:51 PM, Monty Taylor wrote:

So I don't see the problem of "consistent utf8 support" having much to
do with whether or not we support Postgresql - you of course need your
"CREATE DATABASE" to include the utf8 charset like we do on MySQL, but
that's it.

That's where we stand which means that we're doing 3 byte UTF8 on MySQL,
and 4 byte on PG. That's actually an API facing difference today. It's
work to dig out of on the MySQL side, maybe the PG one is just all
super cool and done. But it's still a consideration point.

The biggest concern for me is that we're letting API behavior be
dictated by database backend and/or database config choices. The API
should behave like the API behaves.

The API should behave like, "we store utf-8". We should accept that
"utf-8" means "up to four bytes" and make sure we are using utf8mb4 for
all MySQL backends. That MySQL has made this bizarre decision about
what "utf-8" means would be a bug in MySQL that needs to
be worked around by the calling application. Other databases that want
to work with openstack need to also do utf-8 with four bytes. We can
easily add some tests to oslo.db that round trip an assortment of
unicode glyphs to confirm this (if there's one kind of test I've written
more than anyone should, it's pushing out non-ascii bytes to a database
and testing they come back the same).
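
A sketch of what such a round-trip check could look like (placeholder
connection URL and table; not an existing oslo.db test):

    from sqlalchemy import (MetaData, Table, Column, Integer, Unicode,
                            create_engine, select)

    GLYPHS = [u'Ĺōŕēṁ', u'いろはにほへと', u'\U0001F600']  # last one is 4 bytes in utf-8

    engine = create_engine(
        'mysql+pymysql://openstack_citest:openstack_citest@localhost/'
        'openstack_citest?charset=utf8mb4')

    metadata = MetaData()
    glyphs = Table('utf8_roundtrip', metadata,
                   Column('id', Integer, primary_key=True),
                   Column('value', Unicode(64)),
                   mysql_charset='utf8mb4')
    metadata.create_all(engine)

    with engine.begin() as conn:
        conn.execute(glyphs.insert(), [{'value': g} for g in GLYPHS])
        stored = [row[0] for row in
                  conn.execute(select([glyphs.c.value]).order_by(glyphs.c.id))]

    # the glyphs must come back identical on every backend we claim to support
    assert stored == GLYPHS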

Sure, it's work. But that's fine. The point of that list was that there
is stuff that is work because SQLA is a leaky abstraction. Which is fine
if there are people taking that work off the table.

I would not characterize this as SQLA being a leaky abstraction.

yeessss ! win! :)

I'd say that at some point we didn't make a decision as to what we
wanted to do with text input and how it would be stored or not stored
and how it would be searched and sorted. Case sensitive collations have
been available to us the entire time, but we never decided whether our
API was case sensitive or case insensitive. OR - we DID decide that
our API is case insensitive, and the fact that it isn't on some deployments
is a bug. I'm putting money on the 'nobody made a decision' answer.

I wasn't there, but perhaps early Openstack versions didn't have "textual
search" kinds of features? Maybe they were added by folks who didn't
consider the case sensitivity issue at that time. I'd be strongly in
favor of making use of oslo.db / SQLAlchemy constructs that are
explicitly case sensitive or not. It's true, SQLAlchemy also does not
force you to "make a decision" on this; if it did, this would be in the
"hooray the abstraction did not leak!" category. But SQLA makes lots
of these kinds of decisions to be kind of hands-off about things like
this, as developers often don't want there to be a decision made here
(lest it add even more to the "SQLAlchemy forces me to make so many
decisions!" complaint I have to read on twitter every day).


responded May 22, 2017 by Mike_Bayer (15,260 points)   1 5 6
0 votes

On 05/15/2017 07:16 AM, Sean Dague wrote:
We had a forum session in Boston on Postgresql and out of that agreed to
the following steps forward:

  1. explicitly warn in operator facing documentation that Postgresql is
    less supported than MySQL. This was deemed better than just removing
    documentation, because when people see Postgresql files in tree they'll
    make assumptions (at least one set of operators did).

  2. Suse is in the process of investigating migration from PG to Galera for
    future versions of their OpenStack product. They'll make their findings
    and tooling open to help determine how burdensome this kind of
    transition would be for folks.

After those findings, we can come back with any next steps (or just
leave it as good enough there).

The TC governance patch is updated here -
https://review.openstack.org/#/c/427880/ - or if there are other
discussion questions feel free to respond to this thread.

I've ended up in a number of conversations publicly and privately over
the last week. I'm trying to figure out how we best capture and
acknowledge the concerns.

My top concerns remain:

A1) Do not surprise users late, with them only finding out they are on the
less traveled path once they are so deeply committed there is no turning
back. It's fine for users to choose that path as long as they are
informed that they are going to need to be more self-reliant.

A2) Do not prevent features like zero downtime keystone making forward
progress with a MySQL only solution. There will always be a way to
handle these things with a change window, but the non change window
version really does need more understanding of what the db is doing.

There are some orthogonal concerns

B1) PG was chosen by people in the past, maybe more than we realized;
those are real users that we don't want to throw under a bus. Wholesale
delete is off the table. Even what deprecation might mean is hard to
figure out given that there is "no clear path off", "missing data of
who's on it", and potentially creative solutions using it that people
would like (the Cockroach db question, though given some of the Galera
fixes that have had to go in, these things are never drop in replacements).

B2) The upstream code isn't so irreparably changed (e.g. delete the SQLA
layer) that it's not possible to have alternative DB backends
(especially as people might want to experiment with different
approaches in the future).

I think these are actually compatible concerns. The current proposal to
me actually tries to address A1 & B1, with a hint about why A2 is
valuable and we would want to do that.

It feels like there would be a valuable follow on in which A2 & B2 were
addressed which is basically "progressive enhancements can be allowed to
only work with MySQL based backends". Which is the bit that Monty has
been pushing for in other threads.

This feels like what a Tier 2 support looks like. A basic SQLA and pray
so that if you live behind SQLA you are probably fine (though not
tested), and then test and advanced feature roll out on a single
platform. Any of that work might port to other platforms over time, but
we don't want to make that table stakes for enhancements.

-Sean

--
Sean Dague
http://dague.net


responded May 22, 2017 by Sean_Dague (66,200 points)   4 8 14
0 votes

On 5/22/2017 10:58 AM, Sean Dague wrote:
I think these are actually compatible concerns. The current proposal to
me actually tries to address A1 & B1, with a hint about why A2 is
valuable and we would want to do that.

It feels like there would be a valuable follow on in which A2 & B2 were
addressed which is basically "progressive enhancements can be allowed to
only work with MySQL based backends". Which is the bit that Monty has
been pushing for in other threads.

This feels like what a Tier 2 support looks like. A basic SQLA and pray
so that if you live behind SQLA you are probably fine (though not
tested), and then test and advanced feature roll out on a single
platform. Any of that work might port to other platforms over time, but
we don't want to make that table stakes for enhancements.

I think this is reasonable and is what I've been hoping for as a result
of the feedback on this.

I think it's totally fine to say tier 1 backends get shiny new features.
I mean, hell, compare the libvirt driver in nova to all other virt
drivers in nova. New features are written for the libvirt driver and we
have to strong-arm them into other drivers for a compatibility story.

I think we should turn on postgresql as a backend in one of the CI jobs,
as I've noted in the governance change - it could be the nova-next
non-voting job which only runs on nova, but we should have something
testing this as long as it's around, especially given how easy it is to
turn this on in upstream CI (it's flipping a devstack variable).

--

Thanks,

Matt


responded May 23, 2017 by mriedemos_at_gmail.c (15,720 points)   2 4 5
0 votes

On Mon, 22 May 2017, Sean Dague wrote:

This feels like what a Tier 2 support looks like. A basic SQLA and pray
so that if you live behind SQLA you are probably fine (though not
tested), and then test and advanced feature roll out on a single
platform. Any of that work might port to other platforms over time, but
we don't want to make that table stakes for enhancements.

I've often wondered why what's being called "Tier 1" (advanced
features) here isn't something done downstream of "generic"
OpenStack.

Which is not to say it would have to be closed source or vendor
oriented. Simply not here. It may be we've got enough to deal with
here.

The 'external' model described by Monty makes things that are not
here easier to manage (but, to be fair, not necessarily easier to
make).

--
Chris Dent ┬──┬◡ノ(° -°ノ) https://anticdent.org/
freenode: cdent tw: @anticdent

responded May 23, 2017 by cdent_plus_os_at_ant (12,800 points)   2 2 4
0 votes

On 05/22/2017 11:26 PM, Matt Riedemann wrote:
On 5/22/2017 10:58 AM, Sean Dague wrote:

I think these are actually compatible concerns. The current proposal to
me actually tries to address A1 & B1, with a hint about why A2 is
valuable and we would want to do that.

It feels like there would be a valuable follow on in which A2 & B2 were
addressed which is basically "progressive enhancements can be allowed to
only work with MySQL based backends". Which is the bit that Monty has
been pushing for in other threads.

This feels like what a Tier 2 support looks like. A basic SQLA and pray
so that if you live behind SQLA you are probably fine (though not
tested), and then test and advanced feature roll out on a single
platform. Any of that work might port to other platforms over time, but
we don't want to make that table stakes for enhancements.

I think this is reasonable and is what I've been hoping for as a result
of the feedback on this.

I think it's totally fine to say tier 1 backends get shiny new features.
I mean, hell, compare the libvirt driver in nova to all other virt
drivers in nova. New features are written for the libvirt driver and we
have to strong-arm them into other drivers for a compatibility story.

I think we should turn on postgresql as a backend in one of the CI jobs,
as I've noted in the governance change - it could be the nova-next
non-voting job which only runs on nova, but we should have something
testing this as long as it's around, especially given how easy it is to
turn this on in upstream CI (it's flipping a devstack variable).

Postgresql support shouldn't be in devstack. If we're taking a tier 2
approach, someone needs to carve out database plugins from devstack and
pg would be one (as could be galera, etc).

This historical artifact that pg was maintained in devstack, but much
more widely used backends were not, is part of the issue.

It would also be a good unit test case as to whether there are pg
focused folks around out there willing to do this basic devstack plugin
/ job setup work.

-Sean

--
Sean Dague
http://dague.net


responded May 23, 2017 by Sean_Dague (66,200 points)   4 8 14
0 votes

As OpenStack has evolved and grown, we are ending up with more and more
MySQL-isms in the code. I'd love to see OpenStack support every database
out there, but that is becoming more and more difficult. I've tried to
get OpenStack to work with other databases like Oracle DB, MongoDB,
TimesTen, NoSQL, and I can tell you that first hand it's not doable
without making some significant changes. Some services would be easy to
make more database agnostic, but most would require a lot of reworking.
I think the pragmatic thing to do is focus on supporting the MySQL
dialect with the different engines and clustering technologies that have
emerged. oslo.db is a great abstraction layer. We should continue to
build upon that and make sure that every OpenStack service uses it
end-to-end. I've already seen plenty of cases where services like
Barbican and Murano are not using it. I've also seen plenty of use cases
where core services are using the older methods of connecting to the
database and re-inventing the wheel to deal with things like retries.
The more we use oslo.db and make sure that people are consistent with
its use and best practices, the better off we'll be in the long run.
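
To illustrate the kind of consistency I mean: oslo.db already ships retry
and transaction helpers, so a service method can lean on them instead of
re-inventing retries (the model and function below are invented for the
example, and the connection would normally come from [database]/connection):

    from oslo_db import api as oslo_db_api
    from oslo_db.sqlalchemy import enginefacade
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()


    class Quota(Base):
        __tablename__ = 'quotas'
        project_id = Column(String(64), primary_key=True)
        in_use = Column(Integer, nullable=False, default=0)


    # Point the default enginefacade at a database for this sketch.
    enginefacade.configure(connection='sqlite://')


    @oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
    @enginefacade.writer
    def bump_quota(context, project_id, delta):
        # enginefacade supplies context.session and handles commit/rollback;
        # wrap_db_retry re-runs the whole transaction on deadlock instead of
        # each project hand-rolling its own retry loop. The context is an
        # oslo.context RequestContext in real services.
        quota = (context.session.query(Quota)
                 .filter_by(project_id=project_id)
                 .one())
        quota.in_use += delta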

On the topic of doing live upgrades: I think it's a "nice to have"
feature, but again we need a consistent framework that all services will
follow. It's already complicated enough with how different services deal
with parallelism and locking. So if we are going to go down this path
across even the core services, we need to have a solid solution and
framework. Otherwise, we'll end up with a hodgepodge of maturity levels
between services. The expectation from operators is that if you say you
can do live upgrades, they will expect that to be the case across all of
OpenStack and not a buffet style feature. We would also have to take
into consideration larger shops that have more distributed and
scaled-out control planes. So we need to be careful with this as it will have
a wide impact on development, testing, and operating.

Octave

On 5/23/2017 6:00 AM, Sean Dague wrote:
On 05/22/2017 11:26 PM, Matt Riedemann wrote:

On 5/22/2017 10:58 AM, Sean Dague wrote:

I think these are actually compatible concerns. The current proposal to
me actually tries to address A1 & B1, with a hint about why A2 is
valuable and we would want to do that.

It feels like there would be a valuable follow on in which A2 & B2 were
addressed which is basically "progressive enhancements can be allowed to
only work with MySQL based backends". Which is the bit that Monty has
been pushing for in other threads.

This feels like what a Tier 2 support looks like. A basic SQLA and pray
so that if you live behind SQLA you are probably fine (though not
tested), and then test and advanced feature roll out on a single
platform. Any of that work might port to other platforms over time, but
we don't want to make that table stakes for enhancements.
I think this is reasonable and is what I've been hoping for as a result
of the feedback on this.

I think it's totally fine to say tier 1 backends get shiny new features.
I mean, hell, compare the libvirt driver in nova to all other virt
drivers in nova. New features are written for the libvirt driver and we
have to strong-arm them into other drivers for a compatibility story.

I think we should turn on postgresql as a backend in one of the CI jobs,
as I've noted in the governance change - it could be the nova-next
non-voting job which only runs on nova, but we should have something
testing this as long as it's around, especially given how easy it is to
turn this on in upstream CI (it's flipping a devstack variable).
Postgresql support shouldn't be in devstack. If we're taking a tier 2
approach, someone needs to carve out database plugins from devstack and
pg would be one (as could be galera, etc).

This historical artifact that pg was maintained in devstack, but much
more widely used backends were not, is part of the issue.

It would also be a good unit test case as to whether there are pg
focused folks around out there willing to do this basic devstack plugin
/ job setup work.

-Sean


responded May 23, 2017 by Octave_J._Orgeron (1,520 points)   1
...