settingsLogin | Registersettings

[openstack-dev] [oslo][oslo.db] MySQL Cluster support

0 votes

Hi Folks,

I'm working on adding support for MySQL Cluster to the core OpenStack services. This will enable the community to benefit from an active/active, auto-sharding, and scale-out MySQL database. My approach is to have a single configuration setting in each core OpenStack service in the oslo.db configuration section called mysqlstorageengine that will enable the logic in the SQL Alchemy or Alembic upgrade scripts to handle the differences between InnoDB and NDB storage engines respectively. When enabled, this logic will make the required table schema changes around:

Row character length limits 65k -> 14k

Proper SQL ordering of foreign key, constraints, and index operations

Interception of savepoint and nested operations

By default this functionality will not be enabled and will have no impact on the default InnoDB functionality. These changes have been tested on Kilo and Mitaka in previous releases of our OpenStack distributions with Tempest. I'm working on updating these patches for upstream consumption. We are also working on a 3rd party CI for regression testing against MySQL Cluster for the community.

The first change set is for oslo.db and can be reviewed at:

https://review.openstack.org/427970

Thanks,
Octave

--


Octave J. Orgeron | Sr. Principal Architect and Software Engineer
Oracle Linux OpenStack


Oracle is committed to developing practices and products that help protect the environment


asked Feb 6, 2017 in openstack-dev by Octave_J._Orgeron (1,520 points)   1

31 Responses

0 votes

Excerpts from Octave J. Orgeron's message of 2017-02-01 20:33:38 -0700:

Hi Folks,

I'm working on adding support for MySQL Cluster to the core OpenStack
services. This will enable the community to benefit from an
active/active, auto-sharding, and scale-out MySQL database. My approach
is to have a single configuration setting in each core OpenStack service
in the oslo.db configuration section called mysqlstorageengine that
will enable the logic in the SQL Alchemy or Alembic upgrade scripts to
handle the differences between InnoDB and NDB storage engines
respectively. When enabled, this logic will make the required table
schema changes around:

  • Row character length limits 65k -> 14k
  • Proper SQL ordering of foreign key, constraints, and index operations
  • Interception of savepoint and nested operations

By default this functionality will not be enabled and will have no
impact on the default InnoDB functionality. These changes have been
tested on Kilo and Mitaka in previous releases of our OpenStack
distributions with Tempest. I'm working on updating these patches for
upstream consumption. We are also working on a 3rd party CI for
regression testing against MySQL Cluster for the community.

The first change set is for oslo.db and can be reviewed at:

https://review.openstack.org/427970

Thanks,
Octave

Is it possible to detect the storage engine at runtime, instead of
having the operator configure it?

Doug


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Feb 2, 2017 by Doug_Hellmann (87,520 points)   3 4 8
0 votes

On 02/01/2017 09:33 PM, Octave J. Orgeron wrote:
Hi Folks,

I'm working on adding support for MySQL Cluster to the core OpenStack
services. This will enable the community to benefit from an
active/active, auto-sharding, and scale-out MySQL database. My approach
is to have a single configuration setting in each core OpenStack service
in the oslo.db configuration section called mysqlstorageengine that
will enable the logic in the SQL Alchemy or Alembic upgrade scripts to
handle the differences between InnoDB and NDB storage engines
respectively. When enabled, this logic will make the required table
schema changes around:

  • Row character length limits 65k -> 14k
  • Proper SQL ordering of foreign key, constraints, and index operations
  • Interception of savepoint and nested operations

By default this functionality will not be enabled and will have no
impact on the default InnoDB functionality. These changes have been
tested on Kilo and Mitaka in previous releases of our OpenStack
distributions with Tempest. I'm working on updating these patches for
upstream consumption. We are also working on a 3rd party CI for
regression testing against MySQL Cluster for the community.

The first change set is for oslo.db and can be reviewed at:

https://review.openstack.org/427970

Yay!

(You may not be aware, but there are several of us who used to be on the
MySQL Cluster team who are now on OpenStack. I've been wanting good NDB
support for a while. So thank you!)


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Feb 2, 2017 by Monty_Taylor (22,780 points)   2 4 7
0 votes

On 02/02/2017 10:25 AM, Monty Taylor wrote:
On 02/01/2017 09:33 PM, Octave J. Orgeron wrote:

Hi Folks,

I'm working on adding support for MySQL Cluster to the core OpenStack
services. This will enable the community to benefit from an
active/active, auto-sharding, and scale-out MySQL database. My approach
is to have a single configuration setting in each core OpenStack service
in the oslo.db configuration section called mysqlstorageengine that
will enable the logic in the SQL Alchemy or Alembic upgrade scripts to
handle the differences between InnoDB and NDB storage engines
respectively. When enabled, this logic will make the required table
schema changes around:

  • Row character length limits 65k -> 14k
  • Proper SQL ordering of foreign key, constraints, and index operations
  • Interception of savepoint and nested operations

By default this functionality will not be enabled and will have no
impact on the default InnoDB functionality. These changes have been
tested on Kilo and Mitaka in previous releases of our OpenStack
distributions with Tempest. I'm working on updating these patches for
upstream consumption. We are also working on a 3rd party CI for
regression testing against MySQL Cluster for the community.

The first change set is for oslo.db and can be reviewed at:

https://review.openstack.org/427970

Yay!

(You may not be aware, but there are several of us who used to be on the
MySQL Cluster team who are now on OpenStack. I've been wanting good NDB
support for a while. So thank you!)

as I noted on the review it would be nice to have some specifics of how
this is to be accomplished as the code review posted doesn't show
anything of how this would work.


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Feb 2, 2017 by Mike_Bayer (15,260 points)   1 5 6
0 votes

Hi Doug,

One could try to detect the default engine. However, in MySQL Cluster, you can support multiple storage engines. Only NDB is fully clustered and replicated, so if you accidentally set a table to be InnoDB it won't be replicated . So it makes more sense for the operator to be explicit on which engine they want to use.

Thanks,
Octave

On 2/2/2017 6:46 AM, Doug Hellmann wrote:

Excerpts from Octave J. Orgeron's message of 2017-02-01 20:33:38 -0700:

Hi Folks, I'm working on adding support for MySQL Cluster to the core OpenStack services. This will enable the community to benefit from an active/active, auto-sharding, and scale-out MySQL database. My approach is to have a single configuration setting in each core OpenStack service in the oslo.db configuration section called mysqlstorageengine that will enable the logic in the SQL Alchemy or Alembic upgrade scripts to handle the differences between InnoDB and NDB storage engines respectively. When enabled, this logic will make the required table schema changes around: * Row character length limits 65k -> 14k * Proper SQL ordering of foreign key, constraints, and index operations * Interception of savepoint and nested operations By default this functionality will not be enabled and will have no impact on the default InnoDB functionality. These changes have been tested on Kilo and Mitaka in previous releases of our OpenStack distributions with Tempest. I'm working on updating these patches for upstream consumption. We are also working on a 3rd party CI for regression testing against MySQL Cluster for the community. The first change set is for oslo.db and can be reviewed at: https://review.openstack.org/427970 Thanks, Octave

Is it possible to detect the storage engine at runtime, instead of having the operator configure it? Doug __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--


Octave J. Orgeron | Sr. Principal Architect and Software Engineer
Oracle Linux OpenStack
Mobile: +1-720-616-1550
500 Eldorado Blvd. | Broomfield, CO 80021

Oracle is committed to developing practices and products that help protect the environment


responded Feb 2, 2017 by Octave_J._Orgeron (1,520 points)   1
0 votes

Hi Monty,

Thank you for the feedback. I'm excited about getting these patches upstream as everyone will be able to benefit from them.

Thanks,
Octave

On 2/2/2017 8:25 AM, Monty Taylor wrote:

On 02/01/2017 09:33 PM, Octave J. Orgeron wrote:

Hi Folks, I'm working on adding support for MySQL Cluster to the core OpenStack services. This will enable the community to benefit from an active/active, auto-sharding, and scale-out MySQL database. My approach is to have a single configuration setting in each core OpenStack service in the oslo.db configuration section called mysqlstorageengine that will enable the logic in the SQL Alchemy or Alembic upgrade scripts to handle the differences between InnoDB and NDB storage engines respectively. When enabled, this logic will make the required table schema changes around: * Row character length limits 65k -> 14k * Proper SQL ordering of foreign key, constraints, and index operations * Interception of savepoint and nested operations By default this functionality will not be enabled and will have no impact on the default InnoDB functionality. These changes have been tested on Kilo and Mitaka in previous releases of our OpenStack distributions with Tempest. I'm working on updating these patches for upstream consumption. We are also working on a 3rd party CI for regression testing against MySQL Cluster for the community. The first change set is for oslo.db and can be reviewed at: https://review.openstack.org/427970

Yay! (You may not be aware, but there are several of us who used to be on the MySQL Cluster team who are now on OpenStack. I've been wanting good NDB support for a while. So thank you!) __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--


Octave J. Orgeron | Sr. Principal Architect and Software Engineer
Oracle Linux OpenStack
Mobile: +1-720-616-1550
500 Eldorado Blvd. | Broomfield, CO 80021

Oracle is committed to developing practices and products that help protect the environment


responded Feb 2, 2017 by Octave_J._Orgeron (1,520 points)   1
0 votes

Hi Mike,

I've sent out another email that gives some more insight into how this will work for the other OpenStack services. The hook in the oslo.db namespace gives a global configuration point for enabling the patches elsewhere.

Thanks,
Octave

On 2/2/2017 9:24 AM, Mike Bayer wrote:

On 02/02/2017 10:25 AM, Monty Taylor wrote:

On 02/01/2017 09:33 PM, Octave J. Orgeron wrote:

Hi Folks,

I'm working on adding support for MySQL Cluster to the core OpenStack
services. This will enable the community to benefit from an
active/active, auto-sharding, and scale-out MySQL database. My approach
is to have a single configuration setting in each core OpenStack service
in the oslo.db configuration section called mysqlstorageengine that
will enable the logic in the SQL Alchemy or Alembic upgrade scripts to
handle the differences between InnoDB and NDB storage engines
respectively. When enabled, this logic will make the required table
schema changes around:

  * Row character length limits 65k -> 14k
  * Proper SQL ordering of foreign key, constraints, and index operations
  * Interception of savepoint and nested operations

By default this functionality will not be enabled and will have no
impact on the default InnoDB functionality. These changes have been
tested on Kilo and Mitaka in previous releases of our OpenStack
distributions with Tempest. I'm working on updating these patches for
upstream consumption. We are also working on a 3rd party CI for
regression testing against MySQL Cluster for the community.

The first change set is for oslo.db and can be reviewed at:

https://review.openstack.org/427970

Yay!

(You may not be aware, but there are several of us who used to be on the
MySQL Cluster team who are now on OpenStack. I've been wanting good NDB
support for a while. So thank you!)

as I noted on the review it would be nice to have some specifics of how this is to be accomplished as the code review posted doesn't show anything of how this would work.


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--


Octave J. Orgeron | Sr. Principal Architect and Software Engineer
Oracle Linux OpenStack
Mobile: +1-720-616-1550
500 Eldorado Blvd. | Broomfield, CO 80021

Oracle is committed to developing practices and products that help protect the environment


responded Feb 2, 2017 by Octave_J._Orgeron (1,520 points)   1
0 votes

Excerpts from Octave J. Orgeron's message of 2017-02-02 09:40:23 -0700:

Hi Doug,

One could try to detect the default engine. However, in MySQL Cluster,
you can support multiple storage engines. Only NDB is fully clustered
and replicated, so if you accidentally set a table to be InnoDB it won't
be replicated . So it makes more sense for the operator to be explicit
on which engine they want to use.

I think this change is probably a bigger scale item than I understood
it to be when you originally contacted me off-list for advice about
how to get started. I hope I haven't steered you too far wrong, but
at least the conversation is started.

As someone (Mike?) pointed out on the review, the option by itself
doesn't do much of anything, now. Before we add it, I think we'll
want to see some more detail about how it's going used. It may be
easier to have that broader conversation here on email than on the
patch currently up for review.

It sounds like part of the plan is to use the configuration setting
to control how the migration scripts create tables. How will that
work? Does each migration need custom logic, or can we build helpers
into oslo.db somehow? Or will the option be passed to the database
to change its behavior transparently?

Keep in mind that we do not encourage code outside of libraries to
rely on configuration settings defined within libraries, because
that limits our ability to change the names and locations of the
configuration variables. If migration scripts need to access the
configuration setting we will need to add some sort of public API
to oslo.db to query the value. The function can simply return the
configured value.

What other behaviors are likely to be changed by the new option?
Will application runtime behavior need to know about the storage
engine?

Doug

Thanks,
Octave

On 2/2/2017 6:46 AM, Doug Hellmann wrote:

Excerpts from Octave J. Orgeron's message of 2017-02-01 20:33:38 -0700:

Hi Folks,

I'm working on adding support for MySQL Cluster to the core OpenStack
services. This will enable the community to benefit from an
active/active, auto-sharding, and scale-out MySQL database. My approach
is to have a single configuration setting in each core OpenStack service
in the oslo.db configuration section called mysqlstorageengine that
will enable the logic in the SQL Alchemy or Alembic upgrade scripts to
handle the differences between InnoDB and NDB storage engines
respectively. When enabled, this logic will make the required table
schema changes around:

  • Row character length limits 65k -> 14k
  • Proper SQL ordering of foreign key, constraints, and index operations
  • Interception of savepoint and nested operations

By default this functionality will not be enabled and will have no
impact on the default InnoDB functionality. These changes have been
tested on Kilo and Mitaka in previous releases of our OpenStack
distributions with Tempest. I'm working on updating these patches for
upstream consumption. We are also working on a 3rd party CI for
regression testing against MySQL Cluster for the community.

The first change set is for oslo.db and can be reviewed at:

https://review.openstack.org/427970

Thanks,
Octave

Is it possible to detect the storage engine at runtime, instead of
having the operator configure it?

Doug


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Feb 2, 2017 by Doug_Hellmann (87,520 points)   3 4 8
0 votes

Hi Doug,

Comments below..

Thanks,
Octave

On 2/2/2017 11:27 AM, Doug Hellmann wrote:

Excerpts from Octave J. Orgeron's message of 2017-02-02 09:40:23 -0700:

Hi Doug, One could try to detect the default engine. However, in MySQL Cluster, you can support multiple storage engines. Only NDB is fully clustered and replicated, so if you accidentally set a table to be InnoDB it won't be replicated . So it makes more sense for the operator to be explicit on which engine they want to use.

I think this change is probably a bigger scale item than I understood it to be when you originally contacted me off-list for advice about how to get started. I hope I haven't steered you too far wrong, but at least the conversation is started. As someone (Mike?) pointed out on the review, the option by itself doesn't do much of anything, now. Before we add it, I think we'll want to see some more detail about how it's going used. It may be easier to have that broader conversation here on email than on the patch currently up for review.
Understood, it's a complicated topic since it involves gritty details in SQL Alchemy and Alembic that are masked from end-users and operators alike. Figuring out how to make this work did take some time on my part.

It sounds like part of the plan is to use the configuration setting to control how the migration scripts create tables. How will that work? Does each migration need custom logic, or can we build helpers into oslo.db somehow? Or will the option be passed to the database to change its behavior transparently?
These are good questions. For each service, when the db sync or db manage operation is done it will call into SQL Alchemy or Alembic depending on the methods used by the given service. For example, most use SQL Alchemy, but there are services like Ironic and Neutron that use Alembic. It is within these scripts under the /db/* hierarchy that the logic exist today to configure the database schema for any given service. Both approaches will look at the schema version in the database to determine where to start the create, upgrade, heal, etc. operations. What my patches do is that in the scripts where a table needs to be modified, there will be custom IF/THEN logic to check the cfg.CONF.database.mysqlstorageengine setting to make the required modifications. There are also use cases where the api.py or model(s).py under the /db/ hierarchy needs to look at this setting as well for API and CLI operations where mysqlengine is auto-inserted into DB operations. In those use cases, I replace the hard coded "InnoDB" with the mysqlstorage_engine variable.

It would be interesting if we could develop some helpers to automate this, but it would probably have to be at the SQL Alchemy or Alembic levels. Unfortunately, throughout all of the OpenStack services today we are hard coding things like mysql_engine, using InnoDB specific features (savepoints, nested operations, etc.), and not following the strict SQL orders for modifying table elements (foreign keys, constraints, and indexes). That actually makes it difficult to support other MySQL dialects or other databases out of the box. SQL Alchemy can be used to fix some of these things if the SQL statements are all generic and we follow strict SQL rules. But to change that would be a monumental effort. That is why I took this approach of just adding custom logic. There is a president for this already for Postgres and DB2 support in some of the OpenStack services using custom logic to deal with similar differences.

As to why we should place the configuration setting into oslo.db? Here are a couple of logical reasons:

The configuration block for database settings for each service comes from the oslo.db namespace today under cfg.CONF.database.*. Placing it here makes the location consistent across all of the services.

Within the SQL Alchemy and Alembic scripts, this is one of the few common namespaces that are available without bringing in a larger number of modules across the services today.

Many of the SQL Alchemy and Alembic scripts only import the minimal set of python modules. If we imported others, we would also have to initialize those name spaces which means a lot more code :(

Reduces the amount of overhead required to make these changes.

Keep in mind that we do not encourage code outside of libraries to rely on configuration settings defined within libraries, because that limits our ability to change the names and locations of the configuration variables. If migration scripts need to access the configuration setting we will need to add some sort of public API to oslo.db to query the value. The function can simply return the configured value.
Configuration parameters within any given service will make use of a large namespace that pulls in things from oslo and the .conf files for a given service. So even when an API, CLI, or DB related call is made, these namespaces are key for things to work. In the case of the SQL Alchemy and Alembic scripts, they also make use of this namespace with oslo, oslo.db, etc. to figure out how to connect to the database and other database settings. I don't think we need a public API for these kinds of calls as the community already makes use of the libraries to build the namespace. My oslo.db setting and patches for each service just make use of the cfg.CONF.database namespace to determine the correct behavior to execute.

What other behaviors are likely to be changed by the new option? Will application runtime behavior need to know about the storage engine?
The changes will be transparent to the application runtime behavior. The APIs and CLI tools call into the /db/api.py as the entry point for database calls. Behind this you usually have a models.py that is aware of the database schema to understand the layout of things. So the underlining structure is abstracted away from the run-time. These entry points sometimes do require minor modifications to handle any hard coded issues or intercept functions like savepoints and nested operations. Again I use the cfg.CONF.database namespace to check for the appropriate behavior and implement IF/THEN logic to do the right thing.

Some of my design objectives for all of these patches are:

Zero impact on OpenStack functionality and usability (API, CLI, user experience)

No loss in database structure. Consistent foreign keys, constraints, indexes, etc.

Minimal impact on column size and/or types to fit within NDB table row limits. Many columns are over-sized today.

Validate functionality of APIs, service processes, and CLI. Tempest is our friend :)

Zero impact for users not using MySQL Cluster (NDB).

Doug

Thanks, Octave On 2/2/2017 6:46 AM, Doug Hellmann wrote:

Excerpts from Octave J. Orgeron's message of 2017-02-01 20:33:38 -0700:

Hi Folks, I'm working on adding support for MySQL Cluster to the core OpenStack services. This will enable the community to benefit from an active/active, auto-sharding, and scale-out MySQL database. My approach is to have a single configuration setting in each core OpenStack service in the oslo.db configuration section called mysqlstorageengine that will enable the logic in the SQL Alchemy or Alembic upgrade scripts to handle the differences between InnoDB and NDB storage engines respectively. When enabled, this logic will make the required table schema changes around: * Row character length limits 65k -> 14k * Proper SQL ordering of foreign key, constraints, and index operations * Interception of savepoint and nested operations By default this functionality will not be enabled and will have no impact on the default InnoDB functionality. These changes have been tested on Kilo and Mitaka in previous releases of our OpenStack distributions with Tempest. I'm working on updating these patches for upstream consumption. We are also working on a 3rd party CI for regression testing against MySQL Cluster for the community. The first change set is for oslo.db and can be reviewed at: https://review.openstack.org/427970 Thanks, Octave

Is it possible to detect the storage engine at runtime, instead of having the operator configure it? Doug __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--


Octave J. Orgeron | Sr. Principal Architect and Software Engineer
Oracle Linux OpenStack
Mobile: +1-720-616-1550
500 Eldorado Blvd. | Broomfield, CO 80021

Oracle is committed to developing practices and products that help protect the environment


responded Feb 2, 2017 by Octave_J._Orgeron (1,520 points)   1
0 votes

On 02/02/2017 02:16 PM, Octave J. Orgeron wrote:
Hi Doug,

Comments below..

Thanks,
Octave

On 2/2/2017 11:27 AM, Doug Hellmann wrote:

It sounds like part of the plan is to use the configuration setting
to control how the migration scripts create tables. How will that
work? Does each migration need custom logic, or can we build helpers
into oslo.db somehow? Or will the option be passed to the database
to change its behavior transparently?

These are good questions. For each service, when the db sync or db
manage operation is done it will call into SQL Alchemy or Alembic
depending on the methods used by the given service. For example, most
use SQL Alchemy, but there are services like Ironic and Neutron that use
Alembic. It is within these scripts under the /db/* hierarchy
that the logic exist today to configure the database schema for any
given service. Both approaches will look at the schema version in the
database to determine where to start the create, upgrade, heal, etc.
operations. What my patches do is that in the scripts where a table
needs to be modified, there will be custom IF/THEN logic to check the
cfg.CONF.database.mysqlstorageengine setting to make the required
modifications. There are also use cases where the api.py or model(s).py
under the /db/ hierarchy needs to look at this setting as well
for API and CLI operations where mysqlengine is auto-inserted into DB
operations. In those use cases, I replace the hard coded "InnoDB" with
the mysql
storage_engine variable.

can you please clarify "replace the hard coded "InnoDB" " ? Are you
proposing to send reviews for patches against all occurrences of
"InnoDB" in files like
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/migrate_repo/versions/216_havana.py
? The "InnoDB" keyword is hardcoded in hundreds of migration files
across all openstack projects that use MySQL. Are all of these going
to be patched with some kind of conditional?

It would be interesting if we could develop some helpers to automate
this, but it would probably have to be at the SQL Alchemy or Alembic
levels.

not really, you can build a hook that intercepts operations like
CreateTable, or that intercepts SQL as it is emitted over a connection,
in order to modify these values on the fly. But that is a specific kind
of approach with it's own set of surprises. Alternatively you can make
an alternate SQLAlchemy dialect that no longer recognizes "mysql_*" as
the prefix for these arguments. There's ways to do this part.

But more critically I noticed you referred to altering the names of
columns to suit NDB. How will this be accomplished? Changing a column
name in an openstack application is no longer trivial, because online
upgrades must be supported for applications like Nova and Neutron. A
column name can't just change to a new name, both columns have to exist
and logic must be added to keep these columns synchronized.

Unfortunately, throughout all of the OpenStack services today we
are hard coding things like mysql_engine, using InnoDB specific features
(savepoints, nested operations, etc.), and not following the strict SQL
orders for modifying table elements (foreign keys, constraints, and
indexes).

Savepoints aren't InnoDB specific, they are a standard SQL feature and
also their use is not widespread right now. I'm not sure what you mean
by "the strict SQL orders", we use ALTER TABLE as is standard in MySQL
for this and it's behind an abstraction layer that supports other
databases such as Postgresql.

  • Many of the SQL Alchemy and Alembic scripts only import the minimal
    set of python modules. If we imported others, we would also have to
    initialize those name spaces which means a lot more code :(

I'm not sure what this means, can you clarify ?

* Reduces the amount of overhead required to make these changes.

What sort of "overhead", do you mean code complexity, performance ?

Keep in mind that we do not encourage code outside of libraries to
rely on configuration settings defined within libraries, because
that limits our ability to change the names and locations of the
configuration variables. If migration scripts need to access the
configuration setting we will need to add some sort of public API
to oslo.db to query the value. The function can simply return the
configured value.

Configuration parameters within any given service will make use of a
large namespace that pulls in things from oslo and the .conf files for a
given service. So even when an API, CLI, or DB related call is made,
these namespaces are key for things to work. In the case of the SQL
Alchemy and Alembic scripts, they also make use of this namespace with
oslo, oslo.db, etc. to figure out how to connect to the database and
other database settings. I don't think we need a public API for these
kinds of calls as the community already makes use of the libraries to
build the namespace. My oslo.db setting and patches for each service
just make use of the cfg.CONF.database namespace to determine the
correct behavior to execute.

What other behaviors are likely to be changed by the new option?
Will application runtime behavior need to know about the storage
engine?

The changes will be transparent to the application runtime behavior. The
APIs and CLI tools call into the /db/api.py as the entry point
for database calls. Behind this you usually have a models.py that is
aware of the database schema to understand the layout of things. So the
underlining structure is abstracted away from the run-time. These entry
points sometimes do require minor modifications to handle any hard coded
issues or intercept functions like savepoints and nested operations.
Again I use the cfg.CONF.database namespace to check for the appropriate
behavior and implement IF/THEN logic to do the right thing.

Some of my design objectives for all of these patches are:

  • Zero impact on OpenStack functionality and usability (API, CLI, user
    experience)
  • No loss in database structure. Consistent foreign keys, constraints,
    indexes, etc.
  • Minimal impact on column size and/or types to fit within NDB table
    row limits. Many columns are over-sized today.
  • Validate functionality of APIs, service processes, and CLI. Tempest
    is our friend :)
  • Zero impact for users not using MySQL Cluster (NDB).

Doug

Thanks,
Octave

On 2/2/2017 6:46 AM, Doug Hellmann wrote:

Excerpts from Octave J. Orgeron's message of 2017-02-01 20:33:38 -0700:

Hi Folks,

I'm working on adding support for MySQL Cluster to the core OpenStack
services. This will enable the community to benefit from an
active/active, auto-sharding, and scale-out MySQL database. My approach
is to have a single configuration setting in each core OpenStack service
in the oslo.db configuration section called mysqlstorageengine that
will enable the logic in the SQL Alchemy or Alembic upgrade scripts to
handle the differences between InnoDB and NDB storage engines
respectively. When enabled, this logic will make the required table
schema changes around:

  • Row character length limits 65k -> 14k
  • Proper SQL ordering of foreign key, constraints, and index operations
  • Interception of savepoint and nested operations

By default this functionality will not be enabled and will have no
impact on the default InnoDB functionality. These changes have been
tested on Kilo and Mitaka in previous releases of our OpenStack
distributions with Tempest. I'm working on updating these patches for
upstream consumption. We are also working on a 3rd party CI for
regression testing against MySQL Cluster for the community.

The first change set is for oslo.db and can be reviewed at:

https://review.openstack.org/427970

Thanks,
Octave

Is it possible to detect the storage engine at runtime, instead of
having the operator configure it?

Doug


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--

Oracle
Octave J. Orgeron | Sr. Principal Architect and Software Engineer
Oracle Linux OpenStack
Mobile: +1-720-616-1550 <tel:+17206161550>
500 Eldorado Blvd. | Broomfield, CO 80021
Certified Oracle Enterprise Architect: Systems Infrastructure

Green Oracle Oracle is committed to
developing practices and products that help protect the environment


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Feb 2, 2017 by Mike_Bayer (15,260 points)   1 5 6
0 votes

Excerpts from Octave J. Orgeron's message of 2017-02-02 12:16:15 -0700:

Hi Doug,

Comments below..

Thanks,
Octave

On 2/2/2017 11:27 AM, Doug Hellmann wrote:

Excerpts from Octave J. Orgeron's message of 2017-02-02 09:40:23 -0700:

Hi Doug,

One could try to detect the default engine. However, in MySQL Cluster,
you can support multiple storage engines. Only NDB is fully clustered
and replicated, so if you accidentally set a table to be InnoDB it won't
be replicated . So it makes more sense for the operator to be explicit
on which engine they want to use.
I think this change is probably a bigger scale item than I understood
it to be when you originally contacted me off-list for advice about
how to get started. I hope I haven't steered you too far wrong, but
at least the conversation is started.

As someone (Mike?) pointed out on the review, the option by itself
doesn't do much of anything, now. Before we add it, I think we'll
want to see some more detail about how it's going used. It may be
easier to have that broader conversation here on email than on the
patch currently up for review.

Understood, it's a complicated topic since it involves gritty details in
SQL Alchemy and Alembic that are masked from end-users and operators
alike. Figuring out how to make this work did take some time on my part.

It sounds like part of the plan is to use the configuration setting
to control how the migration scripts create tables. How will that
work? Does each migration need custom logic, or can we build helpers
into oslo.db somehow? Or will the option be passed to the database
to change its behavior transparently?

These are good questions. For each service, when the db sync or db
manage operation is done it will call into SQL Alchemy or Alembic
depending on the methods used by the given service. For example, most
use SQL Alchemy, but there are services like Ironic and Neutron that use
Alembic. It is within these scripts under the /db/* hierarchy
that the logic exist today to configure the database schema for any
given service. Both approaches will look at the schema version in the
database to determine where to start the create, upgrade, heal, etc.
operations. What my patches do is that in the scripts where a table
needs to be modified, there will be custom IF/THEN logic to check the
cfg.CONF.database.mysqlstorageengine setting to make the required
modifications. There are also use cases where the api.py or model(s).py
under the /db/ hierarchy needs to look at this setting as well
for API and CLI operations where mysqlengine is auto-inserted into DB
operations. In those use cases, I replace the hard coded "InnoDB" with
the mysql
storage_engine variable.

So all existing scripts that create or modify tables will need to
be updated? That's going to be a lot of work. It will also be a lot
of work to ensure that new alter scripts are implemented using the
required logic, and that testing happens in the gates for all
projects supporting this feature to ensure there are no regressions
or behavioral changes in the applications as a result of the changes
in table definitions.

I'll let the folks more familiar with databases in general and MySQL
in particular respond to some of the technical details, but I think
I should give you fair warning that you're taking on a very big
project, especially for someone new to the community.

It would be interesting if we could develop some helpers to automate
this, but it would probably have to be at the SQL Alchemy or Alembic
levels. Unfortunately, throughout all of the OpenStack services today we
are hard coding things like mysql_engine, using InnoDB specific features
(savepoints, nested operations, etc.), and not following the strict SQL
orders for modifying table elements (foreign keys, constraints, and
indexes). That actually makes it difficult to support other MySQL
dialects or other databases out of the box. SQL Alchemy can be used to
fix some of these things if the SQL statements are all generic and we
follow strict SQL rules. But to change that would be a monumental
effort. That is why I took this approach of just adding custom logic.
There is a president for this already for Postgres and DB2 support in
some of the OpenStack services using custom logic to deal with similar
differences.

As to why we should place the configuration setting into oslo.db? Here
are a couple of logical reasons:

Oh, I'm not questioning putting the option in oslo.db. I think that's
clearly the right place to put it.

  • The configuration block for database settings for each service comes
    from the oslo.db namespace today under cfg.CONF.database.*. Placing
    it here makes the location consistent across all of the services.
  • Within the SQL Alchemy and Alembic scripts, this is one of the few
    common namespaces that are available without bringing in a larger
    number of modules across the services today.
  • Many of the SQL Alchemy and Alembic scripts only import the minimal
    set of python modules. If we imported others, we would also have to
    initialize those name spaces which means a lot more code :(
  • Reduces the amount of overhead required to make these changes.

Keep in mind that we do not encourage code outside of libraries to
rely on configuration settings defined within libraries, because
that limits our ability to change the names and locations of the
configuration variables. If migration scripts need to access the
configuration setting we will need to add some sort of public API
to oslo.db to query the value. The function can simply return the
configured value.

Configuration parameters within any given service will make use of a
large namespace that pulls in things from oslo and the .conf files for a
given service. So even when an API, CLI, or DB related call is made,
these namespaces are key for things to work. In the case of the SQL
Alchemy and Alembic scripts, they also make use of this namespace with
oslo, oslo.db, etc. to figure out how to connect to the database and
other database settings. I don't think we need a public API for these
kinds of calls as the community already makes use of the libraries to
build the namespace. My oslo.db setting and patches for each service
just make use of the cfg.CONF.database namespace to determine the
correct behavior to execute.

I may not have been entirely clear. You need to add a function to
oslo.db to allow a user of oslo.db to read the configuration value
without knowing what that option name is. There are two reasons for
this policy:

  1. Configuration options are supposed to be completely transparent
    to the application developer using the library, otherwise they
    would be parameters to the classes or functions in the library
    instead of deployer-facing configuration options.

    oslo.config allows us to rename configuration options transparently
    to deployers (they get a warning about the new name or location
    for the option in the config file, but the library knows both
    locations).

    The rename feature does not work when accessing options
    programmatically, because we do not consider configuration options
    to be part of the API of a library. That means that cfg.CONF.foo.bar
    can move to cfg.CONF.blah.bletch, and your code using it by the
    old name will break.

  2. Accessing configuration options depends on having them registered,
    and a user of the library that owns a configuration option may not
    know which functions in the library to call to register the options.
    As a result, they may try to use an option before it is actually
    defined. Using an access function to read the value of an option
    allows the library to ensure the option is registered before trying
    to return the value.

For those reasons, in cases where a configuration option needs to
be exposed outside of the library we require a function defined
inside the library where we can have unit tests that will break if
the configuration option is renamed or otherwise changed, and so
we can handle those changes without breaking applications consuming
the library.

In this case, the migration scripts are outside of oslo.db, so they
will need a public function added to oslo.db to access the configuration
value. The function should first ensure that the new option is
registered, and then return the configured value.

What other behaviors are likely to be changed by the new option?
Will application runtime behavior need to know about the storage
engine?

The changes will be transparent to the application runtime behavior. The
APIs and CLI tools call into the /db/api.py as the entry point
for database calls. Behind this you usually have a models.py that is
aware of the database schema to understand the layout of things. So the
underlining structure is abstracted away from the run-time. These entry
points sometimes do require minor modifications to handle any hard coded
issues or intercept functions like savepoints and nested operations.
Again I use the cfg.CONF.database namespace to check for the appropriate
behavior and implement IF/THEN logic to do the right thing.

Some of my design objectives for all of these patches are:

  • Zero impact on OpenStack functionality and usability (API, CLI, user
    experience)
  • No loss in database structure. Consistent foreign keys, constraints,
    indexes, etc.
  • Minimal impact on column size and/or types to fit within NDB table
    row limits. Many columns are over-sized today.
  • Validate functionality of APIs, service processes, and CLI. Tempest
    is our friend :)
  • Zero impact for users not using MySQL Cluster (NDB).

Doug

Thanks,
Octave

On 2/2/2017 6:46 AM, Doug Hellmann wrote:

Excerpts from Octave J. Orgeron's message of 2017-02-01 20:33:38 -0700:

Hi Folks,

I'm working on adding support for MySQL Cluster to the core OpenStack
services. This will enable the community to benefit from an
active/active, auto-sharding, and scale-out MySQL database. My approach
is to have a single configuration setting in each core OpenStack service
in the oslo.db configuration section called mysqlstorageengine that
will enable the logic in the SQL Alchemy or Alembic upgrade scripts to
handle the differences between InnoDB and NDB storage engines
respectively. When enabled, this logic will make the required table
schema changes around:

* Row character length limits 65k -> 14k
* Proper SQL ordering of foreign key, constraints, and index operations
* Interception of savepoint and nested operations

By default this functionality will not be enabled and will have no
impact on the default InnoDB functionality. These changes have been
tested on Kilo and Mitaka in previous releases of our OpenStack
distributions with Tempest. I'm working on updating these patches for
upstream consumption. We are also working on a 3rd party CI for
regression testing against MySQL Cluster for the community.

The first change set is for oslo.db and can be reviewed at:

https://review.openstack.org/427970

Thanks,
Octave

Is it possible to detect the storage engine at runtime, instead of
having the operator configure it?

Doug


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded Feb 2, 2017 by Doug_Hellmann (87,520 points)   3 4 8
...