
[openstack-dev] [all] Replace mysql-python with mysqlclient


Hi,

I propose to replace mysql-python with mysqlclient in OpenStack applications to get Python 3 support, bug fixes and some new features (support MariaDB's libmysqlclient.so, support microsecond in TIME column).

The MySQL database is popular, but the Python driver mysql-python no longer appears to be maintained. The latest commit was made in January 2014, before the release of MySQL-python 1.2.5:

https://github.com/farcepest/MySQLdb1/commits/master

One major issue is that mysql-python doesn't support Python 3, which blocks porting most OpenStack applications to Python 3. There are now 32 open issues and 25 pending pull requests. I also sent an email to Andy Dustman (aka farcepest) last week, but I haven't received a reply yet.

There is an open discussion to replace mysql-python with PyMySQL, but PyMySQL has worse performance:

https://wiki.openstack.org/wiki/PyMySQL_evaluation

Naoki INADA, the PyMySQL maintainer, forked mysql-python as the new project "mysqlclient". Quoting his email (part of the long thread "[openstack-dev] [oslo] eventlet 0.17.3 is now fully Python 3 compatible"):

"""
I'm maintainer of PyMySQL and mysqlclient.

mysqlclient is fork of MySQL-python. It uses libmysqlclient.so.
It fixes some bugs, build issues and it support Python 3. For example:

  • Support MariaDB's libmysqlclient.so
  • Support microsecond in TIME column

I recommend to use mysqlclient instead of MySQL-python even on Python 2.

https://pypi.python.org/pypi/mysqlclient
https://github.com/PyMySQL/mysqlclient-python
"""

Since mysqlclient is a fork, it should have no impact on performance.

On PyPI, mysql-python and mysqlclient have different names, but the Python module has the same name ("MySQLdb"). OpenStack code doesn't need to be modified, only its dependencies.
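As a rough illustration of the point above, the sketch below checks which PyPI distribution, if any, is installed behind the shared "MySQLdb" module name. The helper name installed_mysqldb_provider is hypothetical, and it relies on importlib.metadata, which only exists on Python 3.8 and later, so it is an illustration rather than something the OpenStack code of that time could have used as-is.

```python
from importlib import metadata

# PyPI distribution names that both install a top-level "MySQLdb" module.
CANDIDATES = ("mysqlclient", "MySQL-python")


def installed_mysqldb_provider(candidates=CANDIDATES):
    """Return the name of the first installed candidate distribution, or None."""
    for name in candidates:
        try:
            metadata.distribution(name)
        except metadata.PackageNotFoundError:
            continue
        return name
    return None


if __name__ == "__main__":
    # Application code keeps using "import MySQLdb" either way; only the
    # installed distribution differs.
    print("MySQLdb provided by:", installed_mysqldb_provider())
```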

mysqlclient is also tested in the "PyMySQL evaluation".

mysqlclient shares mysql-python's drawbacks. It is implemented in C and it is not eventlet friendly (cannot be monkey-patched). A workaround is to run it in a thread or a thread pool.
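A minimal sketch of the thread-pool workaround mentioned above, using only the standard library. The function blocking_query is a hypothetical stand-in for a blocking C-driver call (it just sleeps); no real database is contacted.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(sql):
    """Stand-in for a blocking C-driver call such as cursor.execute()."""
    time.sleep(0.05)  # simulates waiting on the database server
    return "result of %s" % sql


def run_queries(statements, max_workers=4):
    """Run blocking calls in worker threads and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(blocking_query, statements))


if __name__ == "__main__":
    print(run_queries(["SELECT 1", "SELECT 2"]))
```

Under eventlet, eventlet.tpool.execute plays a similar role by handing a blocking call to a native thread so that other green threads can keep running; whether that fits a given service is a separate question.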

I want to replace mysql-python with mysqlclient to get Python 3 compatibility in the short term. We can reconsider PyMySQL later for other advantages like the ability to monkey-patch it. At the same time, there is also a deeper discussion about changing how OpenStack handles concurrency (replace eventlet with threads, asyncio or something else):
https://review.openstack.org/#/c/164035/

Victor


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Apr 30, 2015 in openstack-dev by vstinner_at_redhat.c (5,420 points)   1 5 7

43 Responses


There is an open discussion to replace mysql-python with PyMySQL, but
PyMySQL has worse performance:

https://wiki.openstack.org/wiki/PyMySQL_evaluation

My major concern with not moving to something different (i.e. not based
on the C library) is the threading problem. Especially as we move in the
direction of cellsv2 in nova, not blocking the process while waiting for
a reply from mysql is going to be critical. Further, I think that we're
likely to get back a lot of performance from a supports-eventlet
database connection because of the parallelism that conductor currently
can only provide in exchange for the footprint of forking into lots of
workers.

If we're going to move, shouldn't we be looking at something that
supports our threading model?

--Dan


responded Apr 30, 2015 by Dan_Smith (9,860 points)   1 2 4

If we're going to move, shouldn't we be looking at something that
supports our threading model?

I would prefer to make baby steps, and first fix the Python 3 compatibility.

Enhancing concurrency/parallelism is a much more complex project than just replacing a single line in dependencies ;-)

See my email: I mentioned a workaround for mysqlclient and a spec discussing a more general solution for concurrency.

Victor


responded Apr 30, 2015 by vstinner_at_redhat.c (5,420 points)   1 5 7

On 4/30/15 11:00 AM, Victor Stinner wrote:
Hi,

I propose to replace mysql-python with mysqlclient in OpenStack applications to get Python 3 support, bug fixes and some new features (support MariaDB's libmysqlclient.so, support microsecond in TIME column).

It is not feasible to use mysqlclient on Python 2 because it uses the
same module name as MySQL-Python, and would wreak havoc with distro
packaging and many other things. It is also imprudent to switch
production OpenStack applications to a driver that is new and untested
(even though it is a port), nor is it necessary. There should be no
reason OpenStack applications are hardcoded to one database driver.
The approach should simply be that on Python 3, the mysqlclient library
is installed instead of mysql-python. mysqlclient installs under the
same name, so in this case there isn't even any change to the SQLAlchemy
URL required.

The MySQL database is popular, but the Python driver mysql-python no longer appears to be maintained. The latest commit was made in January 2014, before the release of MySQL-python 1.2.5:

https://github.com/farcepest/MySQLdb1/commits/master

One major issue is that mysql-python doesn't support Python 3, which blocks porting most OpenStack applications to Python 3. There are now 32 open issues and 25 pending pull requests. I also sent an email to Andy Dustman (aka farcepest) last week, but I haven't received a reply yet.

There is an open discussion to replace mysql-python with PyMySQL, but PyMySQL has worse performance:

https://wiki.openstack.org/wiki/PyMySQL_evaluation

PyMySQL is monkeypatchable, so as long as we are using eventlet, it is
insane that we are using MySQL-Python at all, because it is actively
making openstack applications perform much much more poorly than if we
just removed eventlet. So as long as eventlet is running, PyMySQL
wins the performance argument hands down (as described at the link
http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/
which is in the third paragraph of that wiki page). And it's Py3k
compatible.

The performance results in that wiki page are also out of date. Naoki
INADA has merged several performance improvements since then.

My ultimate setup would still use mysql-python Py2K / MySQLclient Py3K,
and Openstack applications would again use traditional threads for
database APIs. But that is two changes.

So to sum up:

  1. Keep MySQL-Python on Py2K and use mysqlclient on Py3K; changing the
    implementation of the "MySQLdb" module on Py2K, server-wide, would be
    very disruptive.

  2. If we actually care about performance, we either A. dump eventlet or
    B. use PyMySQL. All other performance arguments are moot right now as
    we are in the basement.


responded Apr 30, 2015 by Mike_Bayer (15,260 points)   1 6 8

On 4/30/15 11:16 AM, Dan Smith wrote:

There is an open discussion to replace mysql-python with PyMySQL, but
PyMySQL has worse performance:

https://wiki.openstack.org/wiki/PyMySQL_evaluation
My major concern with not moving to something different (i.e. not based
on the C library) is the threading problem. Especially as we move in the
direction of cellsv2 in nova, not blocking the process while waiting for
a reply from mysql is going to be critical. Further, I think that we're
likely to get back a lot of performance from a supports-eventlet
database connection because of the parallelism that conductor currently
can only provide in exchange for the footprint of forking into lots of
workers.

If we're going to move, shouldn't we be looking at something that
supports our threading model?

Yes, but at the same time, we should change our threading model at the
level where the APIs access the database, at the very
least by using a threadpool behind eventlet. CRUD-oriented database
access is faster using traditional threads, even in Python, than using
an eventlet-like system or using explicit async. The tests at
http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
show this. With traditional threads, we can stay on the C-based MySQL
APIs and take full advantage of their speed.


responded Apr 30, 2015 by Mike_Bayer (15,260 points)   1 6 8

Hi,

Mike Bayer wrote:
It is not feasible to use MySQLclient in Python 2 because it uses the
same module name as Python-MySQL, and would wreak havoc with distro
packaging and many other things.

IMO mysqlclient is just the new upstream for MySQL-Python, since MySQL-Python is no longer maintained.

Why would Linux distributions not package mysqlclient if it provides Python 3 support, bug fixes and more features?

It's quite common to have two conflicting packages because they provide the same function, same library, same program, etc.

I would even suggest that packagers use mysqlclient as the new source without modifying their package.

It is also imprudent to switch
production openstack applications to a driver that is new and untested
(even though it is a port), nor is it necessary.

Why do you consider mysqlclient to be untested, or less tested than mysql-python? What kind of regression do you expect in mysqlclient?

Like mysql-python, the mysqlclient GitHub project is connected to Travis:
https://travis-ci.org/PyMySQL/mysqlclient-python
(tests pass)

I place more trust in a project that is actively developed.

There should be no
reason Openstack applications are hardcoded to one database driver.
The approach should be simply that in Python 3, the mysqlclient library
is installed instead of mysql-python.

Technically, it's now possible to have different dependencies on Python 2 and Python 3. But in practice, there are some annoying corner cases. It's more convenient to have the same dependencies on Python 2 and Python 3.
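For reference, the per-interpreter split under discussion could be sketched as below. The function name mysql_driver_requirement is hypothetical; the two requirement lines use standard environment-marker syntax as understood by pip, and are shown only to illustrate the idea, not to describe what OpenStack's global requirements actually contained.

```python
import sys


def mysql_driver_requirement(version_info=sys.version_info):
    """Pick the PyPI distribution that provides MySQLdb for this interpreter."""
    if version_info[0] >= 3:
        return "mysqlclient"
    return "MySQL-python"


# The same split written as requirement lines with environment markers.
REQUIREMENT_LINES = [
    "MySQL-python; python_version < '3'",
    "mysqlclient; python_version >= '3'",
]

if __name__ == "__main__":
    print(mysql_driver_requirement())
```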

Using mysqlclient on both Python 2 and Python 3 would avoid having bugs specific to Python 2 (bugs already fixed in mysqlclient) and new features available only on Python 3.

Victor


responded May 4, 2015 by vstinner_at_redhat.c (5,420 points)   1 5 7

I propose to replace mysql-python with mysqlclient in OpenStack applications
to get Python 3 support, bug fixes and some new features (support MariaDB's
libmysqlclient.so, support microsecond in TIME column).

I just proposed a change to add the mysqlclient dependency to global requirements:

https://review.openstack.org/#/c/179745/

Victor


responded May 4, 2015 by vstinner_at_redhat.c (5,420 points)   1 5 7

On 04/30/2015 07:48 PM, Mike Bayer wrote:

On 4/30/15 11:00 AM, Victor Stinner wrote:

Hi,

I propose to replace mysql-python with mysqlclient in OpenStack
applications to get Python 3 support, bug fixes and some new features
(support MariaDB's libmysqlclient.so, support microsecond in TIME
column).

It is not feasible to use MySQLclient in Python 2 because it uses the
same module name as Python-MySQL, and would wreak havoc with distro
packaging and many other things.

I don't see what it would break. If I do:

Package: python-mysqlclient
Breaks: python-mysqldb
Replaces: python-mysqldb
Provides: python-mysqldb

everything is fine, and python-mysqlclient becomes another
implementation of the same thing. Then I believe it'd be a good idea to
simply remove python-mysqldb from Debian, since it's not maintained
upstream anymore.

It is also imprudent to switch
production openstack applications to a driver that is new and untested
(even though it is a port), nor is it necessary.

Supporting Python 3 is necessary, as we are going to remove Python 2
from Debian starting with Buster.

There should be no
reason Openstack applications are hardcoded to one database driver.

If they share the same "import MySQLdb", and if they are API compatible,
how is this a problem?

The
approach should be simply that in Python 3, the mysqlclient library is
installed instead of mysql-python.

So, in Python 3, we'd have some bugfixes, and not in Python 2? This
seems a very weird approach to me, which will lead to lots of issues.

MySQLclient installs under the same
name, so in this case there isn't even any change to the SQLAlchemy URL
required.

Nor should there be in anything else, if they are completely API compatible.

PyMySQL is monkeypatchable, so as long as we are using eventlet, it is
insane that we are using MySQL-Python at all, because it is actively
making openstack applications perform much much more poorly than if we
just removed eventlet. So as long as eventlet is running, PyMySQL
wins the performance argument hands down (as described at the link
http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/
which is in the third paragraph of that wiki page). And it's Py3k
compatible.

Ok, so you are for switching to pymysql. Good. But is this realistic?
Are you going to provide all the patches yourself for absolutely all
OpenStack projects that are using python-mysqldb?

  1. keep Mysql-python on Py2K, use mysqlclient on py3k, changing the
    implementation of the "MySQLdb" module on Py2K, server-wide, would be
    very disruptive

I'm sorry to say it this way, because I respect you a lot and you did a
lot of very good things. But Mike, this is a very silly idea. We are
already having difficulty pushing support for Py3, and in some cases,
it's hard to deal with the differences. Now, you want to add even more
sources of problems, with bugs specific to the Py2 or Py3 implementation? Why
should we make our lives even more miserable? I completely fail to
understand what we would try to achieve by doing this.

  2. if we actually care about performance, we either A. dump eventlet or
    B. use pymysql. All other performance arguments are moot right now as
    we are in the basement.

Eventlet has to die, we all know it. Not only for performance reasons.
But this is completely orthogonal to the discussion we're having about
having Python 3 support. Please don't stand in the way of it, just
because we have other (unrelated) issues with Eventlet + MySQL.

Switching to mysqlclient is basically almost "free" (by that, I mean
effortless), if I understand what Victor wrote. The same thing can't be
said of removing Eventlet or switching to pymysql, even though both
may be needed. So why add the latter as a blocker for the former?

Cheers,

Thomas Goirand (zigo)


responded May 4, 2015 by Thomas_Goirand (18,640 points)   3 11 16

On 04/30/2015 05:00 PM, Victor Stinner wrote:
Hi,

I propose to replace mysql-python with mysqlclient in OpenStack applications to get Python 3 support, bug fixes and some new features (support MariaDB's libmysqlclient.so, support microsecond in TIME column).

In fact, when looking at the python-mysqldb package description in
Debian, I can see:

Mysqlclient is an interface to the popular MySQL database server for
Python.
.
This is a fork of MySQLdb. It add Python 3.3 support and merges some
pull requests.

Then I saw this:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768096

The package is currently only in Debian experimental, but I am betting
that "soon", the new python-mysqldb package will be uploaded to Sid, and
it's very likely that Ubuntu will follow (and sync the package from Debian).

As a consequence, I think it'd be much better if OpenStack follows
that and uses the same thing as the distributions. I of course don't know
what Fedora will do, but they may follow the trend...

Also, I've been using that fork without realizing it, and as far as I
can tell, OpenStack continues to work...

Cheers,

Thomas Goirand (zigo)


responded May 5, 2015 by Thomas_Goirand (18,640 points)   3 11 16

On 5/4/15 6:48 PM, Thomas Goirand wrote:
I don't see what it would break. If I do:

Package: python-mysqlclient
Breaks: python-mysqldb
Replaces: python-mysqldb
Provides: python-mysqldb

everything is fine, and python-mysqlclient becomes another
implementation of the same thing. Then I believe it'd be a good idea
to simply remove python-mysqldb from Debian, since it's not maintained
upstream anymore.

It is also imprudent to switch
production openstack applications to a driver that is new and untested
(even though it is a port), nor is it necessary.

Supporting Python 3 is necessary, as we are going to remove Python 2
from Debian starting with Buster.

I don't know Debian, but the approach would be that something like the
"mysqlclient-py3k" package applies to Python 3 only.

There should be no
reason Openstack applications are hardcoded to one database driver.

If they share the same "import MySQLdb", and if they are API
compatible, how is this a problem?

How do you know they are API compatible? This is in fact exactly where
this approach can become a huge problem. No MySQL drivers I've ever
used are fully API compatible with any of the other ones. All of them
have subtle and not-so-subtle differences in behavior. That mysqlclient
is now a fork means it will begin to diverge, and as issues come up
whose resolution requires even more subtle or not-so-subtle
changes in behavior, these differences will only continue to grow.
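To make the compatibility question slightly more concrete, here is a small sketch that compares the module-level globals PEP 249 requires every DB-API driver to advertise (apilevel, threadsafety, paramstyle). The helper names are hypothetical. This only catches the most superficial differences; the behavioral divergences described above would need a real test suite run against each driver.

```python
import sqlite3
import types

# Module-level globals that PEP 249 asks every DB-API driver to define.
PEP249_GLOBALS = ("apilevel", "threadsafety", "paramstyle")


def dbapi_globals(module):
    """Collect the PEP 249 module-level globals a driver advertises."""
    return {name: getattr(module, name, None) for name in PEP249_GLOBALS}


def differing_globals(driver_a, driver_b):
    """Return the names of the PEP 249 globals on which two drivers disagree."""
    a, b = dbapi_globals(driver_a), dbapi_globals(driver_b)
    return sorted(name for name in PEP249_GLOBALS if a[name] != b[name])


if __name__ == "__main__":
    # Hypothetical stand-in for a second driver; a real comparison would
    # import the two MySQL drivers, each from its own environment.
    fake_driver = types.SimpleNamespace(
        apilevel="2.0", threadsafety=1, paramstyle="format"
    )
    print(differing_globals(sqlite3, fake_driver))
```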

From a SQLAlchemy perspective this would be much easier to maintain as
a new sub-dialect. I've proposed that they change their name:
https://github.com/PyMySQL/mysqlclient-python/issues/44 . However, the
maintainers are not going for it, so I guess that isn't going to happen.

The
approach should be simply that in Python 3, the mysqlclient library is
installed instead of mysql-python.

So, in Python 3, we'd have some bugfixes, and not in Python 2? This
seems a very weird approach to me, which will lead to lots of issues.

I've asked three times now to please show the bugfixes that are
needed. Show me the issues that aren't being fixed, and then I will
be convinced and begin the process of pushing here at Red Hat to make
the same packaging changes such that our customers will no longer be
able to use the original MySQLdb. We're talking about an instant,
systemwide replacement of one MySQLdb implementation for another and I
just think that is high risk.

B. use pymysql. All other performance arguments are moot right now as
we are in the basement.

Eventlet has to die, we all know it. Not only for performance reasons.
But this is completely orthogonal to the discussion we're having about
having Python 3 support. Please don't stand in the way of it, just
because we have other (unrelated) issues with Eventlet + MySQL.

Switching to mysqlclient is basically almost "free" (by that, I mean
effortless), if I understand what Victor wrote. The same thing can't
be said of removing Eventlet or switching to pymysql, even though
both may be needed. So why add the latter as a blocker for the former?

Well, switching to pymysql is just as effortless IMHO, and in fact
more effortless because it can be done impacting only individual
applications at a time, rather than forcing it on everything at once.
SQLAlchemy has a dialect for PyMySQL already which is well
maintained and well tested. We change the database URL in projects to
include "mysql+pymysql", update requirements.txt, distros add their
packages like they have to anyway, and we're done. From my view, if
we're going to switch DBAPIs then PyMySQL would be it - if we're going
for "bug fixes in the DBAPI", the "doesn't support eventlet" is the
biggest bug.
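The URL change described above can be sketched with plain string handling. The helper name use_pymysql and the example credentials are hypothetical; "mysql+pymysql" is the scheme named in the message, and "mysql+mysqldb" is the explicit SQLAlchemy spelling for the MySQLdb driver. A real project would more likely just edit its configuration file than rewrite URLs in code.

```python
def use_pymysql(url):
    """Rewrite a MySQL SQLAlchemy URL so that it names the PyMySQL driver."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("not a URL: %r" % url)
    backend = scheme.split("+", 1)[0]
    if backend != "mysql":
        return url  # leave non-MySQL URLs untouched
    return "mysql+pymysql://" + rest


if __name__ == "__main__":
    print(use_pymysql("mysql://nova:secret@dbhost/nova"))
```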

But again, I really want to see what the critical issues in MySQLdb are
that are holding us back. If there are really fixes and features we
need in Py2K then of course we have to either convince MySQLdb to merge
them or switch to mysqlclient. At the moment though I need to see the
evidence for me to really buy this argument.


responded May 5, 2015 by Mike_Bayer (15,260 points)   1 6 8

On 5/5/15 1:11 PM, Thomas Goirand wrote:

On 04/30/2015 05:00 PM, Victor Stinner wrote:

Hi,

I propose to replace mysql-python with mysqlclient in OpenStack
applications to get Python 3 support, bug fixes and some new features
(support MariaDB's libmysqlclient.so, support microsecond in TIME
column).

In fact, when looking at the python-mysqldb package description in
Debian, I can see:

Mysqlclient is an interface to the popular MySQL database server for
Python.
.
This is a fork of MySQLdb. It add Python 3.3 support and merges some
pull requests.

Then I saw this:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768096

Wow, the thread decides to go forward with the move based on incorrect
information. MySQL-Python's last release was on Jan 2, 2014, not in
2010. They are looking at the entirely wrong repository.

Andy Dustman is a real person who is easily locatable on many services
including Twitter, LinkedIn, GitHub, etc. Any chance that anyone
has tried to get a comment from him on this, given that with the Django
recommendation and the distro package moves, his package is about to be
more or less wiped out of most major distributions? It would just be
good style IMHO.


responded May 5, 2015 by Mike_Bayer (15,260 points)   1 6 8
...