
[openstack-dev] [Keystone][Fernet] HA SQL backend for Fernet keys

0 votes

Greetings!

I'd like to discuss the pros and cons of having Fernet encryption keys
stored in a database backend.
The idea emerged during a discussion about synchronizing rotated keys
in an HA environment.
Fernet keys are currently stored on the filesystem, which has some
availability issues in an unstable cluster.
OTOH, making SQL highly available is generally considered easier than
doing the same for a filesystem.

--
Kind Regards,
Alexander Makarov,
Senior Software Developer,

Mirantis, Inc.
35b/3, Vorontsovskaya St., 109147, Moscow, Russia

Tel.: +7 (495) 640-49-04
Tel.: +7 (926) 204-50-60

Skype: MAKAPOB.AJIEKCAHDP


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Jul 27, 2015 in openstack-dev by Alexander_V_Makarov (900 points)   1 2

35 Responses

0 votes

Although using a node's local filesystem requires external configuration
management to manage the distribution of rotated keys, it's always
available, easy to secure, and can be updated atomically per node. Note
that Fernet's rotation strategy uses a staged key that can be distributed
to all nodes in advance of it being used to create new tokens.
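The staged -> primary -> secondary lifecycle can be sketched roughly as follows. This is a simplified illustration of the strategy, not keystone's actual implementation; it assumes the repository uses keystone's naming convention, where key files are named by index: '0' is the staged key (validation only), the highest index is the primary (creation and validation), and everything in between is a secondary (validation only):

```python
import base64
import os

def rotate_keys(key_repo, max_active_keys=3):
    """Promote the staged key to primary and stage a fresh key.

    Assumes the repository already contains a staged key named '0'.
    """
    keys = sorted(int(name) for name in os.listdir(key_repo))
    # The staged key ('0') becomes the new primary: it gets the next index.
    next_index = keys[-1] + 1
    os.rename(os.path.join(key_repo, '0'),
              os.path.join(key_repo, str(next_index)))
    # Stage a brand-new key as '0' (validation-only until promoted).
    with open(os.path.join(key_repo, '0'), 'wb') as f:
        f.write(base64.urlsafe_b64encode(os.urandom(32)))
    # Purge the oldest secondary keys beyond max_active_keys.
    keys = sorted(int(name) for name in os.listdir(key_repo))
    while len(keys) > max_active_keys:
        os.remove(os.path.join(key_repo, str(keys.pop(1))))  # keep '0'
```

Because the staged '0' key already validates tokens everywhere, a key promoted to primary on one node produces tokens that every node can validate, even before the new key set is synced out.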

Also be aware that you wouldn't want to store encryption keys in plaintext
in a shared database, so you must introduce an additional layer of
complexity to solve that problem.

Barbican seems like a much more logical next step beyond the local
filesystem, as it shifts the burden onto a system explicitly designed to
handle this problem (albeit in a multitenant environment).

On Mon, Jul 27, 2015 at 12:01 PM, Alexander Makarov amakarov@mirantis.com
wrote:

responded Jul 27, 2015 by Dolph_Mathews (9,560 points)   1 2 3
0 votes

Excerpts from Alexander Makarov's message of 2015-07-27 10:01:34 -0700:


I don't think HA is the root of the problem here. The problem is
synchronization. If I have 3 keystone servers (n+1), and I rotate keys on
them, I must very carefully restart them all at the exact right time to
make sure one of them doesn't issue a token which will not be validated
on another. This is quite a real possibility because the validation
will not come from the user, but from the service, so it's not like we
can use simple persistence rules. One would need a layer 7 capable load
balancer that can find the token ID and make sure it goes back to the
server that issued it.

A database will at least ensure that it is updated in one place,
atomically, assuming each server issues a query to find the latest
key at every key validation request. That would be a very cheap query,
but not free. A cache would be fine, with the cache being invalidated
on any failed validation, but then that opens the service up to DoS'ing
the database simply by throwing tons of invalid tokens at it.

So an alternative approach is to try to reload the filesystem based key
repository whenever a validation fails. This is quite a bit cheaper than a
SQL query, so the DoS would have to be a full-capacity DoS (overwhelming
all the nodes, not just the database) which you can never prevent. And
with that, you can simply sync out new keys at will, and restart just
one of the keystones, whenever you are confident the whole repository is
synchronized. This is also quite a bit simpler, as one basically needs
only to add a single piece of code that issues load_keys and retries
inside validation.
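A minimal sketch of that reload-and-retry idea, using a toy HMAC tag as a stand-in for real Fernet encryption (all names here are hypothetical illustrations, not keystone's API):

```python
import hashlib
import hmac
import os

def sign(payload, key):
    # Toy stand-in for Fernet: payload plus an HMAC-SHA256 tag.
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    return payload + b'.' + tag

def check(token, keys):
    # A token is valid if any active key reproduces its tag.
    payload, _, tag = token.rpartition(b'.')
    return any(hmac.compare_digest(
                   tag,
                   hmac.new(k, payload, hashlib.sha256).hexdigest().encode())
               for k in keys)

def load_keys(key_repo):
    # Re-read every key file in the on-disk repository.
    return [open(os.path.join(key_repo, name), 'rb').read()
            for name in os.listdir(key_repo)]

def validate(token, key_repo, cached_keys):
    """Validate against cached keys; on failure, reload from disk and retry once."""
    if check(token, cached_keys):
        return True, cached_keys
    fresh = load_keys(key_repo)        # cheap compared to a SQL round-trip
    return check(token, fresh), fresh  # a second failure means a bad token
```

The retry happens at most once per validation, so a flood of invalid tokens costs only filesystem reads on the node that receives them, never load on a shared database.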


responded Jul 27, 2015 by Clint_Byrum (40,940 points)   4 6 10
0 votes

Barbican depends on Keystone for authentication, though. It's not a silver bullet here.

Kevin


responded Jul 27, 2015 by Fox,_Kevin_M (29,360 points)   1 3 4
0 votes

On Mon, Jul 27, 2015 at 1:31 PM, Clint Byrum clint@fewbar.com wrote:


I don't think HA is the root of the problem here. The problem is
synchronization. If I have 3 keystone servers (n+1), and I rotate keys on
them, I must very carefully restart them all at the exact right time to
make sure one of them doesn't issue a token which will not be validated
on another. This is quite a real possibility because the validation
will not come from the user, but from the service, so it's not like we
can use simple persistence rules. One would need a layer 7 capable load
balancer that can find the token ID and make sure it goes back to the
server that issued it.

This is not true (or if it is, I'd love to see a bug report). keystone-manage
fernet_rotate uses a three-phase rotation strategy (staged -> primary ->
secondary) that allows you to distribute a staged key (used only for token
validation) throughout your cluster before it becomes the primary key (used
for token creation and validation) anywhere. Secondary keys are only used
for token validation.

All you have to do is atomically replace the fernet key directory with a
new key set.

You also don't have to restart keystone for it to pick up new keys dropped
onto the filesystem beneath it.
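One common way to make that directory replacement atomic is a symlink flip: write the new key set into a staging directory, then rename a symlink over the live path. A sketch, assuming the live repository path is (or may become) a symlink rather than a plain directory:

```python
import os
import tempfile

def replace_key_repo(live_dir, new_keys):
    """Atomically swap in a new key set via a symlink flip.

    new_keys maps key index -> key bytes. Because rename() is atomic on
    POSIX, a reader never observes a half-written repository. The old
    staging directory is intentionally left behind for simplicity.
    """
    parent = os.path.dirname(os.path.abspath(live_dir))
    staging = tempfile.mkdtemp(dir=parent)
    for index, key in new_keys.items():
        with open(os.path.join(staging, str(index)), 'wb') as f:
            f.write(key)
    tmp_link = staging + '.link'
    os.symlink(staging, tmp_link)
    os.rename(tmp_link, live_dir)  # atomically replaces the old symlink
```

Readers that open the directory through the symlink see either the complete old key set or the complete new one, never a mixture.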

responded Jul 27, 2015 by Dolph_Mathews (9,560 points)   1 2 3
0 votes

Excerpts from Dolph Mathews's message of 2015-07-27 11:48:12 -0700:


This is not true (or if it is, I'd love see a bug report). keystone-manage
fernet_rotate uses a three phase rotation strategy (staged -> primary ->
secondary) that allows you to distribute a staged key (used only for token
validation) throughout your cluster before it becomes a primary key (used
for token creation and validation) anywhere. Secondary keys are only used
for token validation.

All you have to do is atomically replace the fernet key directory with a
new key set.

You also don't have to restart keystone for it to pickup new keys dropped
onto the filesystem beneath it.

That's great news! Is this documented anywhere? I dug through the
operators guide, security guide, install guide, etc. Nothing described
this dance, which is impressive and should be written down!

I even tried to discern how it worked from the code, but on casual
investigation it does not appear to work the way you describe.


responded Jul 27, 2015 by Clint_Byrum (40,940 points)   4 6 10
0 votes

On Mon, Jul 27, 2015 at 2:03 PM, Clint Byrum clint@fewbar.com wrote:


That's great news! Is this documented anywhere? I dug through the
operators guides, security guide, install guide, etc. Nothing described
this dance, which is impressive and should be written down!

(BTW, your original assumption would normally have been an accurate one!)

I don't believe it's documented in any of those places yet. The best
in-tree explanation of the three phases that I'm aware of is probably
this (which isn't particularly accessible):

https://github.com/openstack/keystone/blob/6a6fcc2/keystone/cmd/cli.py#L208-L223

Lance Bragstad and I also gave a short presentation at the Vancouver summit
on this behavior, and he covers the same material in one of his blog posts:

https://www.youtube.com/watch?v=duRBlm9RtCw&feature=youtu.be
http://lbragstad.com/?p=133

I even tried to discern how it worked from the code but it actually
looks like it does not work the way you describe on casual investigation.

I don't blame you! I'll work to improve the user-facing docs on the topic.


responded Jul 27, 2015 by Dolph_Mathews (9,560 points)   1 2 3
0 votes

Matt Fischer also discusses key rotation here:

http://www.mattfischer.com/blog/?p=648

And here:

http://www.mattfischer.com/blog/?p=665

responded Jul 27, 2015 by Dolph_Mathews (9,560 points)   1 2 3
0 votes

I suggest using a Pacemaker multi-state clone resource to rotate the Fernet keys and rsync them from local directories across the cluster nodes. The resource prototype is described here: https://etherpad.openstack.org/p/fernet_tokens_pacemaker
Pros: Pacemaker takes care of the CAP/split-brain concerns for us; we only design the rotate and rsync logic. No shared FS/DB is involved, only the Corosync CIB, which stores a few internal resource-state parameters, not the keys themselves.
Cons: Keystone nodes hosting the Fernet key directories must be members of the Pacemaker cluster, and a custom OCF script has to be written to implement this.
--
Regards,
Bogdan Dobrelya.
IRC: bogdando


responded Aug 1, 2015 by Bogdan_Dobrelya (4,920 points)   1 2 7
0 votes

Meta: Bogdan, please do try to get your email client to reply with references
to the thread, so it doesn't create a new thread.

Excerpts from bdobrelia's message of 2015-08-01 09:27:17 -0700:

I suggest to use pacemaker multistate clone resource to rotate and rsync fernet tokens from local directories across cluster nodes. The resource prototype is described here https://etherpad.openstack.org/p/fernet_tokens_pacemaker
Pros: Pacemaker will care about CAP/split-brain stuff for us, we just design rotate and rsync logic. Also no shared FS/DB involved but only Corosync CIB - to store few internal resource state related params, not tokens.
Cons: Keystone nodes hosting fernet tokens directories must be members of pacemaker cluster. Also custom OCF script should be created to implement this.

This is a massive con, and there is no need for this level of complexity.

Just making sure you only ever run key rotation in one place at a time,
followed by an rsync push to all other nodes, is a lot simpler to operate
than Pacemaker.

That said, both of those solutions benefit from a feature of keeping the
keys on the local filesystem: it entirely decouples how you achieve HA
from how you provide a performant service.
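As a rough sketch of the "rotate once, push everywhere" approach: build the rotation command for a single coordinator node plus one rsync push per peer. The host names and repository path are illustrative, not from this thread, and running the resulting commands (e.g. via subprocess) is left to the operator's tooling:

```python
def rotation_commands(nodes, key_repo='/etc/keystone/fernet-keys/'):
    """Build the command sequence: rotate locally once, then rsync outward."""
    cmds = [['keystone-manage', 'fernet_rotate',
             '--keystone-user', 'keystone',
             '--keystone-group', 'keystone']]
    for node in nodes:
        # --delete so keys purged during rotation disappear on the peers too.
        cmds.append(['rsync', '-a', '--delete',
                     key_repo, 'root@%s:%s' % (node, key_repo)])
    return cmds
```

Serializing this through one coordinator (a cron job, a deploy tool, or a lock) is what guarantees rotation never runs in two places concurrently.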


responded Aug 1, 2015 by Clint_Byrum (40,940 points)   4 6 10
0 votes

On Saturday 01 August 2015 16:27:17 bdobrelia@mirantis.com wrote:

Looks complex.

I suggest this kind of bash or python script, running on the Fuel master node:

  1. Check that all controllers are online;
  2. Go to one of the controllers and rotate the keys there;
  3. Fetch key 0 from that controller;
  4. On each other controller, rotate the keys, then replace its freshly
     created 0-key with the fetched one;
  5. If any node fails to receive the new keys (because it went offline or
     for some other reason), revert the rotation (move the key with the
     biggest index back to 0).

The script can be launched by cron or by a button in Fuel.

I don't see anything critically bad if one rotation/sync event fails.
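The five steps above, with the all-or-revert behavior, might be sketched like this. The rotate, fetch_key0, push_key0 and revert callables are hypothetical primitives standing in for the SSH/rsync operations the script would actually perform:

```python
def rotate_cluster(controllers, rotate, fetch_key0, push_key0, revert):
    """Rotate keys cluster-wide from a coordinator, reverting on failure."""
    # Step 1: require every controller to be reachable before starting.
    if not all(c.online for c in controllers):
        return False
    leader, followers = controllers[0], controllers[1:]
    rotate(leader)             # step 2: rotate on one controller
    key0 = fetch_key0(leader)  # step 3: fetch its 0-key
    done = []
    for node in followers:
        try:
            rotate(node)             # step 4: rotate, then overwrite the
            push_key0(node, key0)    #         node's own 0-key
            done.append(node)
        except IOError:
            for n in [leader] + done:  # step 5: revert everywhere touched
                revert(n)
            return False
    return True
```

Returning False on any failure matches the observation below: a failed rotation/sync event is harmless as long as it is rolled back and retried later.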


--
Best regards,
Boris


responded Aug 1, 2015 by Boris_Bobrov (1,720 points)   1 3
...