
[Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?

0 votes

Hello,

We use a Ceph cluster for Nova (Glance and Cinder as well) and over time,
more and more data is stored there. We can't let the cluster grow indefinitely
because of Ceph's limitations; sooner or later it will have to be closed to new
instances, images and volumes. Not to mention it's a big failure domain.

How do you handle this issue?
What is your strategy to divide Ceph clusters between compute nodes?
How do you solve VM snapshot placement and migration issues then
(snapshots will be left on the older Ceph cluster)?

We've been thinking about features like dynamic Ceph configuration in Nova
(rather than static settings in nova.conf), pinning instances to a Ceph cluster, etc.
What do you think about that?
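For reference, the Ceph backend for Nova is currently a single static choice per compute node in nova.conf, roughly like this (pool, user and UUID values here are illustrative):

```ini
[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
```

Every instance scheduled to that node lands in that one pool on that one cluster, which is why per-instance or per-cluster selection would need changes in Nova itself.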


OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
asked Oct 10, 2016 in openstack-operators by Adam_Kijak (340 points)  

9 Responses

0 votes

On Mon, 2016-10-10 at 13:29 +0000, Adam Kijak wrote:
> Hello,
>
> We use a Ceph cluster for Nova (Glance and Cinder as well) and over time,
> more and more data is stored there. We can't let the cluster grow
> indefinitely because of Ceph's limitations; sooner or later it will have
> to be closed to new instances, images and volumes. Not to mention it's a
> big failure domain.

I'm really keen to hear more about those limitations.

> How do you handle this issue?
> What is your strategy to divide Ceph clusters between compute nodes?
> How do you solve VM snapshot placement and migration issues then
> (snapshots will be left on older Ceph)?

Having played with Ceph and compute on the same hosts, I'm a big fan of
separating them and having dedicated Ceph hosts, and dedicated compute
hosts.  That allows me a lot more flexibility with hardware
configuration and maintenance, easier troubleshooting for resource
contention, and also allows scaling at different rates.

> We've been thinking about features like: dynamic Ceph configuration
> (not static like in nova.conf) in Nova, pinning instances to a Ceph
> cluster etc.
> What do you think about that?


responded Oct 10, 2016 by Xav_Paice (2,520 points)
0 votes

Have you thought about dedicated pools for cinder/nova and a separate pool for glance, and any other uses you might have?
You need to setup secrets on kvm, but you can have cinder creating volumes from glance images quickly in different pools
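Roughly, that setup looks like this (pool names, the client.cinder user, PG counts and the secret UUID are all illustrative; check the Ceph/OpenStack integration docs for your release):

```shell
# Dedicated pools per use case (PG counts are placeholders)
ceph osd pool create volumes 128
ceph osd pool create images 128
ceph osd pool create vms 128

# One cephx user with per-pool capabilities for cinder/nova
ceph auth get-or-create client.cinder mon 'allow r' \
  osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images'

# On each compute node, register the key as a libvirt secret so KVM
# can attach RBD devices; secret.xml carries the UUID that
# nova.conf/cinder.conf reference
virsh secret-define --file secret.xml
virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 \
  --base64 "$(ceph auth get-key client.cinder)"
```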

On Oct 10, 2016, at 6:29 AM, Adam Kijak adam.kijak@corp.ovh.com wrote:

> Hello,
>
> We use a Ceph cluster for Nova (Glance and Cinder as well) and over time,
> more and more data is stored there. We can't let the cluster grow
> indefinitely because of Ceph's limitations; sooner or later it will have
> to be closed to new instances, images and volumes. Not to mention it's a
> big failure domain.
>
> How do you handle this issue?
> What is your strategy to divide Ceph clusters between compute nodes?
> How do you solve VM snapshot placement and migration issues then
> (snapshots will be left on older Ceph)?
>
> We've been thinking about features like: dynamic Ceph configuration
> (not static like in nova.conf) in Nova, pinning instances to a Ceph cluster etc.
> What do you think about that?


responded Oct 10, 2016 by Abel_Lopez (4,820 points)
0 votes


From: Xav Paice xavpaice@gmail.com
Sent: Monday, October 10, 2016 8:41 PM
To: openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?

> On Mon, 2016-10-10 at 13:29 +0000, Adam Kijak wrote:
>
>> Hello,
>>
>> We use a Ceph cluster for Nova (Glance and Cinder as well) and over time,
>> more and more data is stored there. We can't let the cluster grow
>> indefinitely because of Ceph's limitations; sooner or later it will have
>> to be closed to new instances, images and volumes. Not to mention it's a
>> big failure domain.
>
> I'm really keen to hear more about those limitations.

Basically it's all related to the failure domain ("blast radius") and risk management.
A bigger Ceph cluster means more users.
Growing the Ceph cluster temporarily slows it down, so many users will be affected.
There are bugs in Ceph which can cause data corruption. It's rare, but when it happens
it can affect many (maybe all) users of the Ceph cluster.

>> How do you handle this issue?
>> What is your strategy to divide Ceph clusters between compute nodes?
>> How do you solve VM snapshot placement and migration issues then
>> (snapshots will be left on older Ceph)?
>
> Having played with Ceph and compute on the same hosts, I'm a big fan of
> separating them and having dedicated Ceph hosts, and dedicated compute
> hosts. That allows me a lot more flexibility with hardware
> configuration and maintenance, easier troubleshooting for resource
> contention, and also allows scaling at different rates.

Exactly, I consider it best practice as well.


responded Oct 12, 2016 by Adam_Kijak (340 points)  
0 votes


From: Abel Lopez alopgeek@gmail.com
Sent: Monday, October 10, 2016 9:57 PM
To: Adam Kijak
Cc: openstack-operators
Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?

> Have you thought about dedicated pools for cinder/nova and a separate pool for glance, and any other uses you might have?
> You need to setup secrets on kvm, but you can have cinder creating volumes from glance images quickly in different pools

We already have separate pools for images, volumes and instances.
Separate pools don't really split the failure domain though.
Also, AFAIK you can't set up multiple pools for instances in nova.conf, right?


responded Oct 12, 2016 by Adam_Kijak (340 points)  
0 votes

If the fault domain is a concern, you can always split the cloud up into 3
regions, each having a dedicated Ceph cluster. It isn't necessarily going to
mean more hardware, just logical splits. This is kind of assuming that the
network doesn't share the same fault domain, though.

Alternatively, you can split the hardware for the Ceph boxes into multiple
clusters, and use multi backend Cinder to talk to the same set of
hypervisors to use multiple Ceph clusters. We're doing that to migrate from
one Ceph cluster to another. You can even mount a volume from each cluster
into a single instance.
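Roughly, the cinder.conf side of that looks like this (backend names, conf paths and secret UUIDs are illustrative):

```ini
[DEFAULT]
enabled_backends = ceph-old,ceph-new

[ceph-old]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph-old
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph-old.conf
rbd_user = cinder
rbd_secret_uuid = <uuid-of-libvirt-secret-for-old-cluster>

[ceph-new]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph-new
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph-new.conf
rbd_user = cinder
rbd_secret_uuid = <uuid-of-libvirt-secret-for-new-cluster>
```

Each backend maps to a volume type, so new volumes can be steered at the new cluster while old ones are migrated.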

Keep in mind that you don't really want to shrink a Ceph cluster too much.
What's "too big"? You should keep growing so that the fault domains aren't
too small (3 physical racks minimum), or you guarantee that the entire
cluster stops if you lose the network.

Just my 2 cents,
Warren

On Wed, Oct 12, 2016 at 8:35 AM, Adam Kijak adam.kijak@corp.ovh.com wrote:

> From: Abel Lopez alopgeek@gmail.com
> Sent: Monday, October 10, 2016 9:57 PM
> To: Adam Kijak
> Cc: openstack-operators
> Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova]
> How do you handle Nova on Ceph?
>
>> Have you thought about dedicated pools for cinder/nova and a separate
>> pool for glance, and any other uses you might have?
>> You need to setup secrets on kvm, but you can have cinder creating
>> volumes from glance images quickly in different pools
>
> We already have separate pools for images, volumes and instances.
> Separate pools don't really split the failure domain though.
> Also, AFAIK you can't set up multiple pools for instances in nova.conf,
> right?


responded Oct 12, 2016 by Warren_Wang (680 points)
0 votes

Excerpts from Adam Kijak's message of 2016-10-12 12:23:41 +0000:


> From: Xav Paice xavpaice@gmail.com
> Sent: Monday, October 10, 2016 8:41 PM
> To: openstack-operators@lists.openstack.org
> Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?
>
>> On Mon, 2016-10-10 at 13:29 +0000, Adam Kijak wrote:
>>
>>> Hello,
>>>
>>> We use a Ceph cluster for Nova (Glance and Cinder as well) and over
>>> time, more and more data is stored there. We can't let the cluster grow
>>> indefinitely because of Ceph's limitations; sooner or later it will
>>> have to be closed to new instances, images and volumes. Not to mention
>>> it's a big failure domain.
>>
>> I'm really keen to hear more about those limitations.
>
> Basically it's all related to the failure domain ("blast radius") and risk management.
> A bigger Ceph cluster means more users.

Are these risks well documented? Since Ceph is specifically designed
not to have the kind of large blast radius that one might see with,
say, a centralized SAN, I'm curious to hear what events trigger
cluster-wide blasts.

> Growing the Ceph cluster temporarily slows it down, so many users will be affected.

One might say that a Ceph cluster that can't be grown without the users
noticing is an over-subscribed Ceph cluster. My understanding is that
one is always advised to provision a certain amount of cluster capacity
for growing and replicating to replaced drives.

> There are bugs in Ceph which can cause data corruption. It's rare, but when it happens
> it can affect many (maybe all) users of the Ceph cluster.

:(


responded Oct 12, 2016 by Clint_Byrum (40,940 points)
0 votes

From: Warren Wang warren@wangspeed.com
Sent: Wednesday, October 12, 2016 10:02 PM
To: Adam Kijak
Cc: Abel Lopez; openstack-operators
Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?

> If the fault domain is a concern, you can always split the cloud up into 3 regions, each having a dedicated Ceph cluster. It isn't necessarily going to mean more hardware, just logical splits. This is kind of assuming that the network doesn't share the same fault domain though.

This is not an option for us, because having Region1-1, Region1-2, ..., Region1-10 would not be very convenient for users.

> Alternatively, you can split the hardware for the Ceph boxes into multiple clusters, and use multi backend Cinder to talk to the same set of hypervisors to use multiple Ceph clusters. We're doing that to migrate from one Ceph cluster to another. You can even mount a volume from each cluster into a single instance.

Multiple Ceph clusters behind Cinder are not a problem, I agree.
Unfortunately we use Ceph for Nova as well (instance disks are on Ceph directly).

> Keep in mind that you don't really want to shrink a Ceph cluster too much. What's "too big"? You should keep growing so that the fault domains aren't too small (3 physical racks minimum), or you guarantee that the entire cluster stops if you lose the network.
> Just my 2 cents,

Thanks!


responded Oct 14, 2016 by Adam_Kijak (340 points)  
0 votes


From: Clint Byrum clint@fewbar.com
Sent: Wednesday, October 12, 2016 10:46 PM
To: openstack-operators
Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?

> Excerpts from Adam Kijak's message of 2016-10-12 12:23:41 +0000:
>
>> From: Xav Paice xavpaice@gmail.com
>> Sent: Monday, October 10, 2016 8:41 PM
>> To: openstack-operators@lists.openstack.org
>> Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?
>>
>>> I'm really keen to hear more about those limitations.
>>
>> Basically it's all related to the failure domain ("blast radius") and risk management.
>> A bigger Ceph cluster means more users.
>
> Are these risks well documented? Since Ceph is specifically designed
> not to have the kind of large blast radius that one might see with,
> say, a centralized SAN, I'm curious to hear what events trigger
> cluster-wide blasts.

In theory yes, Ceph is designed to be fault tolerant,
but in our experience it's not always like that.
I don't think it's well documented, but I know of this case:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg32804.html

>> Growing the Ceph cluster temporarily slows it down, so many users will be affected.
>
> One might say that a Ceph cluster that can't be grown without the users
> noticing is an over-subscribed Ceph cluster. My understanding is that
> one is always advised to provision a certain amount of cluster capacity
> for growing and replicating to replaced drives.

I agree that provisioning a fixed-size cluster would solve some problems, but planning the capacity is not always easy.
Predicting the size and making it cost-effective (a big, mostly empty Ceph cluster costs a lot at the beginning) is quite difficult.
Also, adding a new Ceph cluster will always be more transparent to users than reshaping an existing one (especially when growing a pool's PG count).


responded Oct 14, 2016 by Adam_Kijak (340 points)  
0 votes

Hi Adam,

I agree somewhat, capacity management and growth at scale is something
of a pain. Ceph gives you a hugely powerful and flexible way to manage
data-placement through crush but there is very little quality info
about, or examples of, non-naive crushmap configurations.

I think I understand what you are getting at in regards to
failure-domain, e.g., a large cluster of 1000+ drives may require a
single storage pool (e.g., for nova) across most/all of that storage.
The chances of overlapping drive failures (overlapping meaning before
recovery has completed) in multiple nodes are higher the more drives
there are in the pool, unless you design your crushmap to limit the
size of any replica-domain (i.e., the leaf crush bucket that a single
copy of an object may end up in). And in the rbd use case, if you are
unlucky and even lose just a tiny fraction of objects, due to random
placement there is a good chance you have lost a handful of objects
from most/all rbd volumes in the cluster, which could make for many
unhappy users with potentially unrecoverable filesystems in those
rbds.
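As one sketch of a less naive layout (rule name and id are illustrative, and this assumes rack buckets are already defined in the crushmap hierarchy), a rule like the following keeps each replica in a different rack, which is a first step toward bounding the replica-domain:

```
rule rbd_rack_separated {
    ruleset 2
    type replicated
    min_size 2
    max_size 3
    step take default
    step chooseleaf firstn 0 type rack
    step emit
}
```

Truly limiting replica-domain size (so overlapping failures can only ever hit a bounded subset of PGs) needs more structure in the hierarchy than this.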

The guys at UnitedStack did a nice presentation that touched on this a
while back (http://www.slideshare.net/kioecn/build-an-highperformance-and-highdurable-block-storage-service-based-on-ceph)
but I'm not sure I follow their durability model just from these
slides, and if you're going to play with this you really do want a
tool to calculate/simulate the impact of the changes.
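To put rough numbers on the earlier point, here is a deliberately naive toy model (not their durability model; it assumes each PG's replicas land on a uniformly random 3-OSD subset and that failures are simultaneous and independent):

```python
# Back-of-envelope data-loss odds under naive replica placement.
from math import comb

def p_pg_lost(n_osds: int, failed: int, rep: int = 3) -> float:
    """Chance one PG has all `rep` replicas inside the failed OSD set."""
    if failed < rep:
        return 0.0
    return comb(failed, rep) / comb(n_osds, rep)

def p_any_loss(n_osds: int, failed: int, pgs: int, rep: int = 3) -> float:
    """Chance at least one of `pgs` independent PGs is wholly lost."""
    p = p_pg_lost(n_osds, failed, rep)
    return 1.0 - (1.0 - p) ** pgs

# 1000 OSDs, 3 concurrent failures, 32768 PGs: tiny per-PG odds, but
# they add up across PGs, and a lost PG touches many rbd images.
print(p_any_loss(1000, 3, 32768))  # roughly 2e-4
```

The interesting part is the aggregation: the per-PG probability shrinks as the cluster grows, but the number of PGs (and the number of rbd images striped across them) grows too, which is exactly the blast-radius trade-off being discussed.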

Interesting discussion - maybe loop in ceph-users?

Cheers,

On 14 October 2016 at 19:53, Adam Kijak adam.kijak@corp.ovh.com wrote:


> From: Clint Byrum clint@fewbar.com
> Sent: Wednesday, October 12, 2016 10:46 PM
> To: openstack-operators
> Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?
>
>> Excerpts from Adam Kijak's message of 2016-10-12 12:23:41 +0000:
>>
>>> From: Xav Paice xavpaice@gmail.com
>>> Sent: Monday, October 10, 2016 8:41 PM
>>> To: openstack-operators@lists.openstack.org
>>> Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?
>>>
>>>> I'm really keen to hear more about those limitations.
>>>
>>> Basically it's all related to the failure domain ("blast radius") and risk management.
>>> A bigger Ceph cluster means more users.
>>
>> Are these risks well documented? Since Ceph is specifically designed
>> not to have the kind of large blast radius that one might see with,
>> say, a centralized SAN, I'm curious to hear what events trigger
>> cluster-wide blasts.
>
> In theory yes, Ceph is designed to be fault tolerant,
> but in our experience it's not always like that.
> I don't think it's well documented, but I know of this case:
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg32804.html
>
>>> Growing the Ceph cluster temporarily slows it down, so many users will be affected.
>>
>> One might say that a Ceph cluster that can't be grown without the users
>> noticing is an over-subscribed Ceph cluster. My understanding is that
>> one is always advised to provision a certain amount of cluster capacity
>> for growing and replicating to replaced drives.
>
> I agree that provisioning a fixed-size cluster would solve some problems, but planning the capacity is not always easy.
> Predicting the size and making it cost-effective (a big, mostly empty Ceph cluster costs a lot at the beginning) is quite difficult.
> Also, adding a new Ceph cluster will always be more transparent to users than reshaping an existing one (especially when growing a pool's PG count).



--
Cheers,
~Blairo


responded Oct 14, 2016 by Blair_Bethwaite (4,080 points)