
[openstack-dev] [tripleo][manila] Ganesha deployment


Hi,
TripleO currently supports deploying CephFS, which can be used as a
backend for the Manila service, so users can mount ceph shares with
either the ceph kernel driver or ceph-fuse on their side.

There is an ongoing nfs-ganesha project [1] which can be used as a
proxy when accessing CephFS. The benefit is that the user then
interacts only with the ganesha server and mounts shares from it using
the NFS protocol.

Manila will soon support both variants of the ceph backend:
1) ceph is used directly (what we support now)
user instance <-- ceph protocol --> ceph cluster

2) ceph is used through a ganesha server (what we don't support yet
but would like to)
user instance <-- NFS protocol --> ganesha server <-- ceph protocol --> ceph cluster
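To make the difference concrete, mounting a share from the user
instance would look roughly like this in each variant (addresses,
names and keys below are just placeholders, not from a real
deployment):

# variant 1: the instance talks to the ceph cluster directly
mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=<cephx-user>,secret=<cephx-key>

# variant 2: the instance talks only to the ganesha server over NFS
mount -t nfs <ganesha-vip>:/<share-export> /mnt/share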

We would like to enable both deployment options in TripleO, and I
wonder what the ganesha deployment should look like.

Prerequisites are:
- use of ganesha servers will be optional - operators can still choose
to use ceph directly; ideally it should be possible to deploy Manila
with both the direct ceph and the ganesha backends
- ganesha servers should not run on controller nodes (e.g. collocated
with the manila-share service) because of data traffic; ideally
ganesha servers should be dedicated (which is probably not a problem
with composable services)
- ganesha servers will probably use an active/passive HA model and
will be managed by pacemaker+corosync - AFAIK the detailed HA
architecture is not specified yet and is still in progress on the
ganesha side.

I imagine that an (extremely simplified) setup would look something
like this from the TripleO point of view:
1) define a new role (e.g. NfsStorage) which represents the ganesha
servers (a rough sketch of such a role follows after this list)
2) define a new VIP (for IP failover) and 2 networks for the NfsStorage role:
a) a frontend network between users and ganesha servers (e.g.
NfsNetwork), used by tenants to mount nfs shares - this network
should be accessible from user instances.
b) a backend network between ganesha servers and the ceph cluster -
this could just map to the existing StorageNetwork I think.
3) pacemaker and ganesha setup magic happens - I wonder if the
existing puppet pacemaker modules could be used to set up another
pacemaker cluster on the dedicated ganesha nodes? It seems the long
term plan is to switch to ceph-ansible for ceph setup in TripleO [2],
so should the whole HA setup of the ganesha servers be delegated to
ceph-ansible instead?
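For 1), a rough sketch of what the new role could look like in
roles_data.yaml (the service list is illustrative and incomplete, and
OS::TripleO::Services::NfsGanesha is a hypothetical name - no such
service exists yet):

- name: NfsStorage
  ServicesDefault:
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::NfsGanesha # hypothetical, doesn't exist yet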

What I'm not sure about at all is what the network definition should
look like. There are the following Overcloud deployment options:
1) no network isolation is used - then both the direct ceph mount and
the mount through ganesha should work because StorageNetwork and
NfsNetwork are accessible from user instances (there seems to be no
restriction on accessing other networks).
2) network isolation is used:
a) ceph is used directly - user instances need access to the ceph
public network (which is the StorageNetwork in the Overcloud) - how
should I enable access to this network? I filed a bug for this
deployment variant here [3]
b) ceph is used through ganesha - user instances need access to the
ganesha servers (the NfsNetwork from the previous paragraph) - how
should I enable access to this network?

The ultimate (and future) plan is to deploy ganesha-nfs in VMs (which
will run in the Overcloud, probably managed by the manila ceph
driver); in this deployment mode users should have access to the
ganesha servers, and only the ganesha server VMs should have access to
the ceph public network. The ganesha VMs would run in a separate
tenant, so I wonder if it's possible to manage access to the ceph
public network (StorageNetwork in the Overcloud) at the per-tenant
level?

Any thoughts and hints?

Thanks, Jan

[1] https://github.com/nfs-ganesha/nfs-ganesha/wiki
[2] https://github.com/ceph/ceph-ansible/tree/master/roles/ceph-nfs
[3] https://bugs.launchpad.net/tripleo/+bug/1680749


asked Apr 13, 2017 in openstack-dev by Jan_Provaznik

5 Responses


I'm not really an expert on composable roles so I'll leave that to
someone else, but see my thoughts inline on the networking aspect.

On 04/10/2017 03:22 AM, Jan Provaznik wrote:
> 2) define a new VIP (for IP failover) and 2 networks for the NfsStorage role:
> a) a frontend network between users and ganesha servers (e.g.
> NfsNetwork), used by tenants to mount nfs shares - this network
> should be accessible from user instances.

Adding a new network is non-trivial today, so I think we want to avoid
that if possible. Is there a reason the Storage network couldn't be
used for this? That is already present on compute nodes by default so
it would be available to user instances, and it seems like the intended
use of the Storage network matches this use case. In a Ceph deployment
today that's the network which exposes data to user instances.

> b) a backend network between ganesha servers and the ceph cluster -
> this could just map to the existing StorageNetwork I think.

This actually sounds like a better fit for StorageMgmt to me. It's
non-user-facing storage communication, which is what StorageMgmt is used
for in the vanilla Ceph case.

> What I'm not sure about at all is what the network definition should
> look like. There are the following Overcloud deployment options:
> 1) no network isolation is used - then both the direct ceph mount and
> the mount through ganesha should work because StorageNetwork and
> NfsNetwork are accessible from user instances (there seems to be no
> restriction on accessing other networks).

There are no other networks without network-isolation. Everything runs
over the provisioning network. The network-isolation templates should
mostly handle this for you though.

> 2) network isolation is used:
> a) ceph is used directly - user instances need access to the ceph
> public network (which is the StorageNetwork in the Overcloud) - how
> should I enable access to this network? I filed a bug for this
> deployment variant here [3]

So does this mean that the current manila implementation is completely
broken in network-isolation? If so, that's rather concerning.

If I'm understanding correctly, it sounds like what needs to happen is
to make the Storage network routable so it's available from user
instances. That's not actually something TripleO can do, it's an
underlying infrastructure thing. I'm not sure what the security
implications of it are either.

Well, on second thought it might be possible to make the Storage network
only routable within overcloud Neutron by adding a bridge mapping for
the Storage network and having the admin configure a shared Neutron
network for it. That would be somewhat more secure since it wouldn't
require the Storage network to be routable by the world. I also think
this would work today in TripleO with no changes.

Alternatively I guess you could use ServiceNetMap to move the public
Ceph traffic to the public network, which has to be routable. That
seems like it might have a detrimental effect on the public network's
capacity, but it might be okay in some instances.
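Something like this in an environment file, though I'm not certain
CephMonNetwork is the right ServiceNetMap key for the ceph public
network, so treat this as an untested sketch:

parameter_defaults:
  ServiceNetMap:
    CephMonNetwork: external # assumption: moves ceph public traffic to the routable network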

> b) ceph is used through ganesha - user instances need access to the
> ganesha servers (the NfsNetwork from the previous paragraph) - how
> should I enable access to this network?

I think the answer here will be the same as for vanilla Ceph. You need
to make the network routable to instances, and you'd have the same
options as I discussed above.

> The ultimate (and future) plan is to deploy ganesha-nfs in VMs (which
> will run in the Overcloud, probably managed by the manila ceph
> driver); in this deployment mode users should have access to the
> ganesha servers, and only the ganesha server VMs should have access to
> the ceph public network. The ganesha VMs would run in a separate
> tenant, so I wonder if it's possible to manage access to the ceph
> public network (StorageNetwork in the Overcloud) at the per-tenant
> level?

This would suggest that the bridged Storage network approach is the
best. In that case access to the ceph public network is controlled by
the overcloud Neutron, so you would just need to give access to it
only to the tenant running the Ganesha VMs. User VMs would only get
access to a separate shared network providing access to the public
Ganesha API, and the Ganesha VMs would straddle both networks.
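If you need that kind of per-tenant control, neutron's RBAC support
should let you share the provider network with just the one tenant
instead of making it globally shared. Something like this (untested,
and the tenant ID is a placeholder):

neutron rbac-create --target-tenant <ganesha-tenant-id> \
  --action access_as_shared --type network storage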

responded Apr 10, 2017 by Ben_Nemec

On Mon, Apr 10, 2017 at 6:55 PM, Ben Nemec openstack@nemebean.com wrote:
> I'm not really an expert on composable roles so I'll leave that to
> someone else, but see my thoughts inline on the networking aspect.

> On 04/10/2017 03:22 AM, Jan Provaznik wrote:

>> 2) define a new VIP (for IP failover) and 2 networks for the NfsStorage role:
>> a) a frontend network between users and ganesha servers (e.g.
>> NfsNetwork), used by tenants to mount nfs shares - this network
>> should be accessible from user instances.

> Adding a new network is non-trivial today, so I think we want to avoid
> that if possible. Is there a reason the Storage network couldn't be
> used for this? That is already present on compute nodes by default so
> it would be available to user instances, and it seems like the
> intended use of the Storage network matches this use case. In a Ceph
> deployment today that's the network which exposes data to user
> instances.

Access to the ceph public network (StorageNetwork) is a big privilege
(from discussing this with the ceph team), bigger than access to only
the ganesha nfs servers, so the StorageNetwork should be exposed only
when really necessary.

>> b) a backend network between ganesha servers and the ceph cluster -
>> this could just map to the existing StorageNetwork I think.

> This actually sounds like a better fit for StorageMgmt to me. It's
> non-user-facing storage communication, which is what StorageMgmt is
> used for in the vanilla Ceph case.

If StorageMgmt is used for replication and internal ceph node
communication, I wonder if that's not too permissive an access?
Ganesha servers should need access to the ceph public network only.

>> What I'm not sure about at all is what the network definition should
>> look like. There are the following Overcloud deployment options:
>> 1) no network isolation is used - then both the direct ceph mount and
>> the mount through ganesha should work because StorageNetwork and
>> NfsNetwork are accessible from user instances (there seems to be no
>> restriction on accessing other networks).

> There are no other networks without network-isolation. Everything
> runs over the provisioning network. The network-isolation templates
> should mostly handle this for you though.

>> 2) network isolation is used:
>> a) ceph is used directly - user instances need access to the ceph
>> public network (which is the StorageNetwork in the Overcloud) - how
>> should I enable access to this network? I filed a bug for this
>> deployment variant here [3]

> So does this mean that the current manila implementation is
> completely broken in network-isolation? If so, that's rather
> concerning.

This affects deployments of manila with the internal (=deployed by
TripleO) ceph backend.

> If I'm understanding correctly, it sounds like what needs to happen
> is to make the Storage network routable so it's available from user
> instances. That's not actually something TripleO can do, it's an
> underlying infrastructure thing. I'm not sure what the security
> implications of it are either.
>
> Well, on second thought it might be possible to make the Storage
> network only routable within overcloud Neutron by adding a bridge
> mapping for the Storage network and having the admin configure a
> shared Neutron network for it. That would be somewhat more secure
> since it wouldn't require the Storage network to be routable by the
> world. I also think this would work today in TripleO with no changes.

This sounds interesting. I was searching for more info on how the
bridge mapping should be done in this case and what the specific setup
steps should look like, but the process is still not clear to me; I
would be grateful for more details/guidance with this.

> Alternatively I guess you could use ServiceNetMap to move the public
> Ceph traffic to the public network, which has to be routable. That
> seems like it might have a detrimental effect on the public network's
> capacity, but it might be okay in some instances.

I would rather avoid this option (both because of the network traffic
and because of exposing the ceph public network to everybody).

>> b) ceph is used through ganesha - user instances need access to the
>> ganesha servers (the NfsNetwork from the previous paragraph) - how
>> should I enable access to this network?

> I think the answer here will be the same as for vanilla Ceph. You
> need to make the network routable to instances, and you'd have the
> same options as I discussed above.

Yes, it seems that using the mapping to a provider network would solve
the existing problem both when using ceph directly and when using
ganesha servers in the future (it would just be a matter of which
network is exposed).

>> The ultimate (and future) plan is to deploy ganesha-nfs in VMs (which
>> will run in the Overcloud, probably managed by the manila ceph
>> driver); in this deployment mode users should have access to the
>> ganesha servers, and only the ganesha server VMs should have access
>> to the ceph public network. The ganesha VMs would run in a separate
>> tenant, so I wonder if it's possible to manage access to the ceph
>> public network (StorageNetwork in the Overcloud) at the per-tenant
>> level?

> This would suggest that the bridged Storage network approach is the
> best. In that case access to the ceph public network is controlled by
> the overcloud Neutron, so you would just need to give access to it
> only to the tenant running the Ganesha VMs. User VMs would only get
> access to a separate shared network providing access to the public
> Ganesha API, and the Ganesha VMs would straddle both networks.


responded Apr 11, 2017 by Jan_Provaznik

On Tue, 2017-04-11 at 16:50 +0200, Jan Provaznik wrote:
> On Mon, Apr 10, 2017 at 6:55 PM, Ben Nemec openstack@nemebean.com wrote:

>> On 04/10/2017 03:22 AM, Jan Provaznik wrote:
>>
>> Well, on second thought it might be possible to make the Storage
>> network only routable within overcloud Neutron by adding a bridge
>> mapping for the Storage network and having the admin configure a
>> shared Neutron network for it. That would be somewhat more secure
>> since it wouldn't require the Storage network to be routable by the
>> world. I also think this would work today in TripleO with no changes.

> This sounds interesting. I was searching for more info on how the
> bridge mapping should be done in this case and what the specific
> setup steps should look like, but the process is still not clear to
> me; I would be grateful for more details/guidance with this.

I think this will be represented in neutron as a provider network,
which has to be created by the overcloud admin after the overcloud
deployment is finished.

While based on Kilo, this was one of the best docs I could find, and
it includes config examples [1].

It assumes that the operator created a bridge mapping for it when
deploying the overcloud.

>> I think the answer here will be the same as for vanilla Ceph. You
>> need to make the network routable to instances, and you'd have the
>> same options as I discussed above.

> Yes, it seems that using the mapping to a provider network would
> solve the existing problem both when using ceph directly and when
> using ganesha servers in the future (it would just be a matter of
> which network is exposed).

+1

regarding the composability questions, I think this represents a
"composable HA" scenario where we want to manage a remote service with
pacemaker using pacemaker-remote

yet at this stage I think we want to add support for new services by
running them in containers first (only?) and pacemaker+containers is
still a work in progress so there aren't easy answers

containers will have access to the host networks though, so the case
for a provider network in the overcloud remains valid

[1] https://docs.openstack.org/kilo/networking-guide/scenario_provider_ovs.html

--
Giulio Fidente
GPG KEY: 08D733BA


responded Apr 11, 2017 by Giulio_Fidente

On 04/11/2017 02:00 PM, Giulio Fidente wrote:
> [...]
>
> I think this will be represented in neutron as a provider network,
> which has to be created by the overcloud admin after the overcloud
> deployment is finished.
>
> While based on Kilo, this was one of the best docs I could find, and
> it includes config examples [1].
>
> It assumes that the operator created a bridge mapping for it when
> deploying the overcloud.
> [...]
>
> [1] https://docs.openstack.org/kilo/networking-guide/scenario_provider_ovs.html

I think there are three major pieces that would need to be in place to
have a storage provider network:

1) The storage network must be bridged in the net-iso templates. I
don't think our default net-iso templates do that, but there are
examples of bridged networks in them:
https://github.com/openstack/tripleo-heat-templates/blob/master/network/config/multiple-nics/compute.yaml#L121
(a rough sketch of what this could look like follows at the end of
this message). For the rest of the steps I will assume the bridge was
named br-storage.

2) Specify a bridge mapping when deploying the overcloud. The
environment file would look something like this (datacentre is the
default value, so I'm including it too):

parameter_defaults:
  NeutronBridgeMappings: 'datacentre:br-ex,storage:br-storage'

3) Create a provider network after deployment as described in the link
Giulio provided. The specific command will depend on the network
architecture, but it would need to include "--provider:physical_network
storage".

We might need to add the ability to do 3 as part of the deployment,
depending on what is needed for the Ganesha deployment itself. We've
typically avoided creating network resources like this in the
deployment because of the huge variations in what people want, but
this might be an exceptional case since the network will be a required
part of the overcloud.
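And for completeness, here is roughly what the bridged storage network
from step 1 might look like in a multiple-nics style NIC template
(untested sketch; nic3 and StorageIpSubnet follow the stock examples):

- type: ovs_bridge
  name: br-storage
  use_dhcp: false
  addresses:
    - ip_netmask: {get_param: StorageIpSubnet}
  members:
    - type: interface
      name: nic3
      primary: true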


responded Apr 11, 2017 by Ben_Nemec

On Tue, Apr 11, 2017 at 9:45 PM, Ben Nemec openstack@nemebean.com wrote:

> [...]
>
> I think there are three major pieces that would need to be in place
> to have a storage provider network:
>
> 1) The storage network must be bridged in the net-iso templates. [...]
>
> 2) Specify a bridge mapping when deploying the overcloud. [...]
>
> 3) Create a provider network after deployment as described in the
> link Giulio provided. [...]

Thank you both for your help; based on the steps suggested above I was
able to mount a ceph volume in a user instance when the Overcloud was
deployed with net-iso using single-nic-with-vlans (which is the
easiest one I can deploy in my virtual environment). With
single-nic-with-vlans I skipped the creation of an additional bridge
(since a single bridge is used for all networks in this case) and
deployed the overcloud as usual, then I configured networking:
neutron net-create storage --shared \
  --provider:physical_network datacentre \
  --provider:network_type vlan --provider:segmentation_id 30
neutron subnet-create --name storage-subnet \
  --allocation-pool start=172.16.1.100,end=172.16.1.120 \
  --enable-dhcp storage 172.16.1.0/24

and created a user instance with the tenant and storage networks:

| f7d4e619-c8f5-4de3-a4c3-4120eea818d1 | Server1 | ACTIVE | - | Running | default-net=192.168.2.107, 192.168.24.100; storage=172.16.1.110 |

The obstacle I'm hitting, though, is that the second interface (the
one on the storage network) doesn't come up automatically on instance
boot:
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fa:16:3e:71:90:df brd ff:ff:ff:ff:ff:ff

from the cloud-init log:

ci-info: +++++++++++++++++++++++++ Net device info ++++++++++++++++++++++++++
ci-info: +--------+-------+---------------+---------------+-------+-------------------+
ci-info: | Device |   Up  |    Address    |      Mask     | Scope |     Hw-Address    |
ci-info: +--------+-------+---------------+---------------+-------+-------------------+
ci-info: |  eth1: | False |       .       |       .       |   .   | fa:16:3e:27:2a:bf |
ci-info: |  eth0: |  True | 192.168.2.107 | 255.255.255.0 |   .   | fa:16:3e:ba:00:49 |
ci-info: |  eth0: |  True |       .       |       .       |   d   | fa:16:3e:ba:00:49 |
ci-info: |   lo:  |  True |   127.0.0.1   |   255.0.0.0   |   .   |         .         |
ci-info: |   lo:  |  True |       .       |       .       |   d   |         .         |
ci-info: +--------+-------+---------------+---------------+-------+-------------------+

If I manually set an IP for eth1, then the ceph mount works. I
discussed this with Giulio and he suspects the problem is that DHCP on
the storage network conflicts with the DHCP server running on the
undercloud. Any ideas?
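For reference, by "manually set an IP" I mean running something like
this inside the guest (using the address from the instance listing
above):

ip link set eth1 up
ip addr add 172.16.1.110/24 dev eth1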

Thanks, Jan


responded Apr 13, 2017 by Jan_Provaznik