
[Openstack-operators] cinder/nova issues

0 votes

Greetings,

I am having an issue with nova starting an instance that is using a root
volume that cinder has extended. More specifically, a volume that has been
extended past the max resize limit of our Netapp filer. I am running
Liberty and upgraded cinder packages to 7.0.3 from 7.0.0 to take advantage
of this functionality. From what I can gather, it uses sub-LUN cloning to
get past the hard limit set by Netapp when cloning past 64G (starting from
a 4G volume).

Environment:

  • Release: Liberty
  • Filer: Netapp
  • Protocol: Fibre Channel
  • Multipath: yes

Steps to reproduce:

  • Create new instance
  • stop instance
  • extend the volume by running the following commands (the full sequence
    is consolidated after this list):

    • cinder reset-state --state available (volume-ID or name)
    • cinder extend (volume-ID or name) 100
    • cinder reset-state --state in-use (volume-ID or name)
  • start the instance with either nova start or nova reboot --hard (the
    result is the same either way)
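
For reference, with placeholder names, the full sequence looks roughly like this:

  # stop the instance so the root volume is no longer actively used
  nova stop <instance-ID or name>

  # cinder will not extend an in-use volume here, so temporarily mark it available
  cinder reset-state --state available <volume-ID or name>
  cinder extend <volume-ID or name> 100
  cinder reset-state --state in-use <volume-ID or name>

  # boot the instance again (nova reboot --hard gives the same result)
  nova start <instance-ID or name>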

I can see that the instance's multipath status is good before the resize...

360a98000417643556a2b496d58665473 dm-17 NETAPP ,LUN
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| |- 6:0:1:5 sdy  65:128 active undef running
| `- 7:0:0:5 sdz  65:144 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
  |- 6:0:0:5 sdx  65:112 active undef running
  `- 7:0:1:5 sdaa 65:160 active undef running

Once the volume is resized, the LUN goes into a failed state and does not
show the new size:

360a98000417643556a2b496d58665473 dm-17 NETAPP ,LUN
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=-1 status=enabled
| |- 6:0:1:5 sdy  65:128 failed undef running
| `- 7:0:0:5 sdz  65:144 failed undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
  |- 6:0:0:5 sdx  65:112 failed undef running
  `- 7:0:1:5 sdaa 65:160 failed undef running

Like I said, this only happens on volumes that have been extended past 64G.
Smaller sizes do not have this issue. I can only assume that the original
LUN is getting destroyed after the clone process and that this is the cause
of the failed state. Why is it not picking up the new one and attaching it to
the compute node? Is there something I am missing?
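
For anyone checking from the compute node side, what the kernel still reports for the stale paths can be inspected with something like the following (a diagnostic sketch only; the device names and WWID are the ones from the output above and will differ per host):

  # size in bytes the kernel currently reports for one of the path devices
  blockdev --getsize64 /dev/sdy

  # SCSI state of that path device (running, offline, ...)
  cat /sys/block/sdy/device/state

  # multipath view of the whole map
  multipath -ll 360a98000417643556a2b496d58665473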

Thanks in advance,

Adam


asked Aug 24, 2017 in openstack-operators by Adam_Dibiase (260 points)

3 Responses

0 votes

Hey Adam,

There have been some updates since Liberty to improve handling in os-brick,
the library that manages the local devices. But with this showing the paths
down, I wonder if there is something else going on between the NetApp box
and the Nova compute host.

Could you file a bug to track this? I think you could just copy and paste the
content of your original email since it captures a lot of great info.

https://bugs.launchpad.net/cinder/+filebug

We can tag it with netapp so maybe it will get some attention there.

Thanks,
Sean

responded Aug 23, 2017 by Sean_McGinnis (11,820 points)
0 votes

Thanks Sean. I filed a bug report to track this: Bug #1712651. I would agree
with you about a connectivity issue with the Netapp if this happened on all
volume extensions, but it only happens in this one scenario.

Thanks,

Adam

responded Aug 23, 2017 by Adam_Dibiase (260 points)
0 votes

Hi!

If this were OpenStack Kilo and HPE 3PAR over Fibre Channel, I would tell you that the volume extend operation is designed to work with detached volumes only; hence the need for cinder reset-state. At least in our case, it does not update the SCSI devices or the multipath setup, and the volume continues to work with the old size. We do a live migration afterwards to disconnect the storage from one node and connect it to another; even a resize to the same node works. However, os-brick was introduced in Liberty, so your case may be different.
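
If a live migration is not an option, the manual host-side equivalent would look roughly like the sketch below. It assumes the filer still presents the same LUN at the same target and LUN ID, which may not hold here given the sub-LUN clone; the sdX names and WWID are taken from Adam's output and will differ per host.

  # ask each SCSI path device to re-read its capacity from the array
  for dev in sdx sdy sdz sdaa; do
      echo 1 > /sys/block/$dev/device/rescan
  done

  # tell multipathd to pick up the new size on the map
  multipathd -k"resize map 360a98000417643556a2b496d58665473"

  # verify the map size and path states
  multipath -ll 360a98000417643556a2b496d58665473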

Tomas

responded Aug 24, 2017 by vondra_at_homeatclou (780 points)  
...