Re: [Openstack] Disable distributed loadbalancers (LBaaSv2)?

0 votes

When I set up my OS cluster over a year ago, I chose to use
distributed LBaaSv2. That sounded like the most sensible
thing - redundancy is the primary goal with me choosing
OS in the first place!

However, it turned out that there’s a very grave bug in
OS - Neutron - (only just recently fixed - a few weeks ago and
only in the latest development code).

https://bugs.launchpad.net/neutron/+bug/1494003
https://bugs.launchpad.net/neutron/+bug/1493809
https://bugs.launchpad.net/neutron/+bug/1583694

I run Newton (and don’t want to risk everything by either
re-installing or upgrading - last time it took me two months
to get things working again!).

Doesn’t seem to be any backport of the fix to Newton :( :(.

Does anyone have an idea on how I can “hack” the DB
(MySQL) so that it isn’t distributed any more? The OS
command line tools won’t let you de-distribute one :(.

This should be “fairly” straightforward, for anyone that knows
the “inner workings” of Neutron. Simply “undo” whatever

neutron router-create --distributed True --ha False rname

did. Unfortunately, I can’t delete the router and then recreate
it without destroying my whole setup, instances, networks,
etc. Everything “hangs” off of that router...


Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

asked Oct 21, 2017 in openstack by Turbo_Fredriksson (8,980 points)   7 12 14

13 Responses

0 votes

On 09/16/2017 12:25 PM, Turbo Fredriksson wrote:
> When I setup my OS cluster over a year ago, I chose to use
> distributed LBaaSv2. That sounded like the most sensible
> thing - redundancy is the primary goal with me choosing
> OS in the first place!
>
> However, it turned out that there’s a very grave bug in
> OS - Neutron - (only just recently fixed - a few weeks ago and
> only in the latest development code).
>
> https://bugs.launchpad.net/neutron/+bug/1494003
> https://bugs.launchpad.net/neutron/+bug/1493809
> https://bugs.launchpad.net/neutron/+bug/1583694
>
> I run Newton (and don’t want to risk everything by either
> re-installing or upgrading - last time it took me two months
> to get things working again!).
>
> Doesn’t seem to be any backport of the fix to Newton :( :(.

Sorry, due to the invasiveness of the changes it won't be backported to
Newton, only Pike will have this support. It also might be slightly
broken until very recent code in stable/pike...

> Does anyone have an idea on how I can “hack” the DB
> (MySQL) so that it isn’t distributed any more? The OS
> command line tools won’t let you de-distribute one :(.
>
> This should be “fairly” straight forward, for anyone that knows
> the “inner workings” of Neutron. Simply “undo” whatever
>
>  neutron router-create --distributed True --ha False rname
>
> did. I can’t unfortunately delete the router and then recreate
> it without destroying my whole setup, instances, networks,
> etc, etc. Everything “hangs” off of that router...

I think you should be able to remove the router interfaces on the
external and internal networks then remove the router, without removing
any of the private networks, etc. Then you can create it again with
--distributed=False. VMs might lose connectivity for a bit though.
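
The delete-and-recreate route described above could be laid out as an ordered list of CLI calls. The sketch below only builds the commands, it does not run them. The function name and the router, subnet and network placeholders are illustrative assumptions, and any floating IPs would presumably have to be disassociated before the interfaces can be removed.

```python
from typing import List


def recreate_router_plan(router: str, subnets: List[str], ext_net: str) -> List[str]:
    """Return the neutron CLI calls, in order, for replacing a DVR router
    with a centralized one. Nothing is executed; this is a dry-run plan."""
    plan = []
    # 1. Detach every internal subnet from the distributed router.
    plan += [f"neutron router-interface-delete {router} {s}" for s in subnets]
    # 2. Drop the external gateway, then the router itself.
    plan.append(f"neutron router-gateway-clear {router}")
    plan.append(f"neutron router-delete {router}")
    # 3. Recreate it as a centralized (non-DVR, non-HA) router.
    plan.append(f"neutron router-create --distributed False --ha False {router}")
    # 4. Restore the gateway and re-attach the subnets.
    plan.append(f"neutron router-gateway-set {router} {ext_net}")
    plan += [f"neutron router-interface-add {router} {s}" for s in subnets]
    return plan


if __name__ == "__main__":
    # Placeholder names, not taken from a real deployment.
    for cmd in recreate_router_plan("tenant", ["subnet-a", "subnet-b"], "ext-net"):
        print(cmd)
```

Printing the plan first makes it easier to review the order before anything is touched; the private networks and subnets themselves are never deleted in this sequence.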

Ocata supports DVR -> Centralized router migration, so you would only
have to go forward one release if you choose that path.

-Brian

responded Sep 18, 2017 by haleyb.dev_at_gmail. (880 points)  
0 votes

On 18 Sep 2017, at 14:50, Brian Haley haleyb.dev@gmail.com wrote:

> Sorry, due to the invasiveness of the changes it won't be backported to Newton

Bugger! That’s a shame :(. No way I can convince someone to do it,
for a (small) monetary donation?

> I think you should be able to remove the router interfaces on the external and internal networks then remove the router

Not sure if I do this correctly. I’m getting an error:

----- snip -----
bladeA01:~# neutron router-port-list tenant
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                          |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| 1647ed58-2c64-486e-a41e-910bf91f0876 |      | fa:16:3e:48:e8:b1 | {"subnet_id": "67242897-47c9-47f0-a3cd-9d01c3825e07", "ip_address": "10.0.10.17"}  |
| 317c3cf0-3119-4606-9367-fb8d8319d908 |      | fa:16:3e:dc:95:1e | {"subnet_id": "b3d19d1a-387d-4316-b490-11c8cb98dfd1", "ip_address": "10.0.9.254"}  |
| 6c6f33e9-2a16-44e0-9970-d252ce7d120c |      | fa:16:3e:09:2a:21 | {"subnet_id": "ab4da704-0ed2-4e54-89e4-afc98b8bb631", "ip_address": "10.0.6.1"}    |
| 8f659e68-252f-4c35-bff8-62211983022a |      | fa:16:3e:a6:2c:53 | {"subnet_id": "67242897-47c9-47f0-a3cd-9d01c3825e07", "ip_address": "10.0.10.254"} |
| 9ea245f7-c4a4-42e0-a23e-8109761c20b9 |      | fa:16:3e:ad:12:d6 | {"subnet_id": "336dc07c-83e7-4a64-a698-15d42b8824b1", "ip_address": "10.0.8.254"}  |
| d0960758-39c1-40ef-9023-84d24d533f93 |      | fa:16:3e:8a:34:19 | {"subnet_id": "b3d19d1a-387d-4316-b490-11c8cb98dfd1", "ip_address": "10.0.9.17"}   |
| ed1dee2e-f122-45cb-84a2-10f9deafee6a |      | fa:16:3e:c5:54:a7 | {"subnet_id": "336dc07c-83e7-4a64-a698-15d42b8824b1", "ip_address": "10.0.8.14"}   |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
bladeA01:~# neutron router-interface-delete tenant port=1647ed58-2c64-486e-a41e-910bf91f0876
Router dac1e4f4-dd02-4f97-bc77-952906e8daa7 does not have an interface with id 1647ed58-2c64-486e-a41e-910bf91f0876
Neutron server returns request_ids: ['req-3f3c985f-e4fb-473c-9911-56f8ebff2e58']
----- snip -----

On another one it said I couldn’t do that because it was in use “by one or more
floating ips”. Those I could possibly recreate, if I can just get it to start deleting
interfaces.

I have all my compute nodes shut down at the moment, no point in taking them
up when the LBs don’t work. I rely heavily on LBs for my setup...

> Ocata supports DVR -> Centralized router migration, so you would only have to go forward one release if you choose that path.

OS in Debian GNU/Linux is in somewhat of a … “limbo” right now. Not sure
what the status is of Ocata there...


responded Sep 18, 2017 by Turbo_Fredriksson (8,980 points)   7 12 14
0 votes

On 18 Sep 2017, at 21:19, Turbo Fredriksson turbo@bayour.com wrote:

> No way I can convince someone to do it, for a (small) monetary donation?

No-one?

Any SQL query hack I could use to get rid of it?


responded Sep 20, 2017 by Turbo_Fredriksson (8,980 points)   7 12 14
0 votes

On 09/18/2017 04:19 PM, Turbo Fredriksson wrote:
> On 18 Sep 2017, at 14:50, Brian Haley haleyb.dev@gmail.com wrote:
>
>> Sorry, due to the invasiveness of the changes it won't be backported to Newton
>
> Bugger! That’s a shame :(. No way I can convince someone to do it,
> for a (small) monetary donation?

I can only imagine the list of patch dependencies on such a task :(

>> I think you should be able to remove the router interfaces on the external and internal networks then remove the router
>
> Not sure if I do this correctly. I’m getting an error:
>
> ----- snip -----
> bladeA01:~# neutron router-interface-delete tenant port=1647ed58-2c64-486e-a41e-910bf91f0876
> Router dac1e4f4-dd02-4f97-bc77-952906e8daa7 does not have an interface with id 1647ed58-2c64-486e-a41e-910bf91f0876
> Neutron server returns request_ids: ['req-3f3c985f-e4fb-473c-9911-56f8ebff2e58']

Strange error, did you try with the subnet ID instead?
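
Detaching by subnet rather than by port reduces the port list quoted in this thread to a handful of distinct subnets. The sketch below derives one `router-interface-delete` call per internal subnet. The rows are copied by hand from the output above and the helper name is made up. Treating `ab4da704-...` as the external subnet is an inference from the gateway address 10.0.6.1 reported for this router later in the thread, not something anyone states outright.

```python
from typing import List, Tuple

# (port_id, subnet_id, ip_address), hand-copied from the router-port-list output.
ROUTER_PORTS: List[Tuple[str, str, str]] = [
    ("1647ed58-2c64-486e-a41e-910bf91f0876", "67242897-47c9-47f0-a3cd-9d01c3825e07", "10.0.10.17"),
    ("317c3cf0-3119-4606-9367-fb8d8319d908", "b3d19d1a-387d-4316-b490-11c8cb98dfd1", "10.0.9.254"),
    ("6c6f33e9-2a16-44e0-9970-d252ce7d120c", "ab4da704-0ed2-4e54-89e4-afc98b8bb631", "10.0.6.1"),
    ("8f659e68-252f-4c35-bff8-62211983022a", "67242897-47c9-47f0-a3cd-9d01c3825e07", "10.0.10.254"),
    ("9ea245f7-c4a4-42e0-a23e-8109761c20b9", "336dc07c-83e7-4a64-a698-15d42b8824b1", "10.0.8.254"),
    ("d0960758-39c1-40ef-9023-84d24d533f93", "b3d19d1a-387d-4316-b490-11c8cb98dfd1", "10.0.9.17"),
    ("ed1dee2e-f122-45cb-84a2-10f9deafee6a", "336dc07c-83e7-4a64-a698-15d42b8824b1", "10.0.8.14"),
]

# Assumption: this subnet carries the router's external gateway port.
EXTERNAL_SUBNET = "ab4da704-0ed2-4e54-89e4-afc98b8bb631"


def interface_delete_commands(router: str,
                              ports: List[Tuple[str, str, str]],
                              external_subnet: str) -> List[str]:
    """Build one delete-by-subnet command per distinct internal subnet."""
    seen: List[str] = []
    for _port_id, subnet_id, _ip in ports:
        # Skip the gateway subnet and anything already listed.
        if subnet_id != external_subnet and subnet_id not in seen:
            seen.append(subnet_id)
    return [f"neutron router-interface-delete {router} subnet={sid}" for sid in seen]


if __name__ == "__main__":
    for cmd in interface_delete_commands("tenant", ROUTER_PORTS, EXTERNAL_SUBNET):
        print(cmd)
```

The external subnet is left out because the gateway is normally removed with `neutron router-gateway-clear` instead of an interface delete.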

> ----- snip -----
>
> On another one it said I couldn’t do that because it was in use “by one or more
> floating ips”. Those I could possibly recreate, if I can just get it to start deleting
> interfaces.

Yes, I guess I expected that.

responded Sep 20, 2017 by haleyb.dev_at_gmail. (880 points)  
0 votes

On 20 Sep 2017, at 16:33, Brian Haley haleyb.dev@gmail.com wrote:

> On 09/18/2017 04:19 PM, Turbo Fredriksson wrote:
>
>> On 18 Sep 2017, at 14:50, Brian Haley haleyb.dev@gmail.com wrote:
>>
>>> Sorry, due to the invasiveness of the changes it won't be backported to Newton
>>
>> Bugger! That’s a shame :(. No way I can convince someone to do it,
>> for a (small) monetary donation?
>
> I can only imagine the list of patch dependencies on such a task :(

No way to do a “patch lite”?

> Strange error, did you try with the subnet ID instead?

I’ll try when I get home and/or have more time to look into it.


responded Sep 20, 2017 by Turbo_Fredriksson (8,980 points)   7 12 14
0 votes

If you wanted to do it purely with an SQL hack you might be able to set
distributed to False in the router_extra_attributes table. Additionally you
would need to delete any entries from the routerports table with the type
'network:router_centralized_snat' and update any with the type
'network:router_interface_distributed' to 'network:router_interface'.
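
As a rough illustration of those three changes, the sketch below runs them against a tiny in-memory SQLite stand-in rather than a real Neutron MySQL database. The underscore-separated names (`router_extra_attributes`, `network:router_centralized_snat`, `network:router_interface_distributed`) and the column names `router_id`, `distributed` and `port_type` are assumptions about the schema, so check them against the actual database, and take a backup, before adapting any of this.

```python
import sqlite3

SNAT = "network:router_centralized_snat"
DVR_INTF = "network:router_interface_distributed"
PLAIN_INTF = "network:router_interface"


def undistribute(conn: sqlite3.Connection, router_id: str) -> None:
    """Apply the three described changes to one router in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "UPDATE router_extra_attributes SET distributed = 0 WHERE router_id = ?",
            (router_id,),
        )
        conn.execute(
            "DELETE FROM routerports WHERE router_id = ? AND port_type = ?",
            (router_id, SNAT),
        )
        conn.execute(
            "UPDATE routerports SET port_type = ? WHERE router_id = ? AND port_type = ?",
            (PLAIN_INTF, router_id, DVR_INTF),
        )


def make_demo_db() -> sqlite3.Connection:
    """Build a simplified stand-in for the two tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE router_extra_attributes (
            router_id TEXT PRIMARY KEY, distributed INTEGER, ha INTEGER);
        CREATE TABLE routerports (
            router_id TEXT, port_id TEXT, port_type TEXT);
        """
    )
    conn.execute("INSERT INTO router_extra_attributes VALUES ('r1', 1, 0)")
    conn.executemany(
        "INSERT INTO routerports VALUES (?, ?, ?)",
        [("r1", "p1", DVR_INTF), ("r1", "p2", SNAT), ("r1", "p3", "network:router_gateway")],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    undistribute(db, "r1")
    print(db.execute("SELECT port_id, port_type FROM routerports ORDER BY port_id").fetchall())
```

Wrapping the statements in a single transaction means a failure partway through leaves the tables as they were, which matters more on a production database than in this demo.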

You'll want to do that while all of the L3 and L2 agents are offline. Then
delete all of the router namespaces using the netns cleanup tool and
restart the agents.
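
Before running a cleanup tool (presumably `neutron-netns-cleanup`), it may help to see which namespaces on a node look router-related. The sketch below filters `ip netns list` output by name prefix. The prefixes `qrouter-`, `snat-` and `fip-` are assumed L3-agent naming conventions, and the sample text is invented for the example, reusing the router ID from this thread.

```python
from typing import List

# Assumed prefixes for namespaces created by the Neutron L3 agent.
ROUTER_NS_PREFIXES = ("qrouter-", "snat-", "fip-")


def router_namespaces(ip_netns_output: str) -> List[str]:
    """Pick router-related namespace names out of `ip netns list` output."""
    names = []
    for line in ip_netns_output.splitlines():
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        name = parts[0]  # newer iproute2 appends "(id: N)" after the name
        if name.startswith(ROUTER_NS_PREFIXES):
            names.append(name)
    return names


# Invented sample output; only the router ID comes from the thread.
SAMPLE = """\
qrouter-dac1e4f4-dd02-4f97-bc77-952906e8daa7 (id: 2)
snat-dac1e4f4-dd02-4f97-bc77-952906e8daa7 (id: 1)
qdhcp-00000000-0000-0000-0000-000000000000
"""

if __name__ == "__main__":
    for ns in router_namespaces(SAMPLE):
        print(ns)
```

A leftover `snat-` or `fip-` namespace after the agents restart would suggest that some node still treats the router as distributed.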

That might do the trick, but I haven't tested this any time recently so
unfortunately I can't guarantee that it will be enough. :/

responded Sep 21, 2017 by kevin_at_benton.pub (15,600 points)   2 3 4
0 votes

On 21 Sep 2017, at 04:45, Kevin Benton kevin@benton.pub wrote:

> If you wanted to do it purely with an SQL hack you might be able to set distributed to False in the router_extra_attributes table. Additionally you would need to delete any entries from the routerports table with the type 'network:router_centralized_snat' and update any with the type 'network:router_interface_distributed' to 'network:router_interface'.

Thanx! That seems to have done the trick. I think:

bladeA01:~# neutron router-list
+--------------------------------------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------+
| id                                   | name           | external_gateway_info                                                                                                                                                                | distributed | ha    |
+--------------------------------------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------+
| d9c39638-51b5-481a-a60d-df79b6a06f9d | infrastructure | {"network_id": "b74570c9-f40f-4c64-9e4c-9bf0c9978d2e", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "7bc73f7a-f07e-4ad7-bcd8-1fd77f46c888", "ip_address": "10.0.5.1"}]} | False       | False |
| dac1e4f4-dd02-4f97-bc77-952906e8daa7 | tenant         | {"network_id": "b74570c9-f40f-4c64-9e4c-9bf0c9978d2e", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "ab4da704-0ed2-4e54-89e4-afc98b8bb631", "ip_address": "10.0.6.1"}]} | False       | False |
+--------------------------------------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------+

The “external_gateway_info” still says “enable_snat=true” on both of them…
Is that correct?

But at least both “distributed” and “ha” are “False” now, so there’s a lot
of progress! :). Million thanx.

All my compute nodes are still down, haven’t dared start them up yet.
Any way I can know for sure that this actually worked, without spinning
up everything?


responded Sep 25, 2017 by Turbo_Fredriksson (8,980 points)   7 12 14
0 votes

Not really. Just bring some stuff up. Even if the routers are messed up the
compute instances won't get hurt.

The snat stuff sounds right.

responded Sep 29, 2017 by kevin_at_benton.pub (15,600 points)   2 3 4
0 votes

On 29 Sep 2017, at 03:20, Kevin Benton kevin@benton.pub wrote:

> Not really. Just bring some stuff up. Even if the routers are messed up the compute instances won't get hurt.
>
> The snat stuff sounds right.

Perfect!! Again, million thanx for the help, much appreciated!!


responded Sep 29, 2017 by Turbo_Fredriksson (8,980 points)   7 12 14
0 votes

On 29 Sep 2017, at 03:20, Kevin Benton kevin@benton.pub wrote:

> Not really. Just bring some stuff up. Even if the routers are messed up the compute instances won't get hurt.

I’ve finally (!!) had time to deal with this, but it still doesn’t seem to be
working. It has been quite a while since I set all this up, so it is entirely possible
the problem is elsewhere…

But I can’t access either the VIP or the FIP of the selected load balancer.

I have other hosts on the same VIP network as this LB and those I
can connect to (which should rule out any network/router issues).

I have also checked that the security groups allow access - I can access
the host on the same port as the listener on the LB, and the SG on the
LB’s port also allows the relevant queries.

But even if/when I add icmp to the SG, I still can’t ping or trace either
of the IPs..
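
One way to make the comparison above repeatable (backend host reachable, VIP and FIP not) is a small TCP probe run from the same client against each address. This is a generic sketch, not anything Neutron-specific; the function name and the default timeout are arbitrary choices.

```python
import socket
from typing import Optional


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> Optional[str]:
    """Return None if a TCP connection succeeds, else a short error string."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:  # covers refusals, timeouts and unreachable hosts
        return f"{type(exc).__name__}: {exc}"
```

Calling `tcp_probe(addr, listener_port)` for the backend, the VIP and the FIP in turn shows whether each failure is a timeout (traffic dropped somewhere on the path) or a refusal (the address answers but nothing is listening), which narrows down where to look next.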


responded Oct 20, 2017 by Turbo_Fredriksson (8,980 points)   7 12 14
...