
[Openstack-operators] Neutron HA With VRRP (2 Master Nodes, Bogus VRRP packet)


Hello folks,

I'm trying to use HA with VRRP using keepalived under the Juno release. This is a
VLAN setup using the ML2 plugin and openvswitch.

I've set up a second neutron node and added the ha_* configuration parameters. When I
create a new router, I see that it is created on both network nodes. However, it
seems that keepalived cannot elect a single master, as both nodes are seen as
"master".

I debugged the problem a bit and came to the conclusion that the VRRP messages are
sent correctly (verified with tcpdump). The first node announces that it has the
floating IP and the internal tenant network IP (the router IP). The second node also
announces its virtual IP as written in keepalived.conf (169.254.0.1). However, on the
second node, I see the following messages:

May 26 10:54:15 neutron-ha-2 Keepalived_vrrp[20144]: ip address associated with VRID not present in received packet : 169.254.0.1
May 26 10:54:15 neutron-ha-2 Keepalived_vrrp[20144]: one or more VIP associated with VRID mismatch actual MASTER advert
May 26 10:54:15 neutron-ha-2 Keepalived_vrrp[20144]: bogus VRRP packet received on ha-38ff51c5-f0 !!!
May 26 10:54:15 neutron-ha-2 Keepalived_vrrp[20144]: VRRP_Instance(VR_1) Dropping received VRRP packet...

Indeed, 169.254.0.1, which is present on the second node as a virtual IP, is not
announced (or present) in the VRRP message sent from the first node. The message from
the first node contains only the router IP address for the tenant network.
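
For reference, the advertisements can be captured inside the router namespace with
something along these lines (the namespace name follows the usual qrouter-<router-id>
convention; adjust the HA interface name per node):

    ip netns exec qrouter-93b94981-01ac-4e59-8718-5103b867f3dd \
        tcpdump -nvvv -i ha-b9db8385-67 ip proto 112

With -vvv, tcpdump decodes the VRRP advertisement, including the list of addresses it
carries, which is how I compared what each node actually announces.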

It seems that there is a known bug for neutron HA setups when l2population is
enabled. However, I do not have l2population enabled (it is explicitly disabled in
plugin.ini): https://bugs.launchpad.net/neutron/+bug/1365476
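
For clarity, the relevant parts of plugin.ini look roughly like this (a paraphrased
sketch, not a verbatim copy of my file):

    [ml2]
    mechanism_drivers = openvswitch

    [agent]
    l2_population = False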

There is a patch [0] for this, but it's against master. According to the LP bug
report, it isn't going to be backported to the Juno release. Are this LP bug and
patch related to my issue?

I am attaching the keepalived.conf files from both nodes. The current state is that
both nodes are in the "master" state and I see the "bogus VRRP packet" error on the
second node. There is no backup node and the cluster isn't functioning.
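
For reference, the agents hosting the router can be listed with something like the
following (the router id is the one that appears in the ha_confs paths below):

    neutron l3-agent-list-hosting-router 93b94981-01ac-4e59-8718-5103b867f3dd

which, in my case, shows the router scheduled on both network nodes.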

Has anyone successfully installed an HA setup with VRRP? I would gladly appreciate
any help regarding this issue.

Note: keepalived v1.2.16 (05/25,2015) is used on both nodes.

[0] https://review.openstack.org/#/c/141114/

---- BEGIN: keepalived.conf on NODE-1 ----
vrrp_sync_group VG_1 {
    group {
        VR_1
    }
    notify_master "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_master.sh"
    notify_backup "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_backup.sh"
    notify_fault "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_fault.sh"
}
vrrp_instance VR_1 {
    state BACKUP
    interface ha-b9db8385-67
    virtual_router_id 1
    priority 50
    nopreempt
    advert_int 2
    track_interface {
        ha-b9db8385-67
    }
    virtual_ipaddress {
        10.30.0.1/24 dev qr-a867fbab-dd
    }
    virtual_ipaddress_excluded {
        192.168.92.23/32 dev qg-690ab8d9-81
        192.168.92.33/16 dev qg-690ab8d9-81
    }
    virtual_routes {
        0.0.0.0/0 via 192.168.88.1 dev qg-690ab8d9-81
    }
}

---- END: keepalived.conf on NODE-1 ----

---- BEGIN: keepalived.conf on NODE-2 ----
vrrp_sync_group VG_1 {
    group {
        VR_1
    }
    notify_master "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_master.sh"
    notify_backup "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_backup.sh"
    notify_fault "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_fault.sh"
}
vrrp_instance VR_1 {
    state BACKUP
    interface ha-38ff51c5-f0
    virtual_router_id 1
    priority 50
    nopreempt
    advert_int 2
    track_interface {
        ha-38ff51c5-f0
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-38ff51c5-f0
    }
    virtual_ipaddress_excluded {
        10.30.0.1/24 dev qr-a867fbab-dd
        192.168.92.23/32 dev qg-690ab8d9-81
        192.168.92.33/16 dev qg-690ab8d9-81
        fe80::f816:33ff:fe45:26ed/64 dev qr-a867fbab-dd scope link
        fe80::f816:33ff:fee8:fd4b/64 dev qg-690ab8d9-81 scope link
    }
    virtual_routes {
        0.0.0.0/0 via 192.168.88.1 dev qg-690ab8d9-81
    }
}

---- END: keepalived.conf on NODE-2 ----

--
Eren Türkay, System Administrator
https://skyatlas.com/ | +90 850 885 0357

Yildiz Teknik Universitesi Davutpasa Kampusu
Teknopark Bolgesi, D2 Blok No:107
Esenler, Istanbul Pk.34220


OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

asked May 26, 2015 in openstack-operators by Eren_Türkay

6 Responses


On 26-05-2015 11:15, Eren Türkay wrote:
May 26 10:54:15 neutron-ha-2 Keepalived_vrrp[20144]: ip address associated with VRID not present in received packet : 169.254.0.1
May 26 10:54:15 neutron-ha-2 Keepalived_vrrp[20144]: one or more VIP associated with VRID mismatch actual MASTER advert
May 26 10:54:15 neutron-ha-2 Keepalived_vrrp[20144]: bogus VRRP packet received on ha-38ff51c5-f0 !!!
May 26 10:54:15 neutron-ha-2 Keepalived_vrrp[20144]: VRRP_Instance(VR_1) Dropping received VRRP packet...

I guess I made some progress. I manually built keepalived.conf, and with the
settings below keepalived is working without a problem. I added both IP addresses
(the internal HA address and the tenant router IP) to the virtual_ipaddress section.
The master/backup switchover is OK: when the link goes down, failover happens within
a reasonable time (20 to 30 seconds). Now the question is why the keepalived.conf
generated on the second node is deficient, lacking an IP address, which causes the problem.
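
For reference, the generated file on each node lives under the router's HA
directory, so the two copies can be compared directly with something like (hostname
is mine; the path follows the standard Neutron layout):

    ssh neutron-ha-2 cat /var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/keepalived.conf \
        | diff /var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/keepalived.conf -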

---- BEGIN: keepalived.conf on NODE-1 ----
vrrp_sync_group VG_1 {
    group {
        VR_1
    }
    notify_master "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_master.sh"
    notify_backup "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_backup.sh"
    notify_fault "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_fault.sh"
}
vrrp_instance VR_1 {
    state BACKUP
    interface ha-b9db8385-67
    virtual_router_id 1
    priority 50
    nopreempt
    advert_int 2
    track_interface {
        ha-b9db8385-67
    }
    virtual_ipaddress {
        10.30.0.1/24 dev qr-a867fbab-dd
        169.254.0.1/24 dev ha-b9db8385-67
    }
    virtual_ipaddress_excluded {
        192.168.92.23/32 dev qg-690ab8d9-81
        192.168.92.33/16 dev qg-690ab8d9-81
    }
    virtual_routes {
        0.0.0.0/0 via 192.168.88.1 dev qg-690ab8d9-81
    }
}

---- END: keepalived.conf on NODE-1 ----

---- BEGIN: keepalived.conf on NODE-2 ----
vrrp_sync_group VG_1 {
    group {
        VR_1
    }
    notify_master "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_master.sh"
    notify_backup "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_backup.sh"
    notify_fault "/var/lib/neutron/ha_confs/93b94981-01ac-4e59-8718-5103b867f3dd/notify_fault.sh"
}
vrrp_instance VR_1 {
    state BACKUP
    interface ha-38ff51c5-f0
    virtual_router_id 1
    priority 50
    nopreempt
    advert_int 2
    track_interface {
        ha-38ff51c5-f0
    }
    virtual_ipaddress {
        10.30.0.1/24 dev qr-a867fbab-dd
        169.254.0.1/24 dev ha-38ff51c5-f0
    }
    virtual_ipaddress_excluded {
        192.168.92.23/32 dev qg-690ab8d9-81
        192.168.92.33/16 dev qg-690ab8d9-81
        fe80::f816:33ff:fe45:26ed/64 dev qr-a867fbab-dd scope link
        fe80::f816:33ff:fee8:fd4b/64 dev qg-690ab8d9-81 scope link
    }
    virtual_routes {
        0.0.0.0/0 via 192.168.88.1 dev qg-690ab8d9-81
    }
}

---- END: keepalived.conf on NODE-2 ----

--
Eren Türkay, System Administrator
https://skyatlas.com/ | +90 850 885 0357

Yildiz Teknik Universitesi Davutpasa Kampusu
Teknopark Bolgesi, D2 Blok No:107
Esenler, Istanbul Pk.34220



responded May 26, 2015 by Eren_Türkay

Are you running the same version of the code on both nodes?

responded May 26, 2015 by Assaf_Muller

On 26-05-2015 16:46, Assaf Muller wrote:
Are you running the same version of the code on both nodes?

For keepalived, yes. For neutron, no. The first node is deployed using Mirantis
Fuel; there, neutron --version reports 2.3.9. The second node is deployed from the
cloudarchive repository (on Ubuntu 14.04) and neutron --version reports 2.3.8. I'm
not sure which patches Mirantis applies.
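
For completeness, the installed server packages can be compared directly as well
(note that neutron --version reports the python-neutronclient CLI version), e.g. on
the Ubuntu node:

    dpkg -l | grep neutron

The Fuel-deployed node may use a different package manager depending on the base OS.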

Does this minor difference affect the outcome? Should I try installing the second
node using the Mirantis repositories, or completely isolate the neutron nodes and
deploy them both using the Ubuntu cloudarchive repository?

Regards,
Eren

--
Eren Türkay, System Administrator
https://skyatlas.com/ | +90 850 885 0357

Yildiz Teknik Universitesi Davutpasa Kampusu
Teknopark Bolgesi, D2 Blok No:107
Esenler, Istanbul Pk.34220



responded May 26, 2015 by Eren_Türkay

Comments in-line.

----- Original Message -----
On 26-05-2015 16:46, Assaf Muller wrote:

Are you running the same version of the code on both nodes?

For keepalived, yes. For neutron, no. First node is deployed using Mirantis
Fuel. Apparently, neutron --version reports 2.3.9. The second node is deployed
using cloudarchive repository (on ubuntu 14.04) and neutron --version reports
2.3.8. I'm not sure about the patches that mirantis applies.

OK, looking at the generated keepalived.conf files, I suspected the nodes were using
different Neutron versions. Yes, this could have an impact.

Does this minor difference affect the outcome?
Yes. There were changes in this area between those two versions.

Should I try installing the second node using mirantis repositories or completely
isolate neutron nodes and deploy them using ubuntu cloudarchive repository?

I'd install them via the same tool, make sure the outcome is the same Neutron
code on both nodes with the same patches applied, recreate the routers entirely
and see what happens. If it works, work from there.
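
A rough sketch of what I mean, using the CLI (run as admin; names in angle brackets
are placeholders, and the exact flags depend on your client version):

    neutron router-gateway-clear <router>
    neutron router-interface-delete <router> <subnet>
    neutron router-delete <router>
    neutron router-create <router> --ha True
    neutron router-interface-add <router> <subnet>
    neutron router-gateway-set <router> <external-net>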

responded May 26, 2015 by Assaf_Muller

On 26-05-2015 17:12, Assaf Muller wrote:
I'd install them via the same tool, make sure the outcome is the same Neutron
code on both nodes with the same patches applied, recreate the routers entirely
and see what happens. If it works, work from there.

Thank you for your response! I will try with the exact same code on both neutron
servers. I wasn't aware that a minor version difference could affect this behavior.
I will probably go with Ubuntu 14.04 and the cloudarchive repositories for the
neutron servers (version 2.3.8).

Regards,

--
Eren Türkay, System Administrator
https://skyatlas.com/ | +90 850 885 0357

Yildiz Teknik Universitesi Davutpasa Kampusu
Teknopark Bolgesi, D2 Blok No:107
Esenler, Istanbul Pk.34220



responded May 26, 2015 by Eren_Türkay

On 26-05-2015 17:12, Assaf Muller wrote:
I'd install them via the same tool, make sure the outcome is the same Neutron
code on both nodes with the same patches applied, recreate the routers entirely
and see what happens. If it works, work from there.

Hello,

I confirm that HA is working when the same code is running on all the neutron
nodes. With the HA configuration parameters set in neutron.conf, newly created
routers/networks work in a highly-available fashion. However, I have non-HA
routers/networks in the production/test environment, and I have not found a way to
convert those routers to HA.

Is there any way to convert non-HA routers to HA while keeping the floating IPs, the
same network addresses for the running VMs, and so on? I tried adding a new HA
router and attaching an interface to the existing (non-HA) network; the VMs were
able to ping this router. I disassociated the floating IPs and re-associated them,
but those IPs were still created on the non-HA router. I then tried to remove the
existing router to see if the floating IPs would be created on the only remaining
(HA) router, but I failed to remove the previous router.
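
For reference, the direct conversion I was hoping for would look something like the
sequence below. I have not verified whether Juno allows it; the server may simply
reject the --ha update on an existing router:

    neutron router-update <router-id> --admin_state_up=False
    neutron router-update <router-id> --ha=True
    neutron router-update <router-id> --admin_state_up=True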

So far, the only way I can see to migrate the network for the running VMs is to
create a new router/network, re-create each VM from its volume, attach the new VM to
the new network, disassociate the floating IP, and associate it with the new port.
However, this is a last resort for me, as it would be hard from an operational
perspective. All VMs would experience serious downtime, and their internal IP
addresses would change, forcing the VM operators to change their configuration as
well. Any ideas?

Regards,
Eren

--
Eren Türkay, System Administrator
https://skyatlas.com/ | +90 850 885 0357

Yildiz Teknik Universitesi Davutpasa Kampusu
Teknopark Bolgesi, D2 Blok No:107
Esenler, Istanbul Pk.34220



responded May 28, 2015 by Eren_Türkay