
[openstack-dev] [nova][neutron] Passthrough of PF's from SR-IOV capable devices.

0 votes

Hi all,

In the current SR-IOV implementation there is a check in Nova (specifically get_device_type in nova/virt/libvirt/driver.py) that determines if a given PCI device is:

  • a normal PCI device;
  • an SR-IOV physical function (PF); or
  • an SR-IOV virtual function (VF).

If it's a normal PCI device or a virtual function, it's considered fair game for passthrough; if it's a PF, it's not (it's considered to be owned by the host). There are two things I am a little unclear on and was hoping someone might be able to help me understand:
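As a rough illustration of the distinction being made (not Nova's actual code; Nova inspects libvirt's node-device capabilities, but the sysfs layout encodes the same information):

```python
import os

def get_device_type(sysfs_dev_path):
    """Classify a PCI device as VF, PF, or plain PCI from its sysfs entry.

    Illustrative sketch only: the labels and function name mirror the
    idea of the Nova check, not its exact implementation.
    """
    # A VF has a "physfn" entry pointing back at its parent PF.
    if os.path.exists(os.path.join(sysfs_dev_path, "physfn")):
        return "type-VF"
    # A PF advertises how many VFs it can expose.
    if os.path.exists(os.path.join(sysfs_dev_path, "sriov_totalvfs")):
        return "type-PF"
    # Anything else is a plain PCI device.
    return "type-PCI"
```

Only the first and third cases are currently considered fair game for passthrough.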

1) If the device is a "normal" PCI device, but is a network card, am I still able to take advantage of the advanced syntax added circa Juno to define the relationship between that card and a given physical network so that the scheduler can place accordingly (and does this still use the ML2 mech driver for SR-IOV even though it's a "normal" device)?

2) There is no functional reason from a Libvirt/Qemu perspective that I couldn't pass through a PF to a guest, and some users have expressed surprise to me when they have run into this check in the Nova driver. I assume in the initial implementation this was prevented to avoid a whole heap of fun additional logic that is required if this is allowed (e.g. check that no VFs from the PF being requested are already in use, remove all the associated VFs from the pool when assigning the PF, who gets allowed to use PFs versus VFs etc.). Am I correct here or is there another reason that this would be undesirable to allow in future - assuming such checks can also be designed - that I am missing?
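The extra bookkeeping hinted at in Q2 could, in principle, look something like the following hypothetical sketch (the pool structure and function name are invented for illustration; this is not Nova code):

```python
def claim_pf(free_devices, pf_addr, vfs_by_pf):
    """Hypothetically claim a PF for passthrough.

    Refuses the claim if any of the PF's VFs is already in use, and
    otherwise removes the PF *and* all its VFs from the free pool so
    they can't be handed out separately afterwards.
    """
    vfs = vfs_by_pf.get(pf_addr, set())
    # Any VF of this PF that is no longer in the free pool is in use.
    in_use = vfs - free_devices
    if in_use:
        raise ValueError("PF %s has VFs in use: %s" % (pf_addr, sorted(in_use)))
    free_devices.discard(pf_addr)
    free_devices -= vfs
    return pf_addr
```

Who is allowed to request PFs versus VFs would be a separate policy question on top of this.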

Thanks,

Steve


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Feb 5, 2015 in openstack-dev by Steve_Gordon (9,680 points)   2 6 7
retagged Apr 14, 2015 by admin

9 Responses

0 votes

Hi

1) If the device is a "normal" PCI device, but is a network card, am I still able to take advantage of the advanced syntax added circa Juno to define the relationship between that card and a given physical network so that the scheduler can place accordingly (and does this still use the ML2 mech driver for SR-IOV even though it's a "normal" device)?

Actually, libvirt won't allow using "normal" PCI devices as network interfaces in a VM. The following error is thrown by libvirt 1.2.9.1:

libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only

I don't know why libvirt prohibits that, but we should prohibit it on the OpenStack side as well.

2) There is no functional reason from a Libvirt/Qemu perspective that I couldn't pass through a PF to a guest, and some users have expressed surprise to me when they have run into this check in the Nova driver. I assume in the initial implementation this was prevented to avoid a whole heap of fun additional logic that is required if this is allowed (e.g. check that no VFs from the PF being requested are already in use, remove all the associated VFs from the pool when assigning the PF, who gets allowed to use PFs versus VFs etc.). Am I correct here or is there another reason that this would be undesirable to allow in future - assuming such checks can also be designed - that I am missing?

I think that is correct. But even if the additional logic was implemented it wouldn't work because of how libvirt behaves currently.

Regards
Przemek

responded Feb 5, 2015 by Czesnowicz,_Przemysl (660 points)   1 2
0 votes

----- Original Message -----
From: "Przemyslaw Czesnowicz" przemyslaw.czesnowicz@intel.com
To: "OpenStack Development Mailing List (not for usage questions)" openstack-dev@lists.openstack.org

Hi

1) If the device is a "normal" PCI device, but is a network card, am I still able to take advantage of the advanced syntax added circa Juno to define the relationship between that card and a given physical network so that the scheduler can place accordingly (and does this still use the ML2 mech driver for SR-IOV even though it's a "normal" device)?

Actually, libvirt won't allow using "normal" PCI devices as network interfaces in a VM. The following error is thrown by libvirt 1.2.9.1:

libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only

I don't know why libvirt prohibits that, but we should prohibit it on the OpenStack side as well.

This is true for the <interface type="hostdev"> style of configuration, but "normal" PCI devices are still valid in Libvirt for passthrough using <hostdev> though. The former was specifically created for handling passthrough of VFs; the latter is the more generic passthrough functionality, and what was used with the original PCI passthrough functionality introduced circa Havana.

I guess what I'm really asking in this particular question is what is the intersection of these two implementations - if any, as on face value it seems that to passthrough a physical PCI device I must use the older syntax and thus can't have the scheduler be aware of its external network connectivity.
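To make the contrast concrete, the two guest XML shapes look roughly like this (the helper and its dispatch rule are illustrative; see the libvirt domain XML documentation for the authoritative format):

```python
def guest_device_xml(device_type, addr):
    """Return a guest XML fragment for passing through a PCI device.

    Illustrative sketch: "type-VF" gets the network-aware
    <interface type='hostdev'> form, which libvirt restricts to
    SR-IOV VFs; anything else gets the generic <hostdev> form.
    addr is a dict with domain/bus/slot/function hex strings.
    """
    if device_type == "type-VF":
        # Network-aware form: libvirt can manage MAC/VLAN; VFs only.
        return (
            "<interface type='hostdev' managed='yes'>\n"
            "  <source>\n"
            "    <address type='pci' domain='%(domain)s' bus='%(bus)s'"
            " slot='%(slot)s' function='%(function)s'/>\n"
            "  </source>\n"
            "</interface>" % addr
        )
    # Generic passthrough form: works for plain PCI devices too.
    return (
        "<hostdev mode='subsystem' type='pci' managed='yes'>\n"
        "  <source>\n"
        "    <address domain='%(domain)s' bus='%(bus)s'"
        " slot='%(slot)s' function='%(function)s'/>\n"
        "  </source>\n"
        "</hostdev>" % addr
    )
```

The question above is essentially whether the scheduler-facing network metadata can be attached to devices that end up in the second shape.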

2) There is no functional reason from a Libvirt/Qemu perspective that I couldn't pass through a PF to a guest, and some users have expressed surprise to me when they have run into this check in the Nova driver. I assume in the initial implementation this was prevented to avoid a whole heap of fun additional logic that is required if this is allowed (e.g. check that no VFs from the PF being requested are already in use, remove all the associated VFs from the pool when assigning the PF, who gets allowed to use PFs versus VFs etc.). Am I correct here or is there another reason that this would be undesirable to allow in future - assuming such checks can also be designed - that I am missing?

I think that is correct. But even if the additional logic was implemented it wouldn't work because of how libvirt behaves currently.

Again though, in the code we have a distinction between a physical device (as I was asking about in Q1) and a physical function (as I am asking about in Q2), and similarly whether libvirt allows it or not depends on how you configure it in the guest XML. Though I wouldn't be surprised in the PF case if it is in fact not allowed in Libvirt (even with <hostdev>), it is again important to consider this as distinctly separate from passing through the physical device case, which we DO allow currently in the code I'm asking about.

-Steve


responded Feb 5, 2015 by Steve_Gordon (9,680 points)   2 6 7
0 votes

On Thu, Feb 5, 2015 at 9:01 PM, Steve Gordon sgordon@redhat.com wrote:

[snip]

This is true for the <interface type="hostdev"> style of configuration, but "normal" PCI devices are still valid in Libvirt for passthrough using <hostdev> though. The former was specifically created for handling passthrough of VFs; the latter is the more generic passthrough functionality, and what was used with the original PCI passthrough functionality introduced circa Havana.

I guess what I'm really asking in this particular question is what is the intersection of these two implementations - if any, as on face value it seems that to passthrough a physical PCI device I must use the older syntax and thus can't have the scheduler be aware of its external network connectivity.

Supporting "normal" PCI device passthrough for networking in an SR-IOV-like way would require new VIF driver support for generating the hostdev-style device guest XML, plus some call to set the MAC address and VLAN tag.

[snip]

Again though, in the code we have a distinction between a physical device (as I was asking about in Q1) and a physical function (as I am asking about in Q2), and similarly whether libvirt allows it or not depends on how you configure it in the guest XML. Though I wouldn't be surprised in the PF case if it is in fact not allowed in Libvirt (even with <hostdev>), it is again important to consider this as distinctly separate from passing through the physical device case, which we DO allow currently in the code I'm asking about.

I think what you suggest is not difficult to support, but the current (since Juno) PCI device passthrough for networking is all about SR-IOV PCI device passthrough. As I mentioned, supporting "normal" PCI devices will require libvirt VIF driver adjustment. I think it's possible to make this work with the existing neutron ML2 SR-IOV mechanism driver.

responded Feb 8, 2015 by irenab.dev_at_gmail. (1,480 points)   2
0 votes

There seem to be two ways to create a VM via the CLI:

1) Use a neutron command to create a port first, and then use a nova command to attach the VM to that port (neutron port-create .. followed by nova boot --nic port-id=).
2) Just use a nova command and a port will implicitly be created for you (nova boot --nic net-id=net-uuid).

My question is: is #2 sufficient to cover all the scenarios? In other words, if we are not allowed to use #1 (can only use #2 to create a VM), would we miss anything?

Regards!
Wanjing Xu

responded Feb 10, 2015 by Wanjing_Xu (620 points)   1
0 votes

Hi

When you create a port separately, you can specify additional fixed IPs and extra DHCP options, but with 'nova boot' you cannot. Also, if you need an instance with several NICs and you want each NIC to have its own set of security groups, you should create the ports separately, because the 'nova boot --security-groups ggg' command sets the specified security groups on every port that is created during the instance launch.
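As a sketch of what the port-create-first workflow can express that 'nova boot --nic net-id=...' cannot, a port request body might be assembled like this (the helper is hypothetical; the field names follow the Neutron port API):

```python
def build_port_body(network_id, fixed_ips=None, security_groups=None,
                    extra_dhcp_opts=None):
    """Assemble a Neutron port-create request body.

    Everything beyond network_id is per-port detail that the implicit
    'nova boot --nic net-id=...' workflow gives you no way to express.
    """
    port = {"network_id": network_id}
    if fixed_ips:
        port["fixed_ips"] = fixed_ips              # e.g. additional addresses
    if security_groups:
        port["security_groups"] = security_groups  # per-port, not per-VM
    if extra_dhcp_opts:
        port["extra_dhcp_opts"] = extra_dhcp_opts
    return {"port": port}
```

The resulting port ID would then be handed to 'nova boot --nic port-id=...'.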

On Tue, Feb 10, 2015 at 9:21 AM, Wanjing Xu wanjing_xu@hotmail.com wrote:

There seemed to be two ways to create a VM via cli:

1) use neutron command to create a port first and then use nova command to attach the vm to that port (neutron port-create .. followed by nova boot --nic port-id=)
2) Just use nova command and a port will implicitly be created for you (nova boot --nic net-id=net-uuid).

My question is: is #2 sufficient enough to cover all the scenarios? In other words, if we are not allowed to use #1 (can only use #2 to create vm), would we miss anything?

Regards!
Wanjing Xu


responded Feb 10, 2015 by Feodor_Tersin (820 points)   1
0 votes

There seemed to be two ways to create a VM via cli:

1) use neutron command to create a port first and then use nova command to attach the vm to that port (neutron port-create .. followed by nova boot --nic port-id=)
2) Just use nova command and a port will implicitly be created for you (nova boot --nic net-id=net-uuid).

My question is: is #2 sufficient enough to cover all the scenarios? In other words, if we are not allowed to use #1 (can only use #2 to create vm), would we miss anything?

You wouldn't be able to set the vnic_type of the port. Setting vnic_type=direct is required to boot VMs with SR-IOV ports.
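Concretely, the attribute involved is binding:vnic_type on the Neutron port. A minimal sketch of building such a port request (the helper name is invented for illustration; binding:vnic_type and the 'direct' value are the real Neutron attribute and value):

```python
def sriov_port_body(network_id, vnic_type="direct"):
    """Build a port-create body requesting an SR-IOV-capable binding.

    'binding:vnic_type' defaults to 'normal' for ordinary virtual
    ports; 'direct' asks for direct (VF) passthrough.
    """
    return {"port": {"network_id": network_id,
                     "binding:vnic_type": vnic_type}}
```

There is no 'nova boot' flag for this, which is why the port must be created separately for SR-IOV.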

Regards
Przemek


responded Feb 10, 2015 by Czesnowicz,_Przemysl (660 points)   1 2
0 votes

As pointed out by the examples in the other replies, you would essentially
have to support every possible parameter to "neutron port-create" in "nova
boot". That's creating unnecessary knowledge of neutron in nova. If you had
to eliminate one of the two, the second workflow should actually be the one
to go because that would support a better separation of concerns.

On Mon, Feb 9, 2015 at 10:21 PM, Wanjing Xu wanjing_xu@hotmail.com wrote:

There seemed to be two ways to create a VM via cli:

1) use neutron command to create a port first and then use nova command to attach the vm to that port (neutron port-create .. followed by nova boot --nic port-id=)
2) Just use nova command and a port will implicitly be created for you (nova boot --nic net-id=net-uuid).

My question is: is #2 sufficient enough to cover all the scenarios? In other words, if we are not allowed to use #1 (can only use #2 to create vm), would we miss anything?

Regards!
Wanjing Xu



--
Kevin Benton


responded Feb 10, 2015 by Kevin_Benton (24,800 points)   3 5 7
0 votes

----- Original Message -----
From: "Irena Berezovsky" irenab.dev@gmail.com
To: "OpenStack Development Mailing List (not for usage questions)" openstack-dev@lists.openstack.org,

[snip]

Support for "normal" PCI device passthrough for networking in an SR-IOV-like way will require new VIF driver support for generating the hostdev-style device guest XML, plus some call to set the MAC address and VLAN tag.

[snip]

I think what you suggest is not difficult to support, but the current (since Juno) PCI device passthrough for networking is all about SR-IOV PCI device passthrough. As I mentioned, supporting "normal" PCI devices will require libvirt VIF driver adjustment. I think it's possible to make this work with the existing neutron ML2 SR-IOV mechanism driver.

Understood, was just trying to understand if there was an explicit reason not to do this. How should we track this, keep adding to https://etherpad.openstack.org/p/kilo_sriov_pci_passthrough ?

Thanks,

Steve


responded Feb 19, 2015 by Steve_Gordon (9,680 points)   2 6 7
0 votes

Please see inline

On Thu, Feb 19, 2015 at 4:43 PM, Steve Gordon sgordon@redhat.com wrote:

[snip]

Understood, was just trying to understand if there was an explicit reason not to do this. How should we track this, keep adding to https://etherpad.openstack.org/p/kilo_sriov_pci_passthrough ?

I think a new etherpad should probably be created for Liberty in order to track SR-IOV and PCI features. Most of the features proposed for Kilo were rejected because the nova and neutron priorities focused on other areas. All of the listed and rejected features, and the priorities of new features, should be evaluated and probably picked up by people willing to drive them. For Kilo we started this work during the pci_passthrough weekly meetings and finalized it at the summit; I think that worked well, and I would suggest doing the same for Liberty.

BR,
Irena

responded Feb 22, 2015 by irenab.dev_at_gmail. (1,480 points)   2
...