
[openstack-dev] vGPUs support for Nova


There is a desire to expose vGPU resources on top of Resource
Providers, which is probably the path we should take in the long
term. I was not at the last PTG, and you have probably already made a
decision to move in that direction anyway. My personal feeling is
that it is premature.

The nested Resource Provider work is not yet feature-complete and
needs more reviewer attention. If we continue in the direction of
Resource Providers, it will take at least two more releases to expose
the vGPU feature, and that without NUMA support, with the feeling of
pushing something that is not stable or production-ready.

It seems safer to first finalize and stabilize the Resource Provider
work until it is production-ready. Then, on top of something stable,
we could start migrating our current virt-specific features like
NUMA, CPU pinning, huge pages and, finally, PCI devices.

I'm talking about PCI devices in general because I think we should
implement vGPU support on top of our /pci framework, which is
production-ready and supports NUMA.

The hardware vendors are building their drivers using mdev. The /pci
framework currently understands only SR-IOV, but at a quick glance it
does not seem complicated to make it support mdev as well.
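For context, mdev-capable hardware advertises the vGPU types a physical PCI device supports through a standard sysfs layout. A minimal sketch of how the framework could discover them (the sysfs paths are the standard mdev ones; the function name and return shape are illustrative):

```python
import os

def list_mdev_types(pci_addr, sysfs_root="/sys"):
    """Enumerate the mdev (vGPU) types a physical PCI device supports.

    Walks the standard mdev sysfs layout:
      <sysfs_root>/bus/pci/devices/<pci_addr>/mdev_supported_types/<type_id>/
    where each type directory contains 'name' and 'available_instances'.
    """
    base = os.path.join(sysfs_root, "bus", "pci", "devices",
                        pci_addr, "mdev_supported_types")
    types = {}
    if not os.path.isdir(base):
        return types  # device does not support mediated devices
    for type_id in os.listdir(base):
        tdir = os.path.join(base, type_id)

        def read_attr(fname, default=""):
            path = os.path.join(tdir, fname)
            if not os.path.exists(path):
                return default
            with open(path) as f:
                return f.read().strip()

        types[type_id] = {
            "name": read_attr("name"),
            "available_instances": int(read_attr("available_instances", "0")),
        }
    return types
```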

In the /pci framework we will have to:

  • Update the PciDevice object fields to accept a NULL value for
    'address' and add a new field 'uuid'
  • Update PciRequest to handle a new tag like 'vgpu_types'
  • Update PciDeviceStats to also maintain pools of vGPUs
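A rough sketch of what those object changes amount to, using plain dataclasses as illustrative stand-ins for the real versioned objects in nova/objects:

```python
import uuid as uuidlib
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PciDevice:
    vendor_id: str
    product_id: str
    # A mediated vGPU has no PCI address of its own, hence nullable.
    address: Optional[str] = None
    # New stable identifier, needed for mdev-backed devices.
    uuid: str = field(default_factory=lambda: str(uuidlib.uuid4()))

@dataclass
class PciRequest:
    count: int
    # The spec may now carry a vGPU-type tag alongside vendor/product ids.
    spec: dict
```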

The operators will have to create one or more aliases and configure
flavors. Most of the logic is already implemented, and the method
'consume_request' will select the right vGPUs according to the
request.
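A simplified sketch of how a pool-based 'consume_request' could honour a vGPU tag (the real method in PciDeviceStats also handles NUMA affinity and more; the data shapes here are illustrative):

```python
def consume_request(pools, request):
    """Pick devices out of pools that match every tag in the request spec.

    pools:   list of dicts such as {'vendor_id': '10de', 'product_id': '13f2',
             'vgpu_type': 'nvidia-35', 'devices': [...]}
    request: {'count': N, 'spec': {...}}
    Returns the consumed devices, or None if the request cannot be met.
    """
    matching = [p for p in pools
                if all(p.get(k) == v for k, v in request["spec"].items())]
    if sum(len(p["devices"]) for p in matching) < request["count"]:
        return None  # fail without consuming anything
    consumed, need = [], request["count"]
    for pool in matching:
        take = min(need, len(pool["devices"]))
        consumed.extend(pool["devices"][:take])
        del pool["devices"][:take]
        need -= take
        if need == 0:
            break
    return consumed
```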

In /virt we will have to:

  • Update the field 'pci_passthrough_devices' to also include GPU
    devices.
  • Update PCI device attach/detach to handle vGPUs
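A hypothetical sketch of what the extended virt driver report could look like; the 'type-VGPU' value and the 'uuid'/'parent_addr'/'vgpu_type' keys are assumptions for illustration, not existing Nova fields:

```python
import json

def get_pci_passthrough_devices():
    """Sketch of a virt driver device report that also includes vGPUs.

    Real drivers return a JSON string of device dicts with keys such as
    'address', 'vendor_id', 'product_id' and 'dev_type'; the vGPU entry
    below shows the hypothetical mdev-backed extension.
    """
    devices = [
        # The physical GPU, reported as before.
        {"address": "0000:84:00.0", "vendor_id": "10de",
         "product_id": "13f2", "dev_type": "type-PCI", "numa_node": 1},
        # A vGPU: no PCI address of its own, identified by its mdev UUID.
        {"address": None, "uuid": "0c3b1a4e-0000-4000-8000-000000000001",
         "parent_addr": "0000:84:00.0", "vendor_id": "10de",
         "product_id": "13f2", "dev_type": "type-VGPU",
         "vgpu_type": "nvidia-35", "numa_node": 1},
    ]
    return json.dumps(devices)
```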

We have a few people interested in working on it, so we could
certainly make this feature available for Queens.

I can take the lead on updating/implementing the PCI and libvirt
driver parts, and I'm sure Jianghua Wang will be happy to take the
lead on the XenServer virt part.

And I trust Jay, Stephen and Sylvain to follow the developments.

s.


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Sep 27, 2017 in openstack-dev by Sahid_Orentino_Ferdj (1,020 points)

6 Responses



I understand the desire to get something in to Nova to support vGPUs,
and I understand that the existing /pci modules represent the
fastest/cheapest way to get there.

I won't block you from making any of the above changes, Sahid. I'll even
do my best to review them. However, I will be primarily focusing this
cycle on getting the nested resource providers work feature-complete for
(at least) SR-IOV PF/VF devices.

The decision of whether to allow an approach that adds more to the
existing /pci module is ultimately Matt's.

Best,
-jay


responded Sep 25, 2017 by Jay_Pipes (59,760 points)


Nested resource providers are not merged or production-ready because we
haven't made them a priority. We've certainly talked about it, and Jay
has had patches proposed for several releases now.

Building vGPU support into the existing framework, which only a couple
of people understand (certainly not me), might be a short-term gain, but
it is just more technical debt we have to pay off later, and it delays
any focus on nested resource providers for the wider team.

At the Queens PTG it was abundantly clear that many features are
dependent on nested resource providers, including several
networking-related features like bandwidth-based scheduling.

The priorities for placement/scheduler in Queens are:

  1. Dan Smith's migration allocations cleanup.
  2. Alternative hosts for reschedules with cells v2.
  3. Nested resource providers.

All of these are in progress and need review.

I personally don't think we should abandon the plan to implement vGPU
support with nested resource providers before we have even seen code
changes for it as a proof of concept. It also sounds like we have a
pretty simple staggered plan for rolling out vGPU support, so it
doesn't need to be very detailed to start: the virt driver reports vGPU
inventory, and we decorate the details later with traits (which Alex Xu
is working on and which needs review).

Sahid, you could certainly implement a separate proof of concept and
make it available; if the nested resource providers-based change hits
major issues, or takes far too long and carries too much risk, then we
at least have a contingency plan. But I don't expect that work to get
review priority, and you'd have to accept that it might not get merged,
since we want to use nested resource providers.

Either way, we are going to need solid functional testing, and that
functional testing should be written against the API as much as possible
so that it works regardless of the backend implementation of the
feature. One of the big things we failed at in Pike was not doing enough
functional testing of move operations with claims in the scheduler
early enough in the cycle. That all came in late, and we're still fixing
bugs as a result.

If we can get started early on the functional testing for vGPUs, then
work both implementations in parallel, we should be able to retain the
functional tests and determine which implementation we ultimately go
with, probably sometime in the second milestone.

--

Thanks,

Matt


responded Sep 25, 2017 by mriedemos_at_gmail.c (15,720 points)

On Mon, Sep 25, 2017 at 09:29:25AM -0500, Matt Riedemann wrote:

> Nested resource providers is not merged or production ready because we
> haven't made it a priority. We've certainly talked about it and Jay has had
> patches proposed for several releases now though.
>
> Building vGPU support into the existing framework, which only a couple of
> people understand - certainly not me, might be a short-term gain but is just
> more technical debt we have to pay off later, and delays any focus on nested
> resource providers for the wider team.
>
> At the Queens PTG it was abundantly clear that many features are dependent
> on nested resource providers, including several networking-related features
> like bandwidth-based scheduling.
>
> The priorities for placement/scheduler in Queens are:
>
>   1. Dan Smith's migration allocations cleanup.
>   2. Alternative hosts for reschedules with cells v2.
>   3. Nested resource providers.
>
> All of these are in progress and need review.
>
> I personally don't think we should abandon the plan to implement vGPU
> support with nested resource providers without first seeing any code changes
> for it as a proof of concept. It also sounds like we have a pretty simple
> staggered plan for rolling out vGPU support so it's not very detailed to
> start. The virt driver reports vGPU inventory and we decorate the details
> later with traits (which Alex Xu is working on and needs review).
>
> Sahid, you could certainly implement a separate proof of concept and make
> that available if the nested resource providers-based change hits major
> issues or goes far too long and has too much risk, then we have a
> contingency plan at least. But I don't expect that to get review priority
> and you'd have to accept that it might not get merged since we want to use
> nested resource providers.

That seems fair. I understand your desire to make the implementation
on Resource Providers a priority, and I'm with you. In general, my
preference is not to stall progress on virt features just because we
have a new "product" in progress.

> Either way we are going to need solid functional testing and that functional
> testing should be written against the API as much as possible so that it
> works regardless of the backend implementation of the feature. One of the
> big things we failed at in Pike was not doing enough functional testing of
> move operations with claims in the scheduler earlier in the cycle. That all
> came in late and we're still fixing bugs as a result.

That is very true, and most of the time we are asking our users to be
beta-testers; that is one more reason why my preference is for a real
deprecation phase.

> If we can get started early on the functional testing for vGPUs, then work
> both implementations in parallel, we should be able to retain the functional
> tests and determine which implementation we ultimately need to go with
> probably sometime in the second milestone.

responded Sep 25, 2017 by Sahid_Orentino_Ferdj (1,020 points)

Sahid,

Just to share some background: XenServer doesn't expose vGPUs as mdev or PCI devices. I proposed a spec about a year ago to create fake PCI devices so that we could use the existing PCI mechanism to cover vGPUs, but that was not a good design and met strong objections. After that, we switched to using resource providers, following the advice of the core team.

Regards,
Jianghua

responded Sep 25, 2017 by Jianghua_Wang (600 points)

On Mon, Sep 25, 2017 at 04:59:04PM +0000, Jianghua Wang wrote:
> Sahid,
>
> Just share some background. XenServer doesn't expose vGPUs as mdev
> or pci devices.

That does not make any sense. There is a physical device (PCI) that
provides functions (vGPUs), and these functions are exposed through the
mdev framework. What you need is the mdev UUID related to a specific
vGPU, and I'm sure XenServer is going to expose it. One thing XenServer
may not expose is the NUMA node where the physical device is plugged
in, but in that situation you could still use sysfs.
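For the sysfs fallback mentioned above, the kernel exposes a device's NUMA node as a standard attribute; a small sketch (the function name is made up; the attribute path is the standard one):

```python
import os

def pci_numa_node(pci_addr, sysfs_root="/sys"):
    """Return the NUMA node a PCI device is attached to, or None.

    Reads the standard sysfs attribute
    <sysfs_root>/bus/pci/devices/<pci_addr>/numa_node; the kernel
    reports -1 there when the NUMA topology is unknown.
    """
    path = os.path.join(sysfs_root, "bus", "pci", "devices",
                        pci_addr, "numa_node")
    try:
        with open(path) as f:
            node = int(f.read().strip())
    except OSError:
        return None  # attribute missing (e.g. no such device)
    return node if node >= 0 else None
```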

I proposed a spec about one year ago to make fake pci devices so
that we can use the existing PCI mechanism to cover vGPUs. But
that's not a good design and got strongly objection. After that, we
switched to use the resource providers by following the advice from
the core team.

Regards,
Jianghua

-----Original Message-----
From: Sahid Orentino Ferdjaoui [mailto:sferdjao@redhat.com]
Sent: Monday, September 25, 2017 11:01 PM
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] vGPUs support for Nova

On Mon, Sep 25, 2017 at 09:29:25AM -0500, Matt Riedemann wrote:

On 9/25/2017 5:40 AM, Jay Pipes wrote:

On 09/25/2017 05:39 AM, Sahid Orentino Ferdjaoui wrote:

There is a desire to expose the vGPUs resources on top of Resource
Provider which is probably the path we should be going in the long
term. I was not there for the last PTG and you probably already
made a decision about moving in that direction anyway. My personal
feeling is that it is premature.

The nested Resource Provider work is not yet feature-complete and
requires more reviewer attention. If we continue in the direction
of Resource Provider, it will need at least 2 more releases to
expose the vGPUs feature and that without the support of NUMA, and
with the feeling of pushing something which is not stable/production-ready.

It's seems safer to first have the Resource Provider work well
finalized/stabilized to be production-ready. Then on top of
something stable we could start to migrate our current virt
specific features like NUMA, CPU Pinning, Huge Pages and finally PCI devices.

I'm talking about PCI devices in general because I think we should
implement the vGPU on top of our /pci framework which is
production ready and provides the support of NUMA.

The hardware vendors building their drivers using mdev and the
/pci framework currently understand only SRIOV but on a quick
glance it does not seem complicated to make it support mdev.

In the /pci framework we will have to:

  • Update the PciDevice object fields to accept NULL value for
       'address' and add new field 'uuid'
  • Update PciRequest to handle a new tag like 'vgpu_types'
  • Update PciDeviceStats to also maintain pool of vGPUs

The operators will have to create alias(-es) and configure
flavors. Basically most of the logic is already implemented and
the method 'consume_request' is going to select the right vGPUs
according the request.

In /virt we will have to:

  • Update the field 'pcipassthroughdevices' to also include GPUs
       devices.
  • Update attach/detach PCI device to handle vGPUs

We have a few people interested in working on it, so we could
certainly make this feature available for Queen.

I can take the lead updating/implementing the PCI and libvirt
driver part, I'm sure Jianghua Wang will be happy to take the lead
for the virt XenServer part.

And I trust Jay, Stephen and Sylvain to follow the developments.

I understand the desire to get something in to Nova to support
vGPUs, and I understand that the existing /pci modules represent the
fastest/cheapest way to get there.

I won't block you from making any of the above changes, Sahid. I'll
even do my best to review them. However, I will be primarily
focusing this cycle on getting the nested resource providers work
feature-complete for (at least) SR-IOV PF/VF devices.

The decision of whether to allow an approach that adds more to the
existing /pci module is ultimately Matt's.

Best,
-jay


______ OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Nested resource providers is not merged or production ready because we
haven't made it a priority. We've certainly talked about it and Jay
has had patches proposed for several releases now though.

Building vGPU support into the existing framework, which only a couple
of people understand - certainly not me, might be a short-term gain
but is just more technical debt we have to pay off later, and delays
any focus on nested resource providers for the wider team.

At the Queens PTG it was abundantly clear that many features are
dependent on nested resource providers, including several
networking-related features like bandwidth-based scheduling.

The priorities for placement/scheduler in Queens are:

  1. Dan Smith's migration allocations cleanup.
  2. Alternative hosts for reschedules with cells v2.
  3. Nested resource providers.

All of these are in progress and need review.

I personally don't think we should abandon the plan to implement vGPU
support with nested resource providers without first seeing any code
changes for it as a proof of concept. It also sounds like we have a
pretty simple staggered plan for rolling out vGPU support so it's not
very detailed to start. The virt driver reports vGPU inventory and we
decorate the details later with traits (which Alex Xu is working on and needs review).

Sahid, you could certainly implement a separate proof of concept and
make that available if the nested resource providers-based change hits
major issues or goes far too long and has too much risk, then we have
a contingency plan at least. But I don't expect that to get review
priority and you'd have to accept that it might not get merged since
we want to use nested resource providers.

That seems to be fair, I understand your desire to make the implementation on Resource Provider a priority and I'm with you. In general my preference is to do not stop progress on virt features because we have a new "product" on-going.

Either way we are going to need solid functional testing and that
functional testing should be written against the API as much as
possible so that it works regardless of the backend implementation of
the feature. One of the big things we failed at in Pike was not doing
enough functional testing of move operations with claims in the
scheduler earlier in the cycle. That all came in late and we're still fixing bugs as a result.

It's very true and most of the time we are asking our users to be beta-testers, that is one more reason why my preference is for a real deprecation phase.

If we can get started early on the functional testing for vGPUs, then
work both implementations in parallel, we should be able to retain the
functional tests and determine which implementation we ultimately need
to go with probably sometime in the second milestone.
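The backend-agnostic testing idea can be illustrated with a toy: assert only on API-visible behaviour (server status), so the same test holds whichever backend implements vGPU scheduling. FakeComputeAPI below is a hypothetical stand-in for Nova's real functional-test fixtures.

```python
# Sketch of a backend-agnostic functional test: it exercises only the
# public API contract (enough vGPUs -> ACTIVE, else ERROR), never the
# backend internals. FakeComputeAPI is purely illustrative.

class FakeComputeAPI:
    def __init__(self, vgpus_available):
        self.vgpus_available = vgpus_available

    def create_server(self, flavor_vgpus):
        # The contract under test, independent of /pci vs placement.
        if flavor_vgpus <= self.vgpus_available:
            self.vgpus_available -= flavor_vgpus
            return {'status': 'ACTIVE'}
        return {'status': 'ERROR'}

def boot_until_exhausted(api, attempts):
    """Boot servers requesting one vGPU each; collect the statuses."""
    return [api.create_server(flavor_vgpus=1)['status']
            for _ in range(attempts)]
```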

--

Thanks,

Matt


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


responded Sep 26, 2017 by Sahid Orentino Ferdjaoui

-----Original Message-----
From: Sahid Orentino Ferdjaoui [mailto:sferdjao@redhat.com]
Sent: Tuesday, September 26, 2017 1:46 PM
To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] vGPUs support for Nova

On Mon, Sep 25, 2017 at 04:59:04PM +0000, Jianghua Wang wrote:

Sahid,

Just share some background. XenServer doesn't expose vGPUs as mdev or
pci devices.

That does not make any sense. There is a physical device (PCI) which
provides functions (vGPUs). These functions are exposed through the mdev
framework. What you need is the mdev UUID related to a specific vGPU,
and I'm sure that XenServer is going to expose it. Something which
XenServer may not expose is the NUMA node where the physical device is
plugged in, but in that situation you could still use sysfs.
[Mooney, Sean K] This is implementation specific. AMD supports virtualizing
their GPUs using SR-IOV (http://www.amd.com/Documents/Multiuser-GPU-White-Paper.pdf);
in that case you can use the existing PCI passthrough support without any modification.
For Intel and NVIDIA GPUs we need specific hypervisor support, as the device partitioning
is done in the host GPU driver rather than via SR-IOV. There are two levels of abstraction
that we must keep separate: 1. how the hardware supports configuration and enumeration
of the virtualized resources (AMD in hardware via SR-IOV, Intel/NVIDIA via a driver/software manager);
2. how the hypervisor reports the vGPUs to OpenStack and other clients.

In the AMD case I would not expect any hypervisor to have mdevs associated with
the SR-IOV VFs, as that is not the virtualization model they have implemented.
In the Intel GVT case you will have mdevs, but the virtual GPUs are not
represented on the PCI bus, so we should not model them as PCI devices.
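The two abstraction levels Sean describes can be summarized in a toy lookup table; the vendor/mechanism pairs follow the message above (AMD via SR-IOV, Intel GVT-g and NVIDIA GRID via driver-managed mdevs on KVM hosts), and the names are illustrative, not any OpenStack API.

```python
# Toy model of the two levels: how the hardware partitions the GPU,
# and how the hypervisor consequently exposes the result.
PARTITIONING = {
    'amd': 'sriov',    # VFs appear as real PCI devices
    'intel': 'mdev',   # GVT-g: partitioning done in the host driver
    'nvidia': 'mdev',  # GRID vGPU: driver-managed (mdev on KVM hosts)
}

def exposure_model(vendor):
    mech = PARTITIONING[vendor]
    # SR-IOV VFs live on the PCI bus; mdevs generally do not.
    return 'pci-passthrough' if mech == 'sriov' else 'mdev-device'
```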

Some more comments below.

I proposed a spec about one year ago to make fake PCI devices so that
we can use the existing PCI mechanism to cover vGPUs. But that's not a
good design and it got strong objections. After that, we switched to
using the resource providers by following the advice from the core team.

Regards,
Jianghua

-----Original Message-----
From: Sahid Orentino Ferdjaoui [mailto:sferdjao@redhat.com]
Sent: Monday, September 25, 2017 11:01 PM
To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] vGPUs support for Nova

On Mon, Sep 25, 2017 at 09:29:25AM -0500, Matt Riedemann wrote:

On 9/25/2017 5:40 AM, Jay Pipes wrote:

On 09/25/2017 05:39 AM, Sahid Orentino Ferdjaoui wrote:

There is a desire to expose the vGPUs resources on top of
Resource Provider which is probably the path we should be going
in the long term. I was not there for the last PTG and you
probably already made a decision about moving in that direction
anyway. My personal feeling is that it is premature.

The nested Resource Provider work is not yet feature-complete
and requires more reviewer attention. If we continue in the
direction of Resource Provider, it will need at least 2 more
releases to expose the vGPUs feature and that without the
support of NUMA, and with the feeling of pushing something
which is not stable/production-ready.
[Mooney, Sean K] Not all GPUs have NUMA affinity. Intel integrated GPUs do not: they have
dedicated eDRAM on the processor die, so their memory accesses never leave
the processor package and they have no NUMA affinity. I would assume the
same is true for AMD integrated GPUs, so only discrete GPUs will have NUMA affinity.

It's seems safer to first have the Resource Provider work well
finalized/stabilized to be production-ready. Then on top of
something stable we could start to migrate our current virt
specific features like NUMA, CPU Pinning, Huge Pages and
finally PCI devices.

I'm talking about PCI devices in general because I think we
should implement the vGPU on top of our /pci framework which is
production ready and provides the support of NUMA.

The hardware vendors building their drivers using mdev and the
This is vendor specific: Intel uses mdevs for Intel GVT (KVMGT/XenGT).
AMD does not use mdevs; it uses SR-IOV (http://www.amd.com/Documents/Multiuser-GPU-White-Paper.pdf).

AMD is simple because you just do a PCI passthrough of the SR-IOV VF and you are done.
No explicit support is needed in the hypervisor.

Looking at https://images.nvidia.com/content/grid/pdf/GRID-vGPU-User-Guide.pdf section
5.3.3.2, when we query a specific physical GPU for the supported parameters via Xen,
it reports the PCI address of that physical GPU as part of the response:
[root@xenserver ~]# xe pgpu-param-list uuid=f2607117-5b4c-d6cc-3900-00bf712e33f4
uuid ( RO) : f2607117-5b4c-d6cc-3900-00bf712e33f4
vendor-name ( RO): NVIDIA Corporation
device-name ( RO): GK104GL [GRID K2]
gpu-group-uuid ( RW): f4662c69-412c-abc5-6d02-f74b7703cccd
gpu-group-name-label ( RO): GRID K2 Socket 0
host-uuid ( RO): d9eb9118-a5c5-49fb-970e-80e6a8f7ff98
host-name-label ( RO): xenserver-vgx-test (VM IPs 10.31.223.0-49,
dom0 .96, OOB .97)
pci-id ( RO): 0000:08:00.0
dependencies (SRO):
other-config (MRW):
supported-VGPU-types ( RO): a724b756-d108-4c9f-0ea3-8f3a1553bfbc;
63d9d912-3454-b020-8519-58dedb3b0117; 0bdf4715-e035-19c3-a57d-5ead20b3e5cd;
a7838abe-0d73-1918-7d29-fd361d3e411f
enabled-VGPU-types (SRW): a724b756-d108-4c9f-0ea3-8f3a1553bfbc;
63d9d912-3454-b020-8519-58dedb3b0117; 0bdf4715-e035-19c3-a57d-5ead20b3e5cd;
a7838abe-0d73-1918-7d29-fd361d3e411f

Looking at section 5.2, it would appear that for NVIDIA we create vGPUs using the group rather
than the parent GPU, as follows:
xe vgpu-create vm-uuid=e71afda4-53f4-3a1b-6c92-a364a7f619c2
gpu-group-uuid=be825ba2-01d7-8d51-9780-f82cfaa64924
vgpu-type-uuid=3f318889-7508-c9fd-7134-003d4d05ae56b73cbd30-096f-8a9a-523e-a800062f4ca7

That makes sense, because they handle allocation to a specific GPU based on a policy
(pack vs spread) as described in section 5.3. The groups are auto-created by Xen based
on the vendor ID and product ID of the NVIDIA GPU, but based on the group name I would guess
it also groups by socket, which they may be conflating with NUMA node. In either case
we could use the PCI address and sysfs to confirm the NUMA affinity if needed.

In the Intel case (https://github.com/01org/gvt-linux/wiki/GVTg_Setup_Guide#41-linux-guest-setup),
looking at section 5.3, in the KVM case we create mdevs by echoing a UUID to a sysfs file that is a child of
the integrated GPU on the PCI bus, e.g.:
echo "a297db4a-f4c2-11e6-90f6-d3b88d6c9525" > "/sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_4/create"
and then do a PCI passthrough of that mdev.

e.g. section 5.6.1:
#!/bin/bash -x
/usr/bin/qemu-system-x86_64 \
-m 2048 -smp 2 -M pc \
-name gvt-g-guest \
-hda /home/img/ubuntu-1.qcow2 \
-bios /usr/bin/bios.bin -enable-kvm \
-net nic,macaddr=00:A1:00:00:00:1A -net tap,script=/etc/qemu-ifup \
-vga qxl \
-k en-us \
-serial stdio \
-vnc :1 \
-machine kernel_irqchip=on \
-global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 \
-cpu host -usb -usbdevice tablet \
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:02.0/a297db4a-f4c2-11e6-90f6-d3b88d6c9525,rombar=0

So in the Intel case the vGPU is directly a child of a specific PCI device, but in the NVIDIA case it's a child of a pool of PCI devices.
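The GVT-g sysfs steps above (enumerate the supported mdev types under the integrated GPU, then create an mdev by writing a UUID) can be sketched as follows. This is illustrative only: it assumes the standard Linux mdev sysfs layout and needs a GVT-g-capable host and root to actually create a device.

```python
# Sketch: enumerate mdev types under a PCI device and create one by
# writing a UUID, mirroring the GVT-g setup commands above.
import os
import uuid

SYSFS_PCI = '/sys/bus/pci/devices'

def list_mdev_types(pci_addr):
    """Return the mdev types a PCI device supports, or [] if none."""
    path = os.path.join(SYSFS_PCI, pci_addr, 'mdev_supported_types')
    return sorted(os.listdir(path)) if os.path.isdir(path) else []

def create_mdev(pci_addr, mdev_type):
    """Create an mdev of the given type; returns its UUID."""
    dev_uuid = str(uuid.uuid4())
    create_path = os.path.join(SYSFS_PCI, pci_addr,
                               'mdev_supported_types', mdev_type, 'create')
    with open(create_path, 'w') as f:
        f.write(dev_uuid)
    return dev_uuid
```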

/pci framework currently understand only SRIOV but on a quick
glance it does not seem complicated to make it support mdev.

In the /pci framework we will have to:

  • Update the PciDevice object fields to accept NULL value for
       'address' and add new field 'uuid'
    I think the PFs should still have PCI addresses.
    The VFs, on the other hand, will be created dynamically, so
    if they were tracked in this table perhaps the UUID would be more appropriate.
    In the AMD case, though, the vGPUs really do have PCI addresses, so you can't assume UUIDs.
  • Update PciRequest to handle a new tag like 'vgpu_types'
    If we were to do this, I would prefer it be added to the capabilities
    dictionary we are introducing in the NIC feature based scheduling spec.
    We intentionally made that generic so that, in addition to holding network
    capabilities, it could hold GPU, FPGA, or other fancy accelerator entries
    if needed, though traits will ultimately be the end goal.
    See https://review.openstack.org/#/c/451777/26/nova/pci/stats.py

  • Update PciDeviceStats to also maintain a pool of vGPUs
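The PciDeviceStats change in the list above amounts to pooling devices by type and popping from a pool on request. A minimal sketch, not Nova's actual PciDeviceStats class (names and structure are illustrative):

```python
# Illustrative pooling of vGPU devices by a 'vgpu_type' tag, in the
# spirit of PciDeviceStats pools; not Nova's real implementation.
from collections import defaultdict

class VgpuPool:
    def __init__(self):
        self._pools = defaultdict(list)  # vgpu_type -> available devices

    def add_device(self, vgpu_type, dev_id):
        self._pools[vgpu_type].append(dev_id)

    def consume_request(self, vgpu_type, count=1):
        """Claim `count` vGPUs of a type, or None if capacity is short."""
        pool = self._pools[vgpu_type]
        if len(pool) < count:
            return None  # not enough capacity; scheduler tries elsewhere
        return [pool.pop() for _ in range(count)]
```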

The operators will have to create alias(es) and configure
flavors. Basically most of the logic is already implemented, and
the method 'consume_request' is going to select the right vGPUs
according to the request.
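If the /pci route were taken, the operator workflow might look roughly like the fragment below. This is a sketch of the proposal, not released Nova: the "vgpu_type" alias key does not exist today and is shown only to illustrate the idea; the alias name and vGPU type string are made up.

```shell
# Hypothetical nova.conf alias on the compute node (the "vgpu_type"
# key is part of the proposal, not an existing Nova option):
#   [pci]
#   alias = { "name": "vgpu-k2", "vgpu_type": "GRID K2 Q" }

# Request one such vGPU through a flavor extra spec:
openstack flavor create --ram 4096 --vcpus 2 --disk 40 vgpu.small
openstack flavor set vgpu.small --property "pci_passthrough:alias"="vgpu-k2:1"
```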

In /virt we will have to:

  • Update the field 'pci_passthrough_devices' to also include GPU
    devices.
  • Update attach/detach PCI device to handle vGPUs

We have a few people interested in working on it, so we could
certainly make this feature available for Queens.

I can take the lead updating/implementing the PCI and libvirt
driver part, I'm sure Jianghua Wang will be happy to take the
lead for the virt XenServer part.

And I trust Jay, Stephen and Sylvain to follow the
developments.

I understand the desire to get something in to Nova to support
vGPUs, and I understand that the existing /pci modules represent
the fastest/cheapest way to get there.

I won't block you from making any of the above changes, Sahid.
I'll even do my best to review them. However, I will be primarily
focusing this cycle on getting the nested resource providers work
feature-complete for (at least) SR-IOV PF/VF devices.

The decision of whether to allow an approach that adds more to
the
existing /pci module is ultimately Matt's.

Best,
-jay




responded Sep 27, 2017 by Mooney, Sean K
...