
[Openstack-operators] Memory usage of guest vms, ballooning and nova

0 votes

Hi,

Lately, on my production openstack Newton setup, I've run into a
situation that defies my assumptions regarding memory management on
Openstack compute nodes and I've been looking for explanations.
Basically, we had a VM with a flavor that limited it to 96 GB of ram,
which, to be quite honest, we never thought we could ever reach. This is
a very important VM where we wanted to avoid running out of memory at
all cost. The VM itself generally uses about 12 GB of ram.

We were surprised when we noticed yesterday that this VM, which has been
running for several months, was using all its 96 GB on the compute host.
Despite that, in the guest, the OS was indicating a memory usage of
about 12 GB. The only explanation I see for this is that at some point in
time, the host had to allocate all the 96GB of ram to the VM process and
it never took back the allocated ram. This prevented the creation of
more guests on the node as it was showing it didn't have enough memory left.

Now, I was under the assumption that memory ballooning was integrated
into nova and that the amount of allocated memory to a specific guest
would deflate once that guest did not need the memory. After
verification, I've found blueprints for it, but I see no trace of any
implementation anywhere.

I also notice that on most of our compute nodes, the amount of ram used
is much lower than the amount of ram allocated to VMs, which I do
believe is normal.

So basically, my question is, how does openstack actually manage ram
allocation? Will it ever take back the unused ram of a guest process?
Can I force it to take back that ram?

--
Jean-Philippe Méthot
Openstack system administrator
PlanetHoster inc.
www.planethoster.net


OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
asked Mar 24, 2017 in openstack-operators by Jean-Philippe_Methot (600 points)   1 2 5

6 Responses

0 votes

On 03/23/2017 11:01 AM, Jean-Philippe Methot wrote:

So basically, my question is, how does openstack actually manage ram allocation?
Will it ever take back the unused ram of a guest process? Can I force it to take
back that ram?

I don't think nova will automatically reclaim memory.

I'm pretty sure that if you have CONF.libvirt.mem_stats_period_seconds set
(which it is by default) then you can manually tell libvirt to reclaim some
memory via the "virsh setmem" command.
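
For illustration, here is a minimal sketch of how that manual step might be scripted. It only builds the virsh command lines and does not run them. The domain name "instance-0000002a", the 16 GiB target and the helper names are made-up placeholders, not anything from this thread. It assumes that "virsh setmem" takes its size in KiB by default, that "--live" applies the change to the running guest, and that the guest has a working virtio balloon driver.

```python
import shlex
from typing import List

KIB_PER_GIB = 1024 * 1024


def build_setmem_command(domain: str, target_gib: int) -> List[str]:
    """Build the argv that asks libvirt to set a running guest's balloon target.

    Hypothetical helper: it only constructs the command, it does not run it.
    """
    if target_gib <= 0:
        raise ValueError("target_gib must be positive")
    size_kib = target_gib * KIB_PER_GIB  # virsh setmem defaults to KiB
    return ["virsh", "setmem", domain, str(size_kib), "--live"]


def build_dommemstat_command(domain: str) -> List[str]:
    """Build the argv that prints the guest's balloon/memory statistics."""
    return ["virsh", "dommemstat", domain]


if __name__ == "__main__":
    domain = "instance-0000002a"  # placeholder libvirt domain name
    print(shlex.join(build_setmem_command(domain, 16)))
    print(shlex.join(build_dommemstat_command(domain)))
```

Checking "virsh dommemstat" before and after the change is one way to see whether the guest actually gave memory back.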

Chris


responded Mar 23, 2017 by Chris_Friesen (20,420 points)   4 17 26
0 votes

Hi,

This is indeed linux, CentOS 7 to be more precise, using qemu-kvm as
hypervisor. The used ram was in the used column. While we have made
adjustments by moving and resizing the specific guest that was using 96
GB (verified in top), the ram usage is still fairly high for the amount
of allocated ram.

Currently the ram usage looks like this :

              total        used        free      shared  buff/cache   available
Mem:           251G        190G         60G         42M        670M         60G
Swap:          952M        707M        245M

I have 188.5GB of ram allocated to 22 instances on this node. I believe
it's unrealistic to think that all these 22 instances have cached/are
using up all their ram at this time.

On 2017-03-23 13:07, Kris G. Lindgren wrote:
Sorry for the super stupid question.

But if this is linux are you sure that the memory is not actually being consumed via buffers/cache?

free -m
              total        used        free      shared  buff/cache   available
Mem:         128751       27708        2796        4099       98246       96156
Swap:          8191           0        8191

Shows that of 128GB 27GB is used, but buffers/cache consumes 98GB of ram.
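
As a rough sketch of the same check done programmatically, the snippet below parses the "Mem:" line of the "free -m" sample above and reports how much of the total sits in buff/cache. The function name is hypothetical, and the parsing assumes the six-column layout shown here.

```python
def parse_free_mem_line(output: str) -> dict:
    """Map the column headers of `free -m` output to the values on the Mem: line."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    header = lines[0].split()
    for ln in lines[1:]:
        if ln.startswith("Mem:"):
            values = [int(v) for v in ln.split()[1:]]
            return dict(zip(header, values))
    raise ValueError("no Mem: line found")


# Sample copied from the message above (values in MiB).
SAMPLE = """\
total used free shared buff/cache available
Mem: 128751 27708 2796 4099 98246 96156
Swap: 8191 0 8191
"""

mem = parse_free_mem_line(SAMPLE)
cache_share = mem["buff/cache"] / mem["total"]

if __name__ == "__main__":
    print(f"used: {mem['used']} MiB, buff/cache: {mem['buff/cache']} MiB")
    print(f"buff/cache share of total: {cache_share:.0%}")
```

On the node discussed in this thread the buff/cache column is only 670M out of 251G, so cache does not account for the usage there.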


Kris Lindgren
Senior Linux Systems Engineer
GoDaddy


--
Jean-Philippe Méthot
Openstack system administrator
PlanetHoster inc.
www.planethoster.net


responded Mar 23, 2017 by Jean-Philippe_Methot (600 points)   1 2 5
0 votes

What sort of memory overcommit value are you running Nova with? The scheduler looks at an instance's reservation rather than how much memory is actually being used by QEMU when making a decision, as far as I'm aware (but please correct me if I am wrong on this point). If the HV has 128GB of memory, the instance has a reservation of 96GB, you have 16GB reserved via reserved_host_memory_mb, ram_allocation_ratio is set to 1.0, and you try to launch an instance from a flavor with 32GB of memory, it will fail to pass RamFilter in the scheduler and the scheduler will not consider it a valid host for placement. (I am assuming you are using FilterScheduler still, as I know nothing about the new placement API or what parts of it do and don't work in Newton.)
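
To make the arithmetic in that example concrete, here is a simplified sketch of the check. It is not Nova's actual RamFilter code. The function name and the assumption that reserved memory simply counts as used are mine.

```python
def passes_ram_check(total_mb: int, reserved_mb: int, allocated_mb: int,
                     requested_mb: int, ratio: float) -> bool:
    """Simplified RAM admission check based on reservations, not real usage.

    limit = total RAM * allocation ratio
    used  = host reservation + RAM already promised to instances
    The host qualifies only if (limit - used) covers the requested flavor.
    """
    limit_mb = total_mb * ratio
    used_mb = reserved_mb + allocated_mb
    return limit_mb - used_mb >= requested_mb


GIB = 1024  # MiB per GiB

# Numbers from the example: 128GB host, 16GB reserved, 96GB instance, 32GB request.
with_ratio_1_0 = passes_ram_check(128 * GIB, 16 * GIB, 96 * GIB, 32 * GIB, 1.0)
with_ratio_1_5 = passes_ram_check(128 * GIB, 16 * GIB, 96 * GIB, 32 * GIB, 1.5)

if __name__ == "__main__":
    print("ratio 1.0 ->", "pass" if with_ratio_1_0 else "fail")
    print("ratio 1.5 ->", "pass" if with_ratio_1_5 else "fail")
```

With a ratio of 1.0 only 16GB of headroom remains, so the 32GB request is rejected. With 1.5 the limit rises to 192GB and the same request fits, regardless of how much memory QEMU is really using on the host.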

As far as why the memory didn't automatically get reclaimed, maybe KVM will only reclaim empty pages and memory fragmentation in the guest prevented it from doing so? It might also not actively try to reclaim memory unless it comes under pressure to do so, because finding empty pages and returning them to the host may be a somewhat time-consuming operation.

responded Mar 23, 2017 by Ned_Rhudy_(BLOOMBERG (1,340 points)   3 5
0 votes

On 2017-03-23 15:15, Edmund Rhudy (BLOOMBERG/ 120 PARK) wrote:
> What sort of memory overcommit value are you running Nova with? The
> scheduler looks at an instance's reservation rather than how much
> memory is actually being used by QEMU when making a decision, as far
> as I'm aware (but please correct me if I am wrong on this point). If
> the HV has 128GB of memory, the instance has a reservation of 96GB,
> you have 16GB reserved via reserved_host_memory_mb,
> ram_allocation_ratio is set to 1.0, and you try to launch an instance
> from a flavor with 32GB of memory, it will fail to pass RamFilter in
> the scheduler and the scheduler will not consider it a valid host for
> placement. (I am assuming you are using FilterScheduler still, as I
> know nothing about the new placement API or what parts of it do and
> don't work in Newton.)

The overcommit value is set to 1.5 in the scheduler. It's not the
scheduler that was preventing the instance from being provisioned; it
was qemu reporting that there was not enough ram when libvirt was trying
to provision the instance (that error was not handled well by openstack,
btw, but that's something else). So the instance does pass every filter.
It just ends up in error when being provisioned on the compute node
because of a lack of ram, with the actual full error message only
visible in the QEMU logs.

> As far as why the memory didn't automatically get reclaimed, maybe KVM
> will only reclaim empty pages and memory fragmentation in the guest
> prevented it from doing so? It might also not actively try to reclaim
> memory unless it comes under pressure to do so, because finding empty
> pages and returning them to the host may be a somewhat time-consuming
> operation.

That's entirely possible, but according to the docs, libvirt is supposed
to have a memory balloon function that reclaims empty pages from guest
processes, or so I understand. How this function works is not exactly
clear to me, nor is whether nova uses it at all. Another user suggested
it might not be automatic, which is consistent with what you're
conjecturing.

responded Mar 23, 2017 by Jean-Philippe_Methot (600 points)   1 2 5
0 votes

As a general rule, libvirt provides an interface to facilitate various actions on the guest but does not perform them without intervention. That is, it generally needs to be triggered to do something, either by a management layer (OpenStack, oVirt, virt-manager, Boxes, etc.) or by an explicit call from the operator (e.g. via virsh).

In this case, as Chris noted, while the memory stats are exposed by default and libvirt exposes an API for interacting with the balloon, there is no process in Nova currently, or commonly deployed with it, that will actually exercise the ballooning mechanism to expand/contract the memory balloon. In oVirt/RHEV the traditional way to do it was to use the Memory Overcommitment Manager (MOM) [1] to define and apply policies for managing it. The guest also needs to have a driver for the virtio balloon device, IIRC.

Such things have been proposed in the past [2] in OpenStack but, to my knowledge, never made it to implementation, though as you've discovered it still seems like something that is generally desirable.

Thanks,

Steve

[1] http://www.ovirt.org/develop/projects/mom/
[2] https://blueprints.launchpad.net/nova/+spec/libvirt-memory-ballooning

responded Mar 24, 2017 by Steve_Gordon (9,680 points)   2 5 7
0 votes

On 03/23/2017 01:01 PM, Jean-Philippe Methot wrote:
So basically, my question is, how does openstack actually manage ram
allocation? Will it ever take back the unused ram of a guest process?
Can I force it to take back that ram?

Basically, you are using a hammer as a screwdriver.

The tool that Nova gives you to prevent other VMs from consuming memory
allocated to another VM is called the ram_allocation_ratio. By default,
this is set to 1.5, meaning that if you have 100GB of RAM on a compute
host, you can allocate VMs that would consume up to 150GB of RAM.

For your VM that has 12GB of RAM used but 96GB allocated, you do not
want to do that. Instead, give that VM around 16GB of memory, set your
compute host's ram_allocation_ratio (in nova.conf) to 1.0 and then
instances on that compute host will not be able to consume more RAM than
is available on the host.

Best,
-jay


responded Mar 27, 2017 by Jay_Pipes (59,760 points)   3 11 14
...