
[openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?


The gate-tempest-dsvm-neutron-full-ubuntu-xenial job is 20..30 min slower
than it is supposed to be / used to be.

The extra time has multiple causes, and it is not because we test more :( .
Usually we are just less smart than before.

A huge time increase is visible in devstack as well.
devstack is advertised as:

Running devstack ... this takes 10 - 15 minutes (logs in
logs/devstacklog.txt.gz)

The actual time is 20 - 25 min according to openstack health:
http://status.openstack.org/openstack-health/#/test/devstack?resolutionKey=day&duration=P6M

Let's start with the first obvious difference compared to the old-time jobs:
The jobs spend 120..220 sec in apt-get install, and the packages defined in
/files/debs/general are missing from the images before the job starts.

We used to bake multiple packages into the images based on the package list
provided by devstack in order to save time.

Why does this not happen anymore?
Is anybody working on solving this issue?
Is there any blocking technical issue or challenge?
Was it a design decision?

We have a similar issue with PyPI usage as well.

PS:
It is generally a good idea to group these kinds of package install commands
into one big pip/apt-get/yum invocation, because these tools have significant
start-up time and they also need to process the dependency graph at
install/update.
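
For illustration, a minimal sketch of the difference (the package names here
are just placeholders):

# slow: one resolver/start-up cost per package
for pkg in libvirt-bin qemu-kvm ipxe-qemu; do
    sudo apt-get install -y "$pkg"
done

# faster: a single invocation resolves and installs the whole set at once
sudo apt-get install -y libvirt-bin qemu-kvm ipxe-qemu

# the same applies to pip: prefer one invocation with several requirement files
sudo pip install -r requirements-a.txt -r requirements-b.txt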


asked Sep 26, 2017 in openstack-dev by Attila_Fazekas

16 Responses


On 2017-09-19 14:15:53 +0200 (+0200), Attila Fazekas wrote:
[...]
Let's start with the first obvious difference compared to the old-time jobs:
The jobs spend 120..220 sec in apt-get install, and the packages defined in
/files/debs/general are missing from the images before the job starts.

We used to bake multiple packages into the images based on the package list
provided by devstack in order to save time.

Why does this not happen anymore?
Is anybody working on solving this issue?
Is there any blocking technical issue or challenge?
Was it a design decision?
[...]

In order to reduce image sizes and the time it takes to build
images, once we had local package caches in each provider we stopped
pre-retrieving packages onto the images. Is the time spent at this
stage mostly while downloading package files (which is what that
used to alleviate) or is it more while retrieving indices or
installing the downloaded packages (things having them pre-retrieved
on the images never solved anyway)?

Our earlier analysis of the impact of dropping package files from
images indicated it was negligible for most jobs because of the
caching package mirrors we maintain nearby.
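
For context, the "local package caches" here are plain distro mirrors that
each provider region points its package manager at during node bring-up. The
effect is roughly the following sketch (the mirror hostname is a made-up
placeholder; the real bring-up plumbing differs):

MIRROR=http://mirror.regionone.example.org/ubuntu
sudo tee /etc/apt/sources.list <<EOF
deb $MIRROR xenial main universe
deb $MIRROR xenial-updates main universe
deb $MIRROR xenial-security main universe
EOF
sudo apt-get update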
--
Jeremy Stanley



responded Sep 19, 2017 by Jeremy_Stanley

On 09/19/2017 11:03 PM, Jeremy Stanley wrote:
On 2017-09-19 14:15:53 +0200 (+0200), Attila Fazekas wrote:
[...]

The jobs spend 120..220 sec in apt-get install, and the packages defined in
/files/debs/general are missing from the images before the job starts.

Is the time spent at this stage mostly while downloading package
files (which is what that used to alleviate) or is it more while
retrieving indices or installing the downloaded packages (things
having them pre-retrieved on the images never solved anyway)?

As you're both aware, but others may not be, at the end of the logs
devstack does keep a timing overview that looks something like

=========================
DevStack Component Timing
=========================
Total runtime 1352

run_process           15
test_with_retry        4
apt-get-update         2
pip_install          270
osc                  365
wait_for_service      29
dbsync                23
apt-get              137
=========================

That doesn't break things down into download vs. install, but apt does
have a download summary that can be grepped for:


$ zgrep Fetched devstacklog.txt.gz
2017-09-19 17:52:45.808 | Fetched 39.3 MB in 1s (26.3 MB/s)
2017-09-19 17:53:41.115 | Fetched 185 kB in 0s (3,222 kB/s)
2017-09-19 17:54:16.365 | Fetched 23.5 MB in 1s (21.1 MB/s)
2017-09-19 17:54:25.779 | Fetched 18.3 MB in 0s (35.6 MB/s)
2017-09-19 17:54:39.439 | Fetched 59.1 kB in 0s (0 B/s)
2017-09-19 17:54:40.986 | Fetched 2,128 kB in 0s (40.0 MB/s)
2017-09-19 17:57:37.190 | Fetched 333 kB in 0s (1,679 kB/s)
2017-09-19 17:58:17.592 | Fetched 50.5 MB in 2s (18.1 MB/s)
2017-09-19 17:58:26.947 | Fetched 5,829 kB in 0s (15.5 MB/s)
2017-09-19 17:58:49.571 | Fetched 5,065 kB in 1s (3,719 kB/s)
2017-09-19 17:59:25.438 | Fetched 9,758 kB in 0s (44.5 MB/s)
2017-09-19 18:00:14.373 | Fetched 77.5 kB in 0s (286 kB/s)
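
A rough total can be pulled out of that with a quick awk sketch over the same
log (normalizing the mixed kB/MB units in the format shown above):

$ zgrep Fetched devstacklog.txt.gz | \
    awk '{gsub(",", "", $5); t += ($6 == "kB" ? $5 / 1000 : $5)} END {printf "%.1f MB total\n", t}'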

As mentioned, we set up the package manager to point to a region-local
mirror during node bringup. Depending on the I/O situation, it is
probably just as fast as coming off disk :) Note (also as mentioned)
these were never pre-installed, just pre-downloaded to an on-disk
cache area (as an aside, I don't think dnf was ever really happy with
that situation and kept being too smart and clearing its caches).

If you're feeling regexy you could maybe do something similar with the
pip "Collecting" bits in the logs ... one idea for investigation down
that path is whether we could save time by somehow collecting larger
batches of requirements and doing fewer pip invocations.
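
To sketch that idea, something like the following (untested one-liners; the
requirements file names are placeholders) would show how much pip is fetching
and what batching could look like:

# how many downloads pip did, and which packages came up most often
$ zgrep -c 'Collecting ' devstacklog.txt.gz
$ zgrep -o 'Collecting [A-Za-z0-9._-]*' devstacklog.txt.gz | sort | uniq -c | sort -rn | head

# the batching idea: one resolver run instead of many separate invocations
$ pip install -r requirements-a.txt -r requirements-b.txt -r requirements-c.txt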

-i


responded Sep 19, 2017 by Ian_Wienand

On Tue, Sep 19, 2017 at 9:03 AM, Jeremy Stanley fungi@yuggoth.org wrote:

In order to reduce image sizes and the time it takes to build
images, once we had local package caches in each provider we stopped
pre-retrieving packages onto the images. Is the time spent at this
stage mostly while downloading package files (which is what that
used to alleviate) or is it more while retrieving indices or
installing the downloaded packages (things having them pre-retrieved
on the images never solved anyway)?

At what point does it become beneficial to build more than one image per OS
that is more aggressively tuned/optimized for a particular purpose ?

We could take more liberties in a devstack-specific image, like
pre-installing packages that are provided outside of the base OS, etc.
Different projects could take this kind of liberty to optimize build times
according to their needs as well.

Here's an example of something we once did in RDO:
1) Aggregate the list of every package installed (rpm -qa) at the end
of several jobs
2) From that sorted and uniq'd list, work out which repository each
package came from
3) Blacklist every package that was not installed from a base
operating system repository
(i.e., blacklist every package and its dependencies from RDO, since
we'll be testing these)
4) Pre-install every package that was not blacklisted in our images

The end result was a list of >700 packages [1] completely unrelated to
OpenStack that ended up being installed anyway throughout different jobs.
To give an idea of the numbers, a fairly vanilla CentOS image has ~400
packages installed.
You can find the (rudimentary) script that achieves this filtering here [2].
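
In shell form, a stripped-down sketch of steps 1-4 (not the actual script
from [2]; the file names are made up):

# 1) aggregate the `rpm -qa` output collected from several finished jobs
sort -u installed-packages-job-*.txt > all-packages.txt
# 2+3) blacklist.txt lists everything that did not come from a base OS repo
#      (i.e. anything pulled in from RDO, plus its dependencies)
comm -23 all-packages.txt <(sort -u blacklist.txt) > preinstall-candidates.txt
# 4) preinstall-candidates.txt is what gets baked into the images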

David Moreau Simard
Senior Software Engineer | OpenStack RDO

dmsimard = [irc, github, twitter]


responded Sep 19, 2017 by dms_at_redhat.com

On 09/20/2017 09:30 AM, David Moreau Simard wrote:
At what point does it become beneficial to build more than one image per OS
that is more aggressively tuned/optimized for a particular purpose ?

... and we can put -dsvm- in the job names to indicate it should run
on these nodes :)

Older hands than myself will remember even more issues, but the
"thicker" the base image has been, the more corners it has
traditionally provided for corner-cases to hide in. We saw this all
the time with "snapshot" images where we'd be based on upstream
images that would change ever so slightly and break things, leading
to diskimage-builder and the -minimal build approach.

That said, in a zuulv3 world where we are not caching all git and have
considerably smaller images, a nodepool that has a scheduler that
accounts for flavor sizes and could conceivably understand similar for
images, and where we're building with discrete elements that could
"bolt-on" things like a list-of-packages install sanely to daily
builds ... it's not impossible to imagine.
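
As a rough illustration of that bolt-on idea only (a sketch, not how the
current builds are wired up, and it assumes a devstack checkout is available
to the image build):

# feed devstack's general package list into a diskimage-builder daily build
PKGS=$(grep -hv '^#' devstack/files/debs/general | awk 'NF {print $1}' | paste -sd, -)
disk-image-create -o devstack-xenial -p "$PKGS" ubuntu-minimal vm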

-i


responded Sep 20, 2017 by Ian_Wienand

On Wed, Sep 20, 2017 at 3:11 AM, Ian Wienand iwienand@redhat.com wrote:

On 09/20/2017 09:30 AM, David Moreau Simard wrote:

At what point does it become beneficial to build more than one image per
OS
that is more aggressively tuned/optimized for a particular purpose ?

... and we can put -dsvm- in the job names to indicate it should run
on these nodes :)

Older hands than myself will remember even more issues, but the
"thicker" the base image has been, the more corners it has
traditionally provided for corner-cases to hide in. We saw this all
the time with "snapshot" images where we'd be based on upstream
images that would change ever so slightly and break things, leading
to diskimage-builder and the -minimal build approach.

That said, in a zuulv3 world where we are not caching all git and have
considerably smaller images, a nodepool that has a scheduler that
accounts for flavor sizes and could conceivably understand similar for
images, and where we're building with discrete elements that could
"bolt-on" things like a list-of-packages install sanely to daily
builds ... it's not impossible to imagine.

-i

The problem is that these package install steps are not really I/O
bottlenecked in most cases; even at regular DSL speeds you can frequently see
the decompress and post-config steps take more time.

The site-local cache/mirror has a visible benefit, but it does not eliminate
the issues.

The main enemy is the single-threaded, CPU-intensive work in most
install/config related scripts; the second most common issue is serially
executing high-latency steps, which in the end saturates neither
the CPU nor the I/O.

Fat images are generally cheaper even if your cloud has only 1 Gb
Ethernet for image transfer.
You gain more by baking the packages into the image than the 1 GbE can take
from you, because you also save the time that would otherwise be lost on
CPU-intensive operations and random disk access.

It is safe to add all distro packages used by devstack to the cloud image.

Historically we had issues with some base image packages whose presence
changed the behavior of some components, for example firewalld vs. libvirt
(likely an already solved issue);
those packages got explicitly removed by devstack when necessary.
Those packages were not requested by devstack!

Fedora/CentOS also has/had issues with PyPI packages overlapping distro
packages on the main filesystem
(too long a story, no pointing fingers..); it is generally not a good idea to
add packages from PyPI to an image whose content might be overridden by the
distro's package manager.

The distribution package install time delays the gate response:
when the slowest-running job is delayed by this, the whole response is
delayed.

It is a user-facing latency issue, which should be solved even if the cost
were higher.

Image building was the good old working solution, and unless the image
build has become a super expensive thing, it is still the best option.

A site-local mirror is also expected to help make the image build step(s)
faster and safer.

The other option is the ready scripts.
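
For completeness, a bare-bones sketch of what such a ready script could do
(the devstack path and package list location are assumptions, not how the
current jobs are configured):

#!/bin/bash
# pre-download (not install) devstack's package list into apt's cache,
# so the job's later apt-get install only has to unpack and configure
set -e
PKGS=$(grep -hv '^#' /opt/stack/devstack/files/debs/general | awk 'NF {print $1}')
sudo apt-get update
sudo apt-get -y -d install $PKGS   # -d / --download-only fills /var/cache/apt/archives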


responded Sep 20, 2017 by Attila_Fazekas

On 2017-09-20 15:17:28 +0200 (+0200), Attila Fazekas wrote:
[...]
Image building was the good old working solution, and unless the image
build has become a super expensive thing, it is still the best option.
[...]

It became a super expensive thing, and that's the main reason we
stopped doing it. Now that Nodepool has grown support for
distributed/parallel image building and uploading, the cost model
may have changed a bit in that regard so I agree it doesn't hurt to
revisit that decision. Nevertheless it will take a fair amount of
convincing that the savings balances out the costs (not just in
resource consumption but also administrative overhead and community
impact... if DevStack gets custom images prepped to make its jobs
run faster, won't Triple-O, Kolla, et cetera want the same and where
do we draw that line?).
--
Jeremy Stanley



responded Sep 20, 2017 by Jeremy_Stanley

"if DevStack gets custom images prepped to make its jobs
run faster, won't Triple-O, Kolla, et cetera want the same and where
do we draw that line?). "

IMHO we can try to have only one big image per distribution,
where the packages are the union of the packages requested by all teams,
minus the packages blacklisted by any team.

You would need to provide a bug link (distribution/upstream bug) to blacklist
a package.

It is very unlikely that we will run out of disk space just because of too
many packages;
usually, if a package causes harm to anything, it is a distro/upstream bug
which is expected
to be solved within 1..2 cycles in the worst case scenario.

If the above approach is proven not to work, we need to draw the line based
on the
expected usage frequency.

On Wed, Sep 20, 2017 at 3:46 PM, Jeremy Stanley fungi@yuggoth.org wrote:

On 2017-09-20 15:17:28 +0200 (+0200), Attila Fazekas wrote:
[...]

Image building was the good old working solution, and unless the image
build has become a super expensive thing, it is still the best option.
[...]

It became a super expensive thing, and that's the main reason we
stopped doing it. Now that Nodepool has grown support for
distributed/parallel image building and uploading, the cost model
may have changed a bit in that regard so I agree it doesn't hurt to
revisit that decision. Nevertheless it will take a fair amount of
convincing that the savings balances out the costs (not just in
resource consumption but also administrative overhead and community
impact... if DevStack gets custom images prepped to make its jobs
run faster, won't Triple-O, Kolla, et cetera want the same and where
do we draw that line?).
--
Jeremy Stanley


responded Sep 22, 2017 by Attila_Fazekas

On 2017-09-22 15:04:43 +0200 (+0200), Attila Fazekas wrote:
"if DevStack gets custom images prepped to make its jobs
run faster, won't Triple-O, Kolla, et cetera want the same and where
do we draw that line?). "

IMHO we can try to have only one big image per distribution,
where the packages are the union of the packages requested by all team,
minus the packages blacklisted by any team.
[...]

Until you realize that some projects want packages from UCA, from
RDO, from EPEL, from third-party package repositories. Version
conflicts mean they'll still spend time uninstalling the versions
they don't want and downloading/installing the ones they do so we
have to optimize for one particular set and make the rest
second-class citizens in that scenario.

Also, preinstalling packages means we don't test that projects
actually properly declare their system-level dependencies any
longer. I don't know if anyone's concerned about that currently, but
it used to be the case that we'd regularly add/break the package
dependency declarations in DevStack because of running on images
where the things it expected were preinstalled.
--
Jeremy Stanley



responded Sep 22, 2017 by Jeremy_Stanley

On 22 September 2017 at 07:31, Jeremy Stanley fungi@yuggoth.org wrote:
On 2017-09-22 15:04:43 +0200 (+0200), Attila Fazekas wrote:

"if DevStack gets custom images prepped to make its jobs
run faster, won't Triple-O, Kolla, et cetera want the same and where
do we draw that line?). "

IMHO we can try to have only one big image per distribution,
where the packages are the union of the packages requested by all team,
minus the packages blacklisted by any team.
[...]

Until you realize that some projects want packages from UCA, from
RDO, from EPEL, from third-party package repositories. Version
conflicts mean they'll still spend time uninstalling the versions
they don't want and downloading/installing the ones they do so we
have to optimize for one particular set and make the rest
second-class citizens in that scenario.

Also, preinstalling packages means we don't test that projects
actually properly declare their system-level dependencies any
longer. I don't know if anyone's concerned about that currently, but
it used to be the case that we'd regularly add/break the package
dependency declarations in DevStack because of running on images
where the things it expected were preinstalled.
--
Jeremy Stanley

Another, more revolutionary (for good or ill) alternative would be to
move the gates to run Kolla instead of DevStack. We're working towards
a registry of images, and we support most OpenStack services now. If
we enable mixed installation (your service in a devstack-ish way, the
others via Kolla), that should lower the amount of downloads quite
dramatically (a lot of it will be downloads from the registry, which
will be mirrored/cached in every nodepool). Then all we really need to
support is a barebones image with docker and ansible installed, and
that's it.


responded Sep 22, 2017 by Michał_Jastrzębski

On Fri, Sep 22, 2017, at 08:58 AM, Michał Jastrzębski wrote:
Another, more revolutionary (for good or ill) alternative would be to
move the gates to run Kolla instead of DevStack. We're working towards
a registry of images, and we support most OpenStack services now. If
we enable mixed installation (your service in a devstack-ish way, the
others via Kolla), that should lower the amount of downloads quite
dramatically (a lot of it will be downloads from the registry, which
will be mirrored/cached in every nodepool). Then all we really need to
support is a barebones image with docker and ansible installed, and
that's it.

Except that it very likely isn't going to use less bandwidth. We already
mirror most of these package repos so all transfers are local to the
nodepool cloud region. In total we seem to grab about 139MB of packages
for a neutron dvr multinode scenario job (146676348 bytes) on Ubuntu
Xenial. This is based off the package list compiled at
http://paste.openstack.org/raw/621753/ then asking apt-cache for the
package size for the latest version.
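
Roughly, the arithmetic looks like the sketch below, assuming the paste has
been saved to packages.txt (apt-cache's Size field is the compressed .deb
download size):

while read -r pkg; do
    apt-cache show "$pkg" | awk '/^Size:/ {print $2; exit}'
done < packages.txt | awk '{t += $1} END {printf "%d bytes (~%.0f MB)\n", t, t / 1000 / 1000}'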

Kolla images on the other hand are in the multigigabyte range
http://tarballs.openstack.org/kolla/images/.

Clark


responded Sep 22, 2017 by Clark_Boylan
...