
[openstack-dev] [all][python3] use of six.iteritems()

0 votes

I'm very glad folk are working on Python3 ports.

I'd like to call attention to one little wart in that process: I get
the feeling that folk are applying a massive regex to find things like
d.iteritems() and convert that to six.iteritems(d).

I'd very much prefer that such a regex approach move things to
d.items(), which is much easier to read.
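
Concretely, a minimal sketch of the two forms (the dict here is made
up purely for illustration):

import six

d = {'nova': 1, 'neutron': 2}  # illustrative dict, not project code

# the mechanical regex conversion:
for k, v in six.iteritems(d):
    print(k, v)

# the preferred, readable form; works on 2.x and 3.x (on 2.x it
# builds a list, on 3.x it returns a view):
for k, v in d.items():
    print(k, v)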

Here's why. Firstly, very very very few of our dict iterations are
going to be performance sensitive in the way that iteritems() matters.
Secondly, no really - unless you're doing HUGE dicts, it doesn't
matter. Thirdly. Really, it doesn't.

At 1 million items the overhead is 54ms[1]. If we're doing inner loops
on million item dictionaries anywhere in OpenStack today, we have a
problem. We might want to in e.g. the scheduler... if it held
in-memory state on a million hypervisors at once, because I don't
really want to imagine it pulling a million rows from a DB on every
action. But then, we'd be looking at a whole 54ms. I think we could
survive, if we did that (which we don't).

So - please, no six.iteritems().

Thanks,
Rob

[1]
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 76.6 msec per loop
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
100 loops, best of 3: 22.6 msec per loop
python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 18.9 msec per loop
pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 65.8 msec per loop

and out of interest, assuming that that hadn't triggered the JIT....

but it had.
pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
1000 loops, best of 3: 64.3 msec per loop

--
Robert Collins rbtcollins@hp.com
Distinguished Technologist
HP Converged Cloud


asked Jun 10, 2015 in openstack-dev by Robert_Collins (27,200 points)

33 Responses

0 votes

Huge +1 both for the suggestion and for reasoning.

It's better to avoid substituting language features with a library.

Eugene.

On Tue, Jun 9, 2015 at 5:15 PM, Robert Collins robertc@robertcollins.net
wrote:

I'm very glad folk are working on Python3 ports.

[...]

So - please, no six.iteritems().


responded Jun 10, 2015 by Eugene_Nikanorov (7,480 points)
0 votes

On 06/09/2015 08:15 PM, Robert Collins wrote:
I'm very glad folk are working on Python3 ports.

[...]

So - please, no six.iteritems().

+1

-jay


responded Jun 10, 2015 by Jay_Pipes (59,760 points)
0 votes

+1

Don't forget values and keys in addition to items. They aren't as common
but come up every so often. I think you can iterate the keys just by
iterating on the dict itself.
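
A quick sketch of those replacements (the dict is illustrative, not
project code):

d = {'a': 1, 'b': 2}  # illustrative dict

# keys: iterate the dict directly instead of six.iterkeys(d)
for k in d:
    pass

# values: d.values() instead of six.itervalues(d)
for v in d.values():
    pass

# items: d.items() instead of six.iteritems(d)
for k, v in d.items():
    pass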

Carl
On Jun 9, 2015 6:18 PM, "Robert Collins" robertc@robertcollins.net wrote:

I'm very glad folk are working on Python3 ports.

[...]

So - please, no six.iteritems().


responded Jun 10, 2015 by Carl_Baldwin (14,940 points)
0 votes

maybe the suggestion should be "don't blindly apply six.iteritems or items" rather than "don't apply iteritems at all". admittedly, it's a massive eyesore, but it's a very real use case that some projects deal with large data results, and enforcing the latter policy can have negative effects[1]. one "million item dictionary" might be negligible, but in a multi-user, multi-* environment that can have a significant impact on the amount of memory required to store everything.

[1] disclaimer: i have no real world results but i assume memory management was the reason for the switch in logic from py2 to py3

cheers,
gord


Date: Wed, 10 Jun 2015 12:15:33 +1200
From: robertc@robertcollins.net
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [all][python3] use of six.iteritems()

I'm very glad folk are working on Python3 ports.

[...]

So - please, no six.iteritems().


responded Jun 10, 2015 by gordon_chung (19,300 points)
0 votes


On 06/10/2015 02:15 AM, Robert Collins wrote:
I'm very glad folk are working on Python3 ports.

[...]

Here's why. Firstly, very very very few of our dict iterations are
going to be performance sensitive in the way that iteritems()
matters. Secondly, no really - unless you're doing HUGE dicts, it
doesn't matter. Thirdly. Really, it doesn't.

Does it hurt though? ;)

At 1 million items the overhead is 54ms[1]. [...]

So - please, no six.iteritems().

The reason why in e.g. neutron we merged the patch using six.iteritems
is that we don't want to go too deep into determining whether the
original usage of iteritems() was justified. The goal of the patch is
to get python3 support, not to apply subjective style guidelines, so
if someone wants to eliminate .iteritems(), he should create another
patch just for that and struggle with reviewing it, while folks
interested in python3 can proceed with their work.

We should not be afraid of multiple patches.

Ihar


responded Jun 10, 2015 by Ihar_Hrachyshka (35,300 points)
0 votes

On 10 June 2015 at 17:22, gordon chung gord@live.ca wrote:
maybe the suggestion should be "don't blindly apply six.iteritems or items" rather than "don't apply iteritems at all". [...] in a multi-user, multi-* environment that can have a significant impact on the amount of memory required to store everything.

[1] disclaimer: i have no real world results but i assume memory management was the reason for the switch in logic from py2 to py3

I wouldn't make that assumption.

And no, memory isn't an issue. If you have a million item dict,
ignoring the internal overheads, the dict needs 1 million object
pointers. The size of a list with those pointers in it is 1M times
the pointer size in bytes, e.g. 4MB or 8MB. Nothing to worry about
given the footprint of such a program :)
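
A quick sanity check of that arithmetic, as a sketch (sizes vary by
interpreter and platform, and this counts only the list's pointer
array, not the item tuples themselves):

import sys

d = dict(enumerate(range(1000000)))
items_list = list(d.items())  # what py2's items() would materialize
# roughly 1M pointers at 8 bytes each on a 64-bit build, so on the
# order of 8MB for the list itself:
print(sys.getsizeof(items_list))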

-Rob

--
Robert Collins rbtcollins@hp.com
Distinguished Technologist
HP Converged Cloud


responded Jun 10, 2015 by Robert_Collins (27,200 points)
0 votes

On 10 June 2015 at 21:30, Ihar Hrachyshka ihrachys@redhat.com wrote:

On 06/10/2015 02:15 AM, Robert Collins wrote:

I'm very glad folk are working on Python3 ports.

[...]

Here's why. Firstly, very very very few of our dict iterations are
going to be performance sensitive in the way that iteritems()
matters. Secondly, no really - unless you're doing HUGE dicts, it
doesn't matter. Thirdly. Really, it doesn't.

Does it hurt though? ;)

Yes.

It's harder to read. It's going to have to be removed eventually anyway
(when we stop supporting 2.7). It's marginally slower on 3.x (it adds a
function call and an iterator wrapper around the actual thing). It's
unidiomatic, and we get lots of programmers that are new to Python; we
should be giving them as beautiful code as we can to help them learn.
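
For reference, six implements iteritems() roughly like this
(paraphrased; treat the exact code as an approximation of the six
source):

import sys

PY3 = sys.version_info[0] == 3

if PY3:
    def iteritems(d, **kw):
        # on 3.x: a function call plus an extra iter() wrapper on
        # top of the plain d.items() call
        return iter(d.items(**kw))
else:
    def iteritems(d, **kw):
        return d.iteritems(**kw)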

At 1 million items the overhead is 54ms[1]. [...]

So - please, no six.iteritems().

The reason why in e.g. neutron we merged the patch using six.iteritems
is that we don't want to go too deep into determining whether the
original usage of iteritems() was justified.

It's not.

The goal of the patch is
to get python3 support, not to apply subjective style guidelines, so
if someone wants to eliminate .iteritems(), he should create another
patch just for that and struggle with reviewing it, while folks
interested in python3 can proceed with their work.

We should not be afraid of multiple patches.

We shouldn't be indeed. All I'm asking is that we don't do poor
intermediate patches.

I've written code where performance tuning like that around iteritems
mattered. That code also needed to optimise tuple unpacking to avoid
performance hits and was aiming to manipulate million item data sets
from interpreter startup in subsecond times. It was some of the worst,
most impenetrable Python code I've ever seen, and while our code has
lots of issues, it neither has the same performance context that that
code did, nor (thankfully) is it such impenetrable code.

-Rob

--
Robert Collins rbtcollins@hp.com
Distinguished Technologist
HP Converged Cloud


responded Jun 10, 2015 by Robert_Collins (27,200 points)
0 votes

On 06/09/2015 08:15 PM, Robert Collins wrote:
I'm very glad folk are working on Python3 ports.

[...]

So - please, no six.iteritems().

That's awesome, because those six.iteritems loops make me want to throw
up a little. Very happy to have our code just use items instead.

-Sean

--
Sean Dague
http://dague.net


responded Jun 10, 2015 by Sean_Dague (66,200 points)
0 votes

Date: Wed, 10 Jun 2015 21:33:44 +1200
From: robertc@robertcollins.net
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [all][python3] use of six.iteritems()

On 10 June 2015 at 17:22, gordon chung gord@live.ca wrote:

[...]

[1] disclaimer: i have no real world results but i assume memory management was the reason for the switch in logic from py2 to py3

I wouldn't make that assumption.

And no, memory isn't an issue. If you have a million item dict,
ignoring the internal overheads, the dict needs 1 million object
pointers. The size of a list with those pointers in it is 1M times
the pointer size in bytes, e.g. 4MB or 8MB. Nothing to worry about
given the footprint of such a program :)
iiuc, items() (in py2) will create a full list of the dictionary's (key, value) pairs in memory to be processed. this is useful for cases such as concurrency where you want to ensure consistency, but doing a quick test i noticed a massive spike in memory usage with items() compared to iteritems().

'for i in dict(enumerate(range(1000000))).items(): pass' consumes significantly more memory than 'for i in dict(enumerate(range(1000000))).iteritems(): pass'. on my system, the memory consumption was double when using items() vs iteritems(), and the cpu util was significantly higher as well... let me know if there's anything that stands out as inaccurate.

unless there's something wrong with my ignorant testing above, i think it's something projects should consider when mass applying any iteritems/items patch.
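
a sketch of one way to reproduce a comparison like that on linux (not
the exact test used above; ru_maxrss is reported in kilobytes on
linux, bytes on os x):

# python 2 sketch: report peak RSS after iterating with items() or
# iteritems(); run each variant as a separate process so peaks don't mix
import resource
import sys

d = dict(enumerate(range(1000000)))
if sys.argv[1:] == ['iteritems']:
    for i in d.iteritems():
        pass
else:
    for i in d.items():  # materializes a list of 1M tuples on python 2
        pass

print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)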
cheers,
gord

responded Jun 11, 2015 by gordon_chung (19,300 points)
0 votes

tl;dr .iteritems() is faster and more memory efficient than .items() in
python2

Using xrange() in python2 instead of range() because it's more memory
efficient and consistent between python 2 and 3...
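
As a quick illustration of why (a sketch; exact sizes vary by build):

# Python 2 only: range() materializes a full list, xrange() is lazy
import sys
print(sys.getsizeof(range(1000000)))   # ~8MB for the list's pointer array
print(sys.getsizeof(xrange(1000000)))  # a few dozen bytes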

xrange() + .items()

python -m timeit -n 20 'for i in dict(enumerate(xrange(1000000))).items(): pass'
20 loops, best of 3: 729 msec per loop
peak memory usage: 203 megabytes

xrange() + .iteritems()

python -m timeit -n 20 'for i in dict(enumerate(xrange(1000000))).iteritems(): pass'
20 loops, best of 3: 644 msec per loop
peak memory usage: 176 megabytes

python 3

python3 -m timeit -n 20 'for i in dict(enumerate(range(1000000))).items(): pass'
20 loops, best of 3: 826 msec per loop
peak memory usage: 198 megabytes

And if you really want to see the results with range() in python2...

range() + .items()

python -m timeit -n 20 'for i in dict(enumerate(range(1000000))).items(): pass'
20 loops, best of 3: 851 msec per loop
peak memory usage: 254 megabytes

range() + .iteritems()

python -m timeit -n 20 'for i in dict(enumerate(range(1000000))).iteritems(): pass'
20 loops, best of 3: 919 msec per loop
peak memory usage: 184 megabytes

To benchmark memory consumption, I used the following on bare metal:

$ valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out $COMMANDFROMABOVE
$ cat massif.out | grep mem_heap_B | sort -u

$ python2 --version
Python 2.7.9

$ python3 --version
Python 3.4.3

On Wed, Jun 10, 2015 at 8:36 PM, gordon chung gord@live.ca wrote:

[...]

unless there's something wrong with my ignorant testing above, i think
it's something projects should consider when mass applying any
iteritems/items patch.

cheers,
gord


responded Jun 11, 2015 by Dolph_Mathews (9,560 points)
...