How to use guppy/heapy for tracking down memory usage

This is a work in progress. It will grow a bit and it may not be entirely accurate everywhere.

Tutorial of sorts

All this was done on a checkout of marienz@gentoo.org-20060908111256-540d8fb3db5b337e, you should be able to check that out and follow along using something like:

bzr revert -rrevid:marienz@gentoo.org-20060908111256-540d8fb3db5b337e

in a pkgcore branch.

Heapy is powerful but has a learning curve. Problems are the documentation (http://guppy-pe.sourceforge.net/heapy_Use.html among others) is a bit unusual and there are various dynamic importing and other tricks in use that mean things like dir() are less helpful than they are on more "normal" python objects. This document's main purpose is to show you how to ask heapy various kinds of questions. It may or may not show a few cases where pkgcore uses more memory than it should too.

First, get an x86. Heapy currently does not like 64 bit archs much.

Emerge it:

emerge guppy

Fire up an interactive python prompt, set stuff up:

>>> from guppy import hpy
>>> from pkgcore.config import load_config
>>> c = load_config()
>>> hp = hpy()

Just to show how annoying heapy's internal tricks are:

>>> dir(hp)
['__doc__', '__getattr__', '__init__', '__module__', '__setattr__', '_hiding_tag_', '_import', '_name', '_owner', '_share']
>>> help(hp)
Help on class _GLUECLAMP_ in module guppy.etc.Glue:

_GLUECLAMP_ = <guppy.heapy.Use interface at 0x-484b8554>

This object is your "starting point", but as you can see the underlying machinery is not giving away any useful usage instructions.

Do everything that allocates some memory but is not the problem you are tracking down now. Then do:

>>> hp.setrelheap()

Everything allocated before this call will not be in the data sets you get later.

Now do your memory-intensive thing:

>>> l = list(x for x in c.repo["portdir"] if x.data)

Keep an eye on system memory consumption. You want to use up a lot but not all of your system ram for nicer statistics. The python process was eating about 109M res in top when the above stuff finished, which is pretty good (for my 512mb ram box).

>>> h = hp.heap()

The fun one. This object is basically a snapshot of what's reachable in ram (minus the stuff excluded through setrelheap earlier) which you can do various fun tricks with. Its str() is a summary:

>>> h
Partition of a set of 1449133 objects. Total size = 102766644 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 985931  68 46300932  45  46300932  45 str
     1  24681   2 22311624  22  68612556  67 dict of pkgcore.ebuild.ebuild_src.package
     2  49391   3 21311864  21  89924420  88 dict (no owner)
     3 115974   8  3776948   4  93701368  91 tuple
     4 152181  11  3043616   3  96744984  94 long
     5  36009   2  1584396   2  98329380  96 weakref.KeyedRef
     6  11328   1  1540608   1  99869988  97 dict of pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
     7  24702   2   889272   1 100759260  98 types.MethodType
     8  11424   1   851840   1 101611100  99 list
     9  24681   2   691068   1 102302168 100 pkgcore.ebuild.ebuild_src.package
<54 more rows. Type e.g. '_.more' to view.>

(You might want to keep an eye on ram usage: heapy made the process grow another dozen mb here. It gets painfully slow if it starts swapping, so if that happens reduce your data set).

Notice the "Total size" in the top right: about 100M. That's what we need to compare later numbers with.

So here we can see that (surprise!) we have a ton of strings in memory. We also have various kinds of dicts. Dicts are treated a bit specially: the "dict of pkgcore.ebuild.ebuild_src.package" simply means "all the dicts that are __dict__ attributes of instances of that class". "dict (no owner)" are all the dicts that are not used as __dict__ attribute.

You probably guessed what you can use "index" for:

>>> h[0]
Partition of a set of 985931 objects. Total size = 46300932 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 985931 100 46300932 100  46300932 100 str

Ok, that looks pretty useless, but it really is not. The "sets" heapy gives you (like "h" and "h[0]") are a bunch of objects, grouped together by an "equivalence relation". The default one (with the crazy name "Clodo" for "Class or dict owner") groups together all objects of the same class and dicts with the same owner. We can also partition the sets by a different equivalence relation. Let's do a silly example first:

>>> h.bytype
Partition of a set of 1449133 objects. Total size = 102766644 bytes.
 Index  Count   %     Size   % Cumulative  % Type
     0 985931  68 46300932  45  46300932  45 str
     1  85556   6 45226592  44  91527524  89 dict
     2 115974   8  3776948   4  95304472  93 tuple
     3 152181  11  3043616   3  98348088  96 long
     4  36009   2  1584396   2  99932484  97 weakref.KeyedRef
     5  24702   2   889272   1 100821756  98 types.MethodType
     6  11424   1   851840   1 101673596  99 list
     7  24681   2   691068   1 102364664 100 pkgcore.ebuild.ebuild_src.package
     8  11328   1   317184   0 102681848 100 pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
     9    408   0    26112   0 102707960 100 types.CodeType
<32 more rows. Type e.g. '_.more' to view.>

As you can see this is the same thing as the default view, but with all the dicts lumped together. A more useful one is:

>>> h.byrcs
Partition of a set of 1449133 objects. Total size = 102766644 bytes.
 Index  Count   %     Size   % Cumulative  % Referrers by Kind (class / dict of class)
     0 870779  60 43608088  42  43608088  42 dict (no owner)
     1  24681   2 22311624  22  65919712  64 pkgcore.ebuild.ebuild_src.package
     2 221936  15 20575932  20  86495644  84 dict of pkgcore.ebuild.ebuild_src.package
     3 242236  17  8588560   8  95084204  93 tuple
     4      6   0  1966736   2  97050940  94 dict of weakref.WeakValueDictionary
     5  36009   2  1773024   2  98823964  96 dict (no owner), dict of
                                             pkgcore.ebuild.ebuild_src.package, weakref.KeyedRef
     6  11328   1  1540608   1 100364572  98 pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
     7  26483   2   800432   1 101165004  98 list
     8  11328   1   724992   1 101889996  99 dict of pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
     9      3   0   393444   0 102283440 100 dict of pkgcore.repository.prototype.IterValLazyDict
<132 more rows. Type e.g. '_.more' to view.>

What this does is:

So now we know that we have a lot of objects referenced only by one or more dicts (still not very useful) and also a lot of them referenced by one "normal" dict, referenced by the dict of (meaning "an attribute of") ebuild_src.package, and referenced by a WeakRef. Hmm, I wonder what those are. But let's store this view of the data first, since it took a while to generate ("_" is a feature of the python interpreter, it's always the last result):

>>> byrcs = _
>>> byrcs[5]
Partition of a set of 36009 objects. Total size = 1773024 bytes.
 Index  Count   %     Size   % Cumulative  % Referrers by Kind (class / dict of class)
     0  36009 100  1773024 100   1773024 100 dict (no owner), dict of
                                             pkgcore.ebuild.ebuild_src.package, weakref.KeyedRef

Erm, yes, we knew that already. If you look in the top right of the table you can see it is still grouping the items by the kind of their referrer, which is not very useful here. To get more information we can change what they are grouped by:

>>> byrcs[5].byclodo
Partition of a set of 36009 objects. Total size = 1773024 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  36009 100  1773024 100   1773024 100 str
>>> byrcs[5].bysize
Partition of a set of 36009 objects. Total size = 1773024 bytes.
 Index  Count   %     Size   % Cumulative  % Individual Size
     0  10190  28   489120  28    489120  28        48
     1   7584  21   394368  22    883488  50        52
     2   7335  20   322740  18   1206228  68        44
     3   3947  11   221032  12   1427260  80        56
     4   3364   9   134560   8   1561820  88        40
     5   1903   5   114180   6   1676000  95        60
     6    877   2    56128   3   1732128  98        64
     7    285   1    19380   1   1751508  99        68
     8    451   1    16236   1   1767744 100        36
     9     57   0     4104   0   1771848 100        72

This took the set of objects with that odd set of referrers and redisplayed them grouped by "clodo". So now we know they're all strings. Most of them are pretty small too. To get some idea of what we're dealing with we can pull some random examples out:

>>> byrcs[5].byid
Set of 36009 <str> objects. Total size = 1773024 bytes.
 Index     Size   %   Cumulative  %   Representation (limited)
     0       80   0.0        80   0.0 'media-plugin...re20051219-r1'
     1       76   0.0       156   0.0 'app-emulatio...4.20041102-r1'
     2       76   0.0       232   0.0 'dev-php5/ezc...hemaTiein-1.0'
     3       76   0.0       308   0.0 'games-misc/f...wski-20030120'
     4       76   0.0       384   0.0 'mail-client/...pt-viewer-0.8'
     5       76   0.0       460   0.0 'media-fonts/...-100dpi-1.0.0'
     6       76   0.0       536   0.0 'media-plugin...gdemux-0.10.4'
     7       76   0.0       612   0.0 'media-plugin...3_pre20051219'
     8       76   0.0       688   0.0 'media-plugin...3_pre20051219'
     9       76   0.0       764   0.0 'media-plugin...3_pre20060502'
>>> byrcs[5].byid[0].theone
'media-plugins/vdr-streamdev-server-0.3.3_pre20051219-r1'

A pattern emerges! (sets with one item have a "theone" attribute with the actual item, all sets have a "nodes" attribute that returns an iterator yielding the items).

We could have used another heapy trick to get a better idea of what kind of string this was:

>>> byrcs[5].byvia
Partition of a set of 36009 objects. Total size = 1773024 bytes.
 Index  Count   %     Size   % Cumulative  % Referred Via:
     0      1   0       80   0        80   0 "['cpvstr']", '.key', '.keys()[23147]'
     1      1   0       76   0       156   0 "['cpvstr']", '.key', '.keys()[12285]'
     2      1   0       76   0       232   0 "['cpvstr']", '.key', '.keys()[12286]'
     3      1   0       76   0       308   0 "['cpvstr']", '.key', '.keys()[16327]'
     4      1   0       76   0       384   0 "['cpvstr']", '.key', '.keys()[17754]'
     5      1   0       76   0       460   0 "['cpvstr']", '.key', '.keys()[19079]'
     6      1   0       76   0       536   0 "['cpvstr']", '.key', '.keys()[21704]'
     7      1   0       76   0       612   0 "['cpvstr']", '.key', '.keys()[23473]'
     8      1   0       76   0       688   0 "['cpvstr']", '.key', '.keys()[24239]'
     9      1   0       76   0       764   0 "['cpvstr']", '.key', '.keys()[3070]'
<35999 more rows. Type e.g. '_.more' to view.>

Ouch, 36009 total rows for 36009 objects. What this did is similar to what "byrcs" did: for every object in the set it determined how they can be reached through their referrers, then groups objects that can be reached in the same ways together. Unfortunately it is grouping everything reachable as a dictionary key differently, so this is not very useful.

XXX WTF XXX

It is not likely this accomplishes anything, but let's assume we want to know if there are any objects in this set not reachable as the "key" attribute. Heapy can tell us (although this is very slow... there might be a better way but I do not know it yet):

>>> nonkeys = byrcs[5] & hp.Via('.key').alt('<')
>>> nonkeys.byrcs
hp.Nothing

(remember "hp" was our main entrance into heapy, the object that gave us the set of all objects we're interested in earlier).

What does this do? "hp.Via('.key')" creates a "symbolic set" of "all objects reachable only as the 'key' attribute of something" (it's a "symbolic set" because there are no actual objects in it). The "alt" method gives us a new symbolic set of everything reachable via "less than" this way. We then intersect this with our set and discover there is nothing left.

A similar construct that does not do what we want is:

>>> nonkeys = byrcs[5] & ~hp.Via('.key')

The "~" operator inverts the symbolic set, giving a set matching everything not reachable exactly as a "key" attribute. The key word here is "exactly": since everything in our set was also reachable in two other ways this intersection matches everything.

Ok, let's get back to the stuff actually eating memory:

>>> h[0].byrcs
Index  Count   %     Size   % Cumulative  % Referrers by Kind (class / dict of class)
    0 670791  68 31716096  68  31716096  68 dict (no owner)
    1 139232  14  6525856  14  38241952  83 tuple
    2 136558  14  6042408  13  44284360  96 dict of pkgcore.ebuild.ebuild_src.package
    3  36009   4  1773024   4  46057384  99 dict (no owner), dict of
                                            pkgcore.ebuild.ebuild_src.package, weakref.KeyedRef
    4   1762   0   107772   0  46165156 100 list
    5    824   0    69476   0  46234632 100 types.CodeType
    6    140   0    31312   0  46265944 100 function, tuple
    7    194   0    11504   0  46277448 100 dict of module
    8     30   0     6284   0  46283732 100 dict of type
    9     55   0     1972   0  46285704 100 dict of module, tuple

Remember h[0] gave us all str objects, so this is all string objects grouped by the kind(s) of their referrers. Also notice index 3 here is the same set of stuff we saw earlier:

>>> h[0].byrcs[3] ^ byrcs[5]
hp.Nothing

Most operators do what you would expect, & intersects for example.

"We have a lot of strings in dicts" is not that useful either, let's see if we can narrow that down a little:

>>> h[0].byrcs[0].referrers.byrcs
Partition of a set of 44124 objects. Total size = 18636768 bytes.
 Index  Count   %     Size   % Cumulative  % Referrers by Kind (class / dict of class)
     0  24681  56 12834120  69  12834120  69 dict of pkgcore.ebuild.ebuild_src.package
     1  19426  44  5371024  29  18205144  98 dict (no owner)
     2      1   0   393352   2  18598496 100 dict of pkgcore.repository.prototype.IterValLazyDict
     3      1   0     6280   0  18604776 100 __builtin__.set
     4      1   0     6280   0  18611056 100 dict of module, guppy.heapy.heapyc.RootStateType
     5      1   0     6280   0  18617336 100 dict of pkgcore.ebuild.eclass_cache.cache
     6      1   0     6280   0  18623616 100 dict of
                                             pkgcore.repository.prototype.PackageIterValLazyDict
     7      4   0     5536   0  18629152 100 type
     8      4   0     3616   0  18632768 100 dict of type
     9      1   0     1672   0  18634440 100 dict of module, dict of os._Environ

(Broken down: h[0].byrcs[0] is the set of all str objects referenced only by dicts, h[0].byrcs[0].referrers is the set of those dicts, and the final .byrcs displays those dicts grouped by their referrers)

Keep an eye on the size column. We have over 12M worth of just dicts (not counting the stuff in them) referenced only as attribute of ebuild_src.package. If we include the stuff kept alive by those dicts we're talking about a big chunk of the 100MB total here:

>>> t = _
>>> t[0].domisize
61269552

60M out of our 100M would be deallocated if we killed those dicts. So let's ask heapy what dicts that are:

>>> t[0].byvia
Partition of a set of 24681 objects. Total size = 12834120 bytes.
 Index  Count   %     Size   % Cumulative  % Referred Via:
     0  24681 100 12834120 100  12834120 100 "['data']"

(it is easy to get confused by the "byrcs" view of our "t". t[0] is not a bunch of "dict of ebuild_src.package". It is a bunch of dicts with strings in them, namely those that are referred to by the dict of ebuild_src.package, and not by anything else. So the byvia output means those dicts with strings in them are all "data" attributes of ebuild_src.package instances).

(sidenote: earlier we saw byvia say ".key", now it says "['data']". It's different because the previous type used __slots__ (so there was no "dict of" involved) and this type does not (so there is a "dict of" and our dicts are the "data" key in it).

So what is in the dicts:

>>> t[0].referents
Partition of a set of 605577 objects. Total size = 34289392 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 556215  92 27710068  81  27710068  81 str
     1  24681   4  6085704  18  33795772  99 dict (no owner)
     2  24681   4   493620   1  34289392 100 long
>>> _.byvia
Partition of a set of 605577 objects. Total size = 34289392 bytes.
 Index  Count   %     Size   % Cumulative  % Referred Via:
     0  24681   4  6085704  18   6085704  18 "['_eclasses_']"
     1  21954   4  3742976  11   9828680  29 "['DEPEND']"
     2  22511   4  3300052  10  13128732  38 "['RDEPEND']"
     3  24202   4  2631304   8  15760036  46 "['SRC_URI']"
     4  24681   4  1831668   5  17591704  51 "['DESCRIPTION']"
     5  24674   4  1476680   4  19068384  56 "['HOMEPAGE']"
     6  24681   4  1297680   4  20366064  59 "['KEYWORDS']"
     7  24681   4   888516   3  21254580  62 '.keys()[3]'
     8  24681   4   888516   3  22143096  65 '.keys()[9]'
     9  24681   4   810108   2  22953204  67 "['LICENSE']"
<32 more rows. Type e.g. '_.more' to view.>

Strings, nested dicts and longs, and most size eaten up by the "_eclasses_" values. There is also a significant amount eaten up by keys values, which is a bit odd, so let's investigate:

>>> refs = t[0].referents
>>> i=iter(refs.byvia[7].nodes)
>>> i.next()
'DESCRIPTION'
>>> i.next()
'DESCRIPTION'
>>> i.next()
'DESCRIPTION'
>>> i.next()
'DESCRIPTION'
>>> i.next()
'DESCRIPTION'

Eep!

>>> refs.byvia[7].bysize
Partition of a set of 24681 objects. Total size = 888516 bytes.
 Index  Count   %     Size   % Cumulative  % Individual Size
     0  24681 100   888516 100    888516 100        36

It looks like we have 24681 identical strings here, using up about 1M of memory. The other odd entry is the '_eclasses_' string apparently.

Extra stuff for c extension developers

To provide accurate statistics if your code uses extension types you must provide heapy with a way to get the following data for your custom types:

You provide these through a NyHeapDef struct, defined in heapdef.h in the guppy source. This header is not installed, so you should just copy it into your source tree. It is a good idea to read this header file side by side with the following descriptions, since it contains details omitted here. The stdtypes.c file contains implementations for the basic python types which you can read for inspiration.

The NyHeapDef struct provides heapy with three function pointers:

SizeGetter

To answer "how large is an instance" you provide a NyHeapDef_SizeGetter function that is called with a PyObject* and returns an int: the number of bytes the object occupies. If you do not provide this function heapy uses a default that looks at the tp_basicsize and tp_itemsize fields of the type. This means that if you do not allocate any extra memory for non-python objects (e.g. for c strings) you do not need to provide this function.

Traverser

To answer "What objects does an instance contain" you provide a traversal function (NyHeapDef_Traverser). This is called with a pointer to a "visit procedure", an instance of your extension type and some other stuff. You should then call the visit procedure for every python object contained in your object.

This might sound familiar: to support the python garbage collector you provide a very similar function (tp_traverse). Actually heapy will use tp_traverse if you do not provide a heapy-specific traverse function. Doing this makes sense if you do not support the garbage collector for some reason, or if you contain objects that are irrelevant to the garbage collector.

An example would be a type that contains a single python string object (that no other code can get a reference to). If this object does not have references to other python objects it cannot be involved in cycles so supporting gc would be useless. However you do still want heapy to know about the memory occupied by the contained string. You could do that by adding that size in your NyHeapDef_SizeGetter function but it is probably easier to tell heapy about the string through the traversal function (so you do not have to calculate the memory occupied by the string).

If the above type would also have a reference to some arbitrary (non-private) python object it should support gc, but it does not need to tell gc about the contained string. So you would have two traversal functions, one for heapy that visits the string and one for gc that does not.

RelationGetter

The last function heapy wants tells it in what way your instance refers to some contained object. It is used to provide the "byvia" view. This calls a visit function once for each way your instance refers to a target object, telling it what kind of reference it is.

Providing the heapdef struct to heapy

Once you have the needed function pointers in a struct you need to pass this to heapy somehow. This is done through a standard cpython mechanism called "cobjects". From python these look like rather stupid objects you cannot do anything with, but from c you can pull out a void* that was put in when the object was constructed. You can wrap an arbitrary pointer in a CObject, make it available as attribute of your module, then import it from some other module, pull the void* back out and cast it to the original type.

heapy looks for a _NyHeapDefs_ attribute on all loaded modules. If this attribute exists and is a CObject the pointer in it is used as a pointer to an array of NyHeapDef struct (terminated with a struct with only nulls). Example code doing this is in sets.c in the guppy source.