Debugging Django memory leak with TrackRefs and Guppy

Posted on March 7, 2008  by Mikko Ohtamaa
Filed Under django, python

I run Django in a standalone long-running application (video encoding server). It leaked memory severely. By using htop, one was seeing two gigabytes reserved for /usr/bin/python after a while. Before starting the debugging session, I had no faintest idea what could be the cause of the problem. Django is robust technology - this kind of things haven’t happened for me before. Since I was running Django in standalone mode, I suspected that some query cache does not get cleared. But random poking around the source code didn’t give any clues.

It was time to do some serious memory debugging for Python.

Python as is doesn’t leak memory, since it’s garbage collected virtual machine. All “leaks” are design problems in the application logic.I found a good primer here what’s going inside Python’s memory management.

First I tried this nice TrackRefs class from Zope. It relies on Python’s own in-interpreter functions to monitor objects.

class TrackRefs:
    """Object to track reference counts across test runs."""

    def __init__(self, limit=40):
        self.type2count = {}
        self.type2all = {}
        self.limit = limit

    def update(self):
        obs = sys.getobjects(0)
        type2count = {}
        type2all = {}
        for o in obs:
            all = sys.getrefcount(o)

            if type(o) is str and o == '<dummy key>':
                # avoid dictionary madness
                continue
            t = type(o)
            if t in type2count:
                type2count[t] += 1
                type2all[t] += all
            else:
                type2count[t] = 1
                type2all[t] = all

        ct = [(type2count[t] - self.type2count.get(t, 0),
               type2all[t] - self.type2all.get(t, 0),
               t)
              for t in type2count.iterkeys()]
        ct.sort()
        ct.reverse()
        printed = False

        logger.debug("----------------------")
        logger.debug("Memory profiling")
        i = 0
        for delta1, delta2, t in ct:
            if delta1 or delta2:
                if not printed:
                    logger.debug("%-55s %8s %8s" % ('', 'insts', 'refs'))
                    printed = True

                logger.debug("%-55s %8d %8d" % (t, delta1, delta2))

                i += 1
                if i >= self.limit:
                    break 

        self.type2count = type2count
        self.type2all = type2all

You need to have Python compiled in debug mode to have sys.getobjects() method. Luckily this beefed up Python binary is availalble from Ubuntu’s stock repository:

sudo apt-get install python-dbg python-mysqldb-dbg

Note that native Python extensions don’t work unless they are specifically compiled against the Python debug build (python-mysqldb-dbg)..

Then I add TrackRefs to my main loop:

    def run(self):

        self.running = True

        logger.info("Started worker " + self.get_worker_id_string())

        # Memory leak tracking
        tracker = TrackRefs()

        while self.running:

            self.mark_for_download()

            self.process_downloads()

            self.process_encodings()

            tracker.update() # Dump memory here
            time.sleep(settings.WORKER_POLL_DELAY)

And after running a while I start getting interesting results:

7956 [2008-03-07 02:59:28,767] INFO Jobs needing sources to download 0
7956 [2008-03-07 02:59:28,768] DEBUG Processable jobs: 0
7956 [2008-03-07 02:59:29,754] DEBUG ----------------------
7956 [2008-03-07 02:59:29,754] DEBUG Memory profiling
7956 [2008-03-07 02:59:29,754] DEBUG                                                            insts     refs
7956 [2008-03-07 02:59:29,754] DEBUG <type 'int'>                                                 150   137406
7956 [2008-03-07 02:59:29,755] DEBUG <type 'tuple'>                                               117   130211
7956 [2008-03-07 02:59:29,755] DEBUG <type 'dict'>                                                  5     8331
7956 [2008-03-07 02:59:29,755] DEBUG <type 'str'>                                                   3    27643
7956 [2008-03-07 02:59:29,755] DEBUG <type 'unicode'>                                               3     4606
7956 [2008-03-07 02:59:29,755] DEBUG <type 'list'>                                                  3     3492
7956 [2008-03-07 02:59:29,756] DEBUG <type 'frame'>                                                 1      962
7956 [2008-03-07 02:59:29,756] DEBUG <type 'cell'>                                                  0    12948
7956 [2008-03-07 02:59:29,756] DEBUG <type 'function'>                                              0     9479

Woah! Who reserved 130 000 ints and tuples? No wonder that soon python gulps 1 gigabytes of memory. Since this is the only number which grows during the main loop cycling and there is no references to classes or objects debugging becomes a bit more difficult. I need to try to cross-reference the difficult tuple objects.

This didn’t go well - with gc.get_referrers() recurive parsing I got some results (example below). But it became soom clearthat debug references from the system itself was impossible: the memory debugging code will always create nasty cyclic references to the system, since it needs to track the objects. I gave up. There had to be something better.

9154 [2008-03-07 04:05:23,571] DEBUG /var/lib/python-support/python2.5/MySQLdb/connections.pyc
9154 [2008-03-07 04:05:23,571] DEBUG <type 'function'>
9154 [2008-03-07 04:05:23,572] DEBUG defaulterrorhandler
9154 [2008-03-07 04:05:23,572] DEBUG <type 'function'>
9154 [2008-03-07 04:05:23,572] DEBUG string_literal
9154 [2008-03-07 04:05:23,572] DEBUG <type 'function'>
9154 [2008-03-07 04:05:23,573] DEBUG unicode_literal
9154 [2008-03-07 04:05:23,573] DEBUG <type 'function'>
9154 [2008-03-07 04:05:23,573] DEBUG string_decoder
9154 [2008-03-07 04:05:23,573] DEBUG <type 'function'>
9154 [2008-03-07 04:05:23,573] DEBUG __exit__
9154 [2008-03-07 04:05:23,574] DEBUG <type 'function'>
9154 [2008-03-07 04:05:23,574] DEBUG begin
9154 [2008-03-07 04:05:23,574] DEBUG <type 'function'>
9154 [2008-03-07 04:05:23,574] DEBUG __init__
9154 [2008-03-07 04:05:23,574] DEBUG <type 'function'>
9154 [2008-03-07 04:05:23,575] DEBUG show_warnings

There was: Guppy. Thank you Sverker Nilsson! You saved my day.

Since the API of Guppy is a little eccentric, here are some examples for you:

# init heapy
heapy = guppy.hpy()

# Print memory statistics
def update():
  print heapy.heap()

# Print relative memory consumption since last sycle
def update():
   print heapy.heap()
   heapy.setref()

# Print relative memory consumption w/heap traversing
def update()
    print heapy.heap().get_rp(40)
    heapy.setref()

With heapy.heap() ; heapy.setref() I got this output:

Partition of a set of 12 objects. Total size = 3544 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      3  25     2244  63      2244  63 unicode
     1      2  17      708  20      2952  83 types.FrameType
     2      3  25      432  12      3384  95 dict (no owner)
     3      3  25      120   3      3504  99 str
     4      1   8       40   1      3544 100 list

One adds get_rp() travelsal magic and everything becomes clear:

Reference Pattern by <[dict of] class>.
 0: _ --- [-] 14 (dict (no owner) | list | str | types.FrameType | types.Gene...
 1: a      [-] 3 dict (no owner): 0x8c11f34*2, 0x8c1bd54*2, 0x8c1f854*2
 2: aa ---- [-] 1 list: 0x833c504*18
 3: a3       [-] 1 dict of django.db.backends.mysql.base.DatabaseWrapper: 0x8...
 4: a4 ------ [-] 1 dict (no owner): 0x83a65d4*2
 5: a5         [R] 1 guppy.heapy.heapyc.RootStateType: 0xb787c7a8L
 6: a3b ----- [-] 1 django.db.backends.mysql.base.DatabaseWrapper: 0x8356a34
 7: a3ba       [S] 7 dict of module: ..db, ..models, ..query, ..transaction...
 8: b ---- [S] 1 types.FrameType: <<lambda> at 0x8b16ecc>
 9: c      [-] 2 list: 0x833c504*18, 0xb7dafe6cL*5
<Type e.g. '_.more' for more.>

What there could in DatabaseWrapper object which is growing and growing… query debugger. Django keeps track of all queries for debugging purposes (connection.queries). This list is reseted at the end of HTTP request. But in standalone mode, there are no requests. So you need to manually reset to queries list after each working cycle.

        while self.running:

            self.mark_for_download()

            self.process_downloads()

            self.process_encodings()

            tracker.update()

            time.sleep(settings.WORKER_POLL_DELAY)

            # Clear database connection ad reset query debugger
            # between cycles to make sure that
            # related resources get released
            reset_queries()
            connection.close()

            print str(connection.queries)

But even after this fix, I got increase in tuple and int usage when monitoring with TrackRefs. But when I run heapy.heap() alone, there is no increase. So the tuple and int consumption must have been caused by TrackRef, sys.getobjects, gc, etc. magic itself.Share This

 

Other posts by Mikko Ohtamaa

 

Comments

7 Responses to “Debugging Django memory leak with TrackRefs and Guppy”

  1. eliott on March 7th, 2008 9:15 am
    Gravatar

    Isn’t this only an issue when you run the django server with settings.DEBUG=True?

     
  2. Mikko Ohtamaa on March 7th, 2008 3:30 pm
    Gravatar

    That’s probably true. It is still not much advertised feature and the fact that I was not aware of it, even though I have been reading my good share of Django manuals, is a sign that some others might hit this issue too.

    But you got a good tutorial to do Python memory debugging too! =)

     
  3. antisvin on March 7th, 2008 6:44 pm
    Gravatar

    People get burned quite often by this issue. Note that it is explicitly mentioned in official Django FAQ:

    http://www.djangoproject.com/documentation/faq/#why-is-django-leaking-memory

     
  4. Mikko Ohtamaa on March 7th, 2008 7:02 pm
    Gravatar

    Maybe a long term solution would be simply set a roof limit how many SQL queries Django stores.

     
  5. Mikko Ohtamaa on March 7th, 2008 7:11 pm
    Gravatar

    I added an issue tracker ticket:

    http://code.djangoproject.com/ticket/6734

     
  6. Confluence: Client: Mixin on December 16th, 2008 1:15 pm
    Gravatar

    performances…

    Performances Caching We use cmemcache…

     
  7. Confluence: Client: Mixin on December 16th, 2008 1:59 pm
    Gravatar

    Logging, Performances, Memory Leaks and Error management…

    We have done a few things to improve mixin performances. We have first setup a logging system #Logging to track down the bottlenecks and recurrent queries. We have added caching #Caching on some Models…….

     

Leave a Reply




Copyright © Red Innovation Ltd. 2009 All Rights Reserved. | Log in | XHTML
Close
E-mail It