
The logging overhead.

Python makes it very easy to print logger messages and to adjust the logger level (which messages get printed when). However, it turns out that the logger calls carry a noticeably higher overhead than simple 'if' statements used to control which messages get printed.

Logger messages are very, very useful. Two typical uses are debugging a program when it goes awry and printing out its progress (if the user wishes it), both to reassure us that the program is running and to tell us how much further it has to go.

Python's logger is a lot of fun because it is very easy to set up logging messages at different levels and then filter which messages are actually shown. For example, you can have the program print out the values of some variables at the DEBUG level and report its progress at the INFO level. Then you can instruct the code to print only the INFO messages, or both the DEBUG and INFO messages.
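A minimal sketch of that filtering, with placeholder messages (this snippet is just an illustration, not part of the timing script below):

import logging

logging.basicConfig(level=logging.INFO)  # show INFO and above, hide DEBUG
logger = logging.getLogger("Meta")

logger.debug('some internal detail')     # suppressed at level INFO
logger.info('5 of 10 steps done')        # printed
# Re-running with level=logging.DEBUG would print both messages.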

It's very likely that actually printing logging messages slows down the program, but I began to wonder what kind of overhead even the ignored logger messages introduce. One might think that placing a debug logger message in a loop is a convenient debugging tool that can be switched off in production with little cost, but how true is that?


import logging
logger = logging.getLogger("Meta")


def foo(N):
  # The loop carries a debug message that the logger's level logic will suppress.
  sum = 0
  for n in range(N):
    sum += n
    logger.debug('This is message {:d}'.format(n))
  x = sum**2

def bar(N):
  # Baseline: the same loop with no logging at all.
  sum = 0
  for n in range(N):
    sum += n
  x = sum**2

def foobar(N, flag=False):
  # The debug message is guarded by a plain if statement instead.
  sum = 0
  for n in range(N):
    sum += n
    if flag:
      logger.debug('This is message {:d}'.format(n))
  x = sum**2


if __name__ == "__main__":
  import timeit
  print 'No logger message: ', timeit.timeit('bar(100)', setup='from __main__ import logging, bar; logging.basicConfig(level=logging.ERROR)', number=1000)
  print 'Suppressed logger message: ', timeit.timeit('foo(100)', setup='from __main__ import logging, foo; logging.basicConfig(level=logging.ERROR)', number=1000)
  print 'Logger message behind false if statement: ', timeit.timeit('foobar(100)', setup='from __main__ import logging, foobar; logging.basicConfig(level=logging.ERROR)', number=1000)
  print 'Logger message behind true if statement: ', timeit.timeit('foobar(100, True)', setup='from __main__ import logging, foobar; logging.basicConfig(level=logging.ERROR)', number=1000)


Which gives us

No logger message:  0.00471091270447
Suppressed logger message:  0.140299081802
Logger message behind false if statement:  0.00491690635681
Logger message behind true if statement:  0.133023023605

This is interesting. It shows that the overhead of the suppressed logging statement is very high - in fact much higher than that of a simple if statement. That is, if you suppress the printing of a log message with your own if statement, you waste far less time than if you rely on the logger's own logic to handle log levels.

This is a little disappointing.
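One compromise (not timed above) is to ask the logger itself whether DEBUG is enabled and cache the answer in a local flag; the standard library provides Logger.isEnabledFor() for exactly this check. A sketch along the lines of foobar() (the name foobaz is just illustrative):

import logging
logger = logging.getLogger("Meta")

def foobaz(N):
  # Hoist the level check out of the loop so it is paid only once.
  debug_on = logger.isEnabledFor(logging.DEBUG)
  sum = 0
  for n in range(N):
    sum += n
    if debug_on:
      logger.debug('This is message %d', n)
  x = sum**2

This keeps the on/off decision inside the logging configuration while avoiding the per-iteration call into debug().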

UPDATE: Following Anon's suggestion to use
logger.debug('This is message %d', n)
I found that it did indeed reduce the logger overhead, but not by much. The new times are:

No logger message:  0.0054669380188
Suppressed logger message:  0.0824329853058
Logger message behind false if statement:  0.00509595870972
Logger message behind true if statement:  0.0940999984741

Comments

  1. The problem is that you call the format method even in the suppressed log case. Avoid formatting by using the "%s" form and let the logging library do the substitution for you.

    e.g.

    logger.debug('This is message %d', n)

    This also has the added benefit of not crashing if the substitution fails for some reason as the logging library will catch it for you.

    1. Hey, thanks! I tried it out and indeed this saves some time, but there is still a large overhead. The update is in the post.

  2. With that change applied I suspect you are now looking largely at Python attribute lookup and function call overhead. It shouldn't be too much of a surprise that this would be much more expensive than simple integer addition. Try testing a dummy method on a dummy class and see how that compares. I think you will find two things:

    1) Python function call overhead is large
    2) Python attribute lookup isn't completely free
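For reference, a quick sketch of the dummy-class test suggested here (Dummy, noop and baz are made-up names, following the same pattern as the script above):

import timeit

class Dummy(object):
  def noop(self, msg, n):
    # Take the same arguments as logger.debug() but do nothing.
    pass

dummy = Dummy()

def baz(N):
  # The bar() loop plus an attribute lookup and a no-op method call per iteration.
  sum = 0
  for n in range(N):
    sum += n
    dummy.noop('This is message %d', n)
  x = sum**2

if __name__ == "__main__":
  print 'Dummy method call: ', timeit.timeit('baz(100)', setup='from __main__ import baz', number=1000)

Comparing its time against bar() and the suppressed-logger case gives a rough sense of how much of the overhead is plain attribute lookup and function call cost.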

