Monday, April 14, 2014

The logging overhead.

Python makes printing logger messages, and adjusting the logger level (which messages to print when) very easy. However, it seems, that the logger code comes with a higher overhead than if you used simple 'if' statements to control what messages to print.

Logger messages are very, very useful. Two typical uses of logger messages is to debug things when a program goes awry and to print out the progress of a program (if the user wishes it) to reassure us that the program is running and inform us of how much further the program has to go.

Python's logger is a lot of fun because it is very easy to set up logging messages at different levels and then filter which messages you will actually show. For example you can code the program to print out values of some variables at the  'DEBUG' level and print out the progress of the program at the 'INFO' level. Then you can instruct the code to print out only the INFO messages or both the DEBUG and INFO messages.

It's very likely that actually printing logging messages slows down the program, but I began to wonder what kind of overhead even the ignored logger messages were introducing to the program. One could think that placing a debug logger message in a loop is a good debug tool which can be conveniently switched off when in production with little effect, but how true is that?


import logging
logger = logging.getLogger("Meta")


def foo(N):
  sum = 0
  for n in range(N):
    sum += n
    logger.debug('This is message {:d}'.format(n))
  x = sum**2

def bar(N):
  sum = 0
  for n in range(N):
    sum += n
  x = sum**2

def foobar(N, flag=False):
  sum = 0
  for n in range(N):
    sum += n
    if flag:
      logger.debug('This is message {:d}'.format(n))
  x = sum**2


if __name__ == "__main__":
  import timeit
  print 'No logger message: ', timeit.timeit('bar(100)', setup='from __main__ import logging, bar; logging.basicConfig(level=logging.ERROR)', number=1000)
  print 'Suppressed logger message: ', timeit.timeit('foo(100)', setup='from __main__ import logging, foo; logging.basicConfig(level=logging.ERROR)', number=1000)
  print 'Logger message behind false if statement: ', timeit.timeit('foobar(100)', setup='from __main__ import logging, foobar; logging.basicConfig(level=logging.ERROR)', number=1000)
  print 'Logger message behind true if statement: ', timeit.timeit('foobar(100, True)', setup='from __main__ import logging, foobar; logging.basicConfig(level=logging.ERROR)', number=1000)


Which gives us

No logger message:  0.00471091270447
Suppressed logger message:  0.140299081802
Logger message behind false if statement:  0.00491690635681
Logger message behind true if statement:  0.133023023605

This is interesting. It shows that the overhead for the suppressed logging statement is very high, and is in fact higher than that for a simple if statement - that is if you had code where you suppressed printing of a log message using an if statement you would waste less time than if you relied on logger's own logic to handle log levels.

This is a little disappointing.

UPDATE: Following Anon's suggestion to use
logger.debug('This is message %d', n)
I found that it did indeed reduce the logger overhead, but not by much. The new times are:

No logger message:  0.0054669380188
Suppressed logger message:  0.0824329853058
Logger message behind false if statement:  0.00509595870972
Logger message behind true if statement:  0.0940999984741

3 comments:

  1. The problem is that you call the format method even in the suppressed log case. Avoid formatting by using the "%s" form and let the logging library do the substitution for you.

    e.g

    logger.debug('This is message %d', n)

    This also has the added benefit of not crashing if the substitution fails for some reason as the logging library will catch it for you.

    ReplyDelete
    Replies
    1. Hey, thanks! I tried it out and indeed this saves some time, but there is still a large overhead. The update is in the post.

      Delete
  2. With that change applied I suspect you are now looking largely at Python attribute look up and function call overhead. It shouldn't be too much surprise that this would be much more expensive than simple integer addition. Try testing a dummy method on a dummy class and see how that compares. I think you will find two things:

    1) Python function call overhead is large
    2) Python attribute lookup isn't completely free

    ReplyDelete