Skip to main content

The logging overhead.

Python makes printing logger messages, and adjusting the logger level (which messages to print when) very easy. However, it seems, that the logger code comes with a higher overhead than if you used simple 'if' statements to control what messages to print.

Logger messages are very, very useful. Two typical uses of logger messages is to debug things when a program goes awry and to print out the progress of a program (if the user wishes it) to reassure us that the program is running and inform us of how much further the program has to go.

Python's logger is a lot of fun because it is very easy to set up logging messages at different levels and then filter which messages you will actually show. For example you can code the program to print out values of some variables at the  'DEBUG' level and print out the progress of the program at the 'INFO' level. Then you can instruct the code to print out only the INFO messages or both the DEBUG and INFO messages.

It's very likely that actually printing logging messages slows down the program, but I began to wonder what kind of overhead even the ignored logger messages were introducing to the program. One could think that placing a debug logger message in a loop is a good debug tool which can be conveniently switched off when in production with little effect, but how true is that?


import logging
logger = logging.getLogger("Meta")


def foo(N):
  sum = 0
  for n in range(N):
    sum += n
    logger.debug('This is message {:d}'.format(n))
  x = sum**2

def bar(N):
  sum = 0
  for n in range(N):
    sum += n
  x = sum**2

def foobar(N, flag=False):
  sum = 0
  for n in range(N):
    sum += n
    if flag:
      logger.debug('This is message {:d}'.format(n))
  x = sum**2


if __name__ == "__main__":
  import timeit
  print 'No logger message: ', timeit.timeit('bar(100)', setup='from __main__ import logging, bar; logging.basicConfig(level=logging.ERROR)', number=1000)
  print 'Suppressed logger message: ', timeit.timeit('foo(100)', setup='from __main__ import logging, foo; logging.basicConfig(level=logging.ERROR)', number=1000)
  print 'Logger message behind false if statement: ', timeit.timeit('foobar(100)', setup='from __main__ import logging, foobar; logging.basicConfig(level=logging.ERROR)', number=1000)
  print 'Logger message behind true if statement: ', timeit.timeit('foobar(100, True)', setup='from __main__ import logging, foobar; logging.basicConfig(level=logging.ERROR)', number=1000)


Which gives us

No logger message:  0.00471091270447
Suppressed logger message:  0.140299081802
Logger message behind false if statement:  0.00491690635681
Logger message behind true if statement:  0.133023023605

This is interesting. It shows that the overhead for the suppressed logging statement is very high, and is in fact higher than that for a simple if statement - that is if you had code where you suppressed printing of a log message using an if statement you would waste less time than if you relied on logger's own logic to handle log levels.

This is a little disappointing.

UPDATE: Following Anon's suggestion to use
logger.debug('This is message %d', n)
I found that it did indeed reduce the logger overhead, but not by much. The new times are:

No logger message:  0.0054669380188
Suppressed logger message:  0.0824329853058
Logger message behind false if statement:  0.00509595870972
Logger message behind true if statement:  0.0940999984741

Comments

  1. The problem is that you call the format method even in the suppressed log case. Avoid formatting by using the "%s" form and let the logging library do the substitution for you.

    e.g

    logger.debug('This is message %d', n)

    This also has the added benefit of not crashing if the substitution fails for some reason as the logging library will catch it for you.

    ReplyDelete
    Replies
    1. Hey, thanks! I tried it out and indeed this saves some time, but there is still a large overhead. The update is in the post.

      Delete
  2. With that change applied I suspect you are now looking largely at Python attribute look up and function call overhead. It shouldn't be too much surprise that this would be much more expensive than simple integer addition. Try testing a dummy method on a dummy class and see how that compares. I think you will find two things:

    1) Python function call overhead is large
    2) Python attribute lookup isn't completely free

    ReplyDelete

Post a Comment

Popular posts from this blog

Flowing text in inkscape (Poster making)

You can flow text into arbitrary shapes in inkscape. (From a hint here).

You simply create a text box, type your text into it, create a frame with some drawing tool, select both the text box and the frame (click and shift) and then go to text->flow into frame.

UPDATE:

The omnipresent anonymous asked:
Trying to enter sentence so that text forms the number three...any ideas?
The solution:
Type '3' using the text toolConvert to path using object->pathSize as necessaryRemove fillUngroupType in actual text in new text boxSelect the text and the '3' pathFlow the text

Drawing circles using matplotlib

Use the pylab.Circle command

import pylab #Imports matplotlib and a host of other useful modules cir1 = pylab.Circle((0,0), radius=0.75, fc='y') #Creates a patch that looks like a circle (fc= face color) cir2 = pylab.Circle((.5,.5), radius=0.25, alpha =.2, fc='b') #Repeat (alpha=.2 means make it very translucent) ax = pylab.axes(aspect=1) #Creates empty axes (aspect=1 means scale things so that circles look like circles) ax.add_patch(cir1) #Grab the current axes, add the patch to it ax.add_patch(cir2) #Repeat pylab.show()

Pandas panel = collection of tables/data frames aligned by index and column

Pandas panel provides a nice way to collect related data frames together while maintaining correspondence between the index and column values:


import pandas as pd, pylab #Full dimensions of a slice of our panel index = ['1','2','3','4'] #major_index columns = ['a','b','c'] #minor_index df = pd.DataFrame(pylab.randn(4,3),columns=columns,index=index) #A full slice of the panel df2 = pd.DataFrame(pylab.randn(3,2),columns=['a','c'],index=['1','3','4']) #A partial slice df3 = pd.DataFrame(pylab.randn(2,2),columns=['a','b'],index=['2','4']) #Another partial slice df4 = pd.DataFrame(pylab.randn(2,2),columns=['d','e'],index=['5','6']) #Partial slice with a new column and index pn = pd.Panel({'A': df}) pn['B'] = df2 pn['C'] = df3 pn['D'] = df4 for key in pn.items: print pn[key] -> output …