Skip to main content

Use numpydoc + sphinx

An ideal code documentation system should allow you to write documentation once (when you are writing the code) and then allow you to display the documentation in different contexts, such as project manuals, inline help, command line help and so on. You shouldn't need to duplicate docs - it is a waste of effort and leads to errors when the code is updated. The documentation you write should be easily readable by human users as they peruse the code, as well as by users as they run your code.

Sphinx is an awesome documentation system for Python. Sphinx can take descriptions in docstrings and embed them in documentation so that we can approach the goals of an ideal documentation system.

However, Sphinx violates the human readability part when it comes to function parameter descriptions. Take the following function, for example.


def _repeat_sequence(seq_len=100, subseq=None, subseq_len=10, base_sel_rng=None, alphabet=['A', 'C', 'T', 'G']):
"""
Create a sequence by repeating a sub-sequence
"""
subseq = base_sel_rng.choice(alphabet, size=subseq_len, replace=True, p=[.3, .2, .2, .3]).tostring()
return subseq * (seq_len / subseq_len) + subseq[:seq_len % subseq_len]


The Sphinx-parsable way of writing the docstring is:

def _repeat_sequence(seq_len=100, subseq=None, subseq_len=10, base_sel_rng=None, alphabet=['A', 'C', 'T', 'G']):
  """Create a sequence by repeating a sub-sequence

  :param seq_len: Length of sequence.
  :type seq_len: int
  :param subseq: Sub-sequence to use as repeat block. Omit to generate a random sub-sequence.
  :type subseq: str
  :param subseq_len: If subseq is omitted this must be provided to indicate desired length of random sub-sequence
  :type subseq_len: Length of random sub-sequence
  :param base_sel_rng: Random number generator e.g. numpy.random
  :param alphabet: List of characters constituting the alphabet
  :type alphabet: list

  :returns:  str -- the sequence.
  """
  subseq = base_sel_rng.choice(alphabet, size=subseq_len, replace=True, p=[.3, .2, .2, .3]).tostring()
  return subseq * (seq_len / subseq_len) + subseq[:seq_len % subseq_len]

It does not matter that the Sphinx output is well formatted: for humans it's rather yucky to look at which defeats the purpose of docstrings.

It turns out that, as is common in Python, this problem has been noted and rectified. In this case the correction comes from the great folks at numpy who have a nice extension called numpydoc. The numpydoc version of this docstring is:

def repeat_sequence(seq_len=100, subseq=None, subseq_len=10, base_sel_rng=None, alphabet=['A', 'C', 'T', 'G']):
  """Create a sequence by repeating a sub-sequence

  Parameters
  ----------
  seq_len : int
                              Length of sequence.
  subseq : str, optional
                              Sub-sequence to use as repeat block. Omit to generate a random sub-sequence.
  subseq_len : int, optional
                              If subseq is omitted this must be provided to indicate desired length of random
                              sub-sequence
  subseq_len : int
                              Length of random sub-sequence
  base_sel_rng : object
                              Random number generator e.g. numpy.random
  alphabet : list, optional
                              List of characters constituting the alphabet

  Returns
  -------
  str
      The sequence.

  .. note:: This is meant to be used internally

  """
  subseq = base_sel_rng.choice(alphabet, size=subseq_len, replace=True, p=[.3, .2, .2, .3]).tostring()
  return subseq * (seq_len / subseq_len) + subseq[:seq_len % subseq_len]

Note: The only glitchy thing is that when you add numpydoc to the list of extensions in Sphinx's conf.py it can't be ahead of standard sphinx extensions - the order matters. For example, my extensions list is:

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.doctest',
    'sphinx.ext.todo',
    'sphinx.ext.coverage',
    'sphinx.ext.pngmath',
    'sphinx.ext.mathjax',
    'sphinx.ext.viewcode',
    'numpydoc.numpydoc'
]

Putting numpydoc first leads to the error:
Extension error:
Unknown event name: autodoc-process-docstring

Comments

  1. Perhaps coming late here, but the solution (with Sphinx 1.3.5) is :

    extensions = [..., 'sphinx.ext.napoleon']

    Now you have additional options :

    http://www.sphinx-doc.org/en/stable/ext/napoleon.html

    ReplyDelete

Post a Comment

Popular posts from this blog

A note on Python's __exit__() and errors

Python's context managers are a very neat way of handling code that needs a teardown once you are done. Python objects have do have a destructor method ( __del__ ) called right before the last instance of the object is about to be destroyed. You can do a teardown there. However there is a lot of fine print to the __del__ method. A cleaner way of doing tear-downs is through Python's context manager , manifested as the with keyword. class CrushMe: def __init__(self): self.f = open('test.txt', 'w') def foo(self, a, b): self.f.write(str(a - b)) def __enter__(self): return self def __exit__(self, exc_type, exc_val, exc_tb): self.f.close() return True with CrushMe() as c: c.foo(2, 3) One thing that is important, and that got me just now, is error handling. I made the mistake of ignoring all those 'junk' arguments ( exc_type, exc_val, exc_tb ). I just skimmed the docs and what popped out is that you need to return True or...

Store numpy arrays in sqlite

Use numpy.getbuffer (or sqlite3.Binary ) in combination with numpy.frombuffer to lug numpy data in and out of the sqlite3 database: import sqlite3, numpy r1d = numpy.random.randn(10) con = sqlite3.connect(':memory:') con.execute("CREATE TABLE eye(id INTEGER PRIMARY KEY, desc TEXT, data BLOB)") con.execute("INSERT INTO eye(desc,data) VALUES(?,?)", ("1d", sqlite3.Binary(r1d))) con.execute("INSERT INTO eye(desc,data) VALUES(?,?)", ("1d", numpy.getbuffer(r1d))) res = con.execute("SELECT * FROM eye").fetchall() con.close() #res -> #[(1, u'1d', <read-write buffer ptr 0x10371b220, size 80 at 0x10371b1e0>), # (2, u'1d', <read-write buffer ptr 0x10371b190, size 80 at 0x10371b150>)] print r1d - numpy.frombuffer(res[0][2]) #->[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] print r1d - numpy.frombuffer(res[1][2]) #->[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] Note that for work where data ty...