Tuesday, July 22, 2014

Use numpydoc + sphinx

An ideal code documentation system should allow you to write documentation once (when you are writing the code) and then allow you to display the documentation in different contexts, such as project manuals, inline help, command line help and so on. You shouldn't need to duplicate docs - it is a waste of effort and leads to errors when the code is updated. The documentation you write should be easily readable by human users as they peruse the code, as well as by users as they run your code.
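As a small illustration of the "write once, show in several places" idea, Python already exposes a function's docstring at runtime, so the text that sits next to the code can also back inline help. This is only a minimal sketch; `greet` is a hypothetical function invented for the example.

```python
import inspect
import pydoc


def greet(name):
    """Return a short greeting for `name`."""
    return "Hello, " + name


# The docstring written next to the code is available at runtime...
doc_text = inspect.getdoc(greet)

# ...and pydoc renders the same text that help(greet) would show,
# so there is no second copy of the documentation to maintain.
help_text = pydoc.render_doc(greet, renderer=pydoc.plaintext)

print(doc_text)
```

Sphinx's autodoc reads the same docstring attribute, which is what makes a single source of documentation possible.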

Sphinx is an awesome documentation system for Python. Sphinx can take descriptions in docstrings and embed them in documentation so that we can approach the goals of an ideal documentation system.

However, Sphinx violates the human readability part when it comes to function parameter descriptions. Take the following function, for example.


def _repeat_sequence(seq_len=100, subseq=None, subseq_len=10, base_sel_rng=None, alphabet=['A', 'C', 'T', 'G']):
  """
  Create a sequence by repeating a sub-sequence
  """
  if subseq is None:  # only generate a random sub-sequence when none is supplied
    subseq = ''.join(base_sel_rng.choice(alphabet, size=subseq_len, replace=True, p=[.3, .2, .2, .3]))
  # Integer division keeps the repeat count an int on Python 3 as well
  return subseq * (seq_len // len(subseq)) + subseq[:seq_len % len(subseq)]


The Sphinx-parsable way of writing the docstring is:

def _repeat_sequence(seq_len=100, subseq=None, subseq_len=10, base_sel_rng=None, alphabet=['A', 'C', 'T', 'G']):
  """Create a sequence by repeating a sub-sequence

  :param seq_len: Length of sequence.
  :type seq_len: int
  :param subseq: Sub-sequence to use as repeat block. Omit to generate a random sub-sequence.
  :type subseq: str
  :param subseq_len: If subseq is omitted this must be provided to indicate desired length of random sub-sequence.
  :type subseq_len: int
  :param base_sel_rng: Random number generator e.g. numpy.random
  :param alphabet: List of characters constituting the alphabet
  :type alphabet: list

  :returns:  str -- the sequence.
  """
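  if subseq is None:  # only generate a random sub-sequence when none is supplied
    subseq = ''.join(base_sel_rng.choice(alphabet, size=subseq_len, replace=True, p=[.3, .2, .2, .3]))
  # Integer division keeps the repeat count an int on Python 3 as well
  return subseq * (seq_len // len(subseq)) + subseq[:seq_len % len(subseq)]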

It does not matter that the rendered Sphinx output is well formatted: the raw docstring is rather yucky for humans to read in the source, which defeats the purpose of docstrings.

It turns out that, as is common in Python, this problem has been noted and rectified. In this case the correction comes from the great folks at numpy who have a nice extension called numpydoc. The numpydoc version of this docstring is:

def _repeat_sequence(seq_len=100, subseq=None, subseq_len=10, base_sel_rng=None, alphabet=['A', 'C', 'T', 'G']):
  """Create a sequence by repeating a sub-sequence

  Parameters
  ----------
  seq_len : int
      Length of sequence.
  subseq : str, optional
      Sub-sequence to use as repeat block. Omit to generate a random sub-sequence.
  subseq_len : int, optional
      If subseq is omitted this must be provided to indicate desired length of random
      sub-sequence.
  base_sel_rng : object
      Random number generator e.g. numpy.random
  alphabet : list, optional
      List of characters constituting the alphabet

  Returns
  -------
  str
      The sequence.

  .. note:: This is meant to be used internally

  """
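  if subseq is None:  # only generate a random sub-sequence when none is supplied
    subseq = ''.join(base_sel_rng.choice(alphabet, size=subseq_len, replace=True, p=[.3, .2, .2, .3]))
  # Integer division keeps the repeat count an int on Python 3 as well
  return subseq * (seq_len // len(subseq)) + subseq[:seq_len % len(subseq)]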
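For readers who want to try this without numpy, here is a self-contained Python 3 sketch of the same idea with a numpydoc-style docstring. It is an adaptation, not the function above: the name `repeat_sequence_py3` and the `rng` parameter are made up for this example, and it assumes the standard library's `random.Random` (with `choices`) in place of `numpy.random` (with `choice`).

```python
import random


def repeat_sequence_py3(seq_len=100, subseq=None, subseq_len=10,
                        rng=None, alphabet=('A', 'C', 'T', 'G')):
    """Create a sequence by repeating a sub-sequence

    Parameters
    ----------
    seq_len : int
        Length of sequence.
    subseq : str, optional
        Sub-sequence to use as repeat block. Omit to generate a random one.
    subseq_len : int, optional
        Length of the random sub-sequence; used only when subseq is omitted.
    rng : random.Random, optional
        Random number generator (assumption: stdlib, not numpy.random).
    alphabet : sequence of str, optional
        Characters constituting the alphabet.

    Returns
    -------
    str
        The sequence.
    """
    if subseq is None:
        rng = rng or random.Random()
        # random.Random.choices samples with replacement using the weights
        subseq = ''.join(rng.choices(alphabet, weights=[.3, .2, .2, .3], k=subseq_len))
    # Whole repeats first, then a partial repeat to reach exactly seq_len
    return subseq * (seq_len // len(subseq)) + subseq[:seq_len % len(subseq)]


print(repeat_sequence_py3(seq_len=5, subseq='AC'))  # ACACA
print(len(repeat_sequence_py3(seq_len=23, rng=random.Random(0))))  # 23
```

The docstring reads cleanly in the source and in `help()`, and numpydoc can still turn it into formatted Sphinx output.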

Note: The only glitchy thing is that the order of the extensions list in Sphinx's conf.py matters: numpydoc can't be listed ahead of the standard Sphinx extensions, in particular sphinx.ext.autodoc, whose autodoc-process-docstring event numpydoc hooks into. For example, my extensions list is:

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.doctest',
    'sphinx.ext.todo',
    'sphinx.ext.coverage',
    'sphinx.ext.pngmath',
    'sphinx.ext.mathjax',
    'sphinx.ext.viewcode',
    'numpydoc.numpydoc'
]

Putting numpydoc first leads to the error:
Extension error:
Unknown event name: autodoc-process-docstring
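If you want conf.py to catch this ordering mistake early, a small guard can check the list before Sphinx loads anything. This is only a sketch; `numpydoc_after_autodoc` is a hypothetical helper written for this post, not part of Sphinx or numpydoc.

```python
def numpydoc_after_autodoc(extensions):
    """Return True if every numpydoc entry comes after sphinx.ext.autodoc."""
    if 'sphinx.ext.autodoc' not in extensions:
        return False
    autodoc_pos = extensions.index('sphinx.ext.autodoc')
    # Match both 'numpydoc' and 'numpydoc.numpydoc' style entries
    numpydoc_pos = [i for i, name in enumerate(extensions)
                    if name.split('.')[0] == 'numpydoc']
    return all(pos > autodoc_pos for pos in numpydoc_pos)


good = ['sphinx.ext.autodoc', 'sphinx.ext.viewcode', 'numpydoc.numpydoc']
bad = ['numpydoc.numpydoc', 'sphinx.ext.autodoc', 'sphinx.ext.viewcode']

print(numpydoc_after_autodoc(good))  # True
print(numpydoc_after_autodoc(bad))   # False
```

In conf.py this could be used as `assert numpydoc_after_autodoc(extensions)` right after the list is defined.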

1 comment:

  1. Perhaps coming late here, but the solution (with Sphinx 1.3.5) is:

    extensions = [..., 'sphinx.ext.napoleon']

    Now you have additional options:

    http://www.sphinx-doc.org/en/stable/ext/napoleon.html
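To make the napoleon suggestion in the comment concrete, a conf.py fragment might look like the sketch below. The option names are napoleon settings; the chosen values are assumptions picked to suit a NumPy-style docstring on a private function such as _repeat_sequence, not a recommendation from the commenter.

```python
# conf.py fragment (sketch): napoleon takes the place of numpydoc here
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
]

# Parse NumPy-style sections; turn off Google-style parsing if it is not used
napoleon_numpy_docstring = True
napoleon_google_docstring = False

# Also include private members (names starting with "_") that have docstrings
napoleon_include_private_with_doc = True
```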
