Tuesday, June 7, 2011

Python and Word documents

From here, using only standard Python modules:

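import re
import zipfile

# A .docx file is a zip archive; the body text lives in word/document.xml.
docx = zipfile.ZipFile('/path/to/file/mydocument.docx')
content = docx.read('word/document.xml').decode('utf-8')
docx.close()
# Strip every XML tag, leaving only the text between the tags.
cleaned = re.sub(r'<[^>]+>', '', content)
print(cleaned)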
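
Stripping tags with a regular expression runs all the paragraphs together. If you want to keep them apart, here is a rough sketch that still uses only standard modules. It parses the XML and collects the text runs of each paragraph. The function name docx_paragraphs is my own invention, not part of any library, and it assumes the usual WordprocessingML layout (w:p for paragraphs, w:t for text runs). It ignores tabs and manual line breaks.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the elements inside word/document.xml.
W_NS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def docx_paragraphs(source):
    """Return a list with the text of each paragraph in a .docx file.

    source is a file path or a file-like object.
    Hypothetical helper, written for illustration only.
    """
    docx = zipfile.ZipFile(source)
    try:
        root = ET.fromstring(docx.read('word/document.xml'))
    finally:
        docx.close()

    paragraphs = []
    for para in root.iter(W_NS + 'p'):
        # The visible text of a paragraph is split across <w:t> elements.
        pieces = [node.text or '' for node in para.iter(W_NS + 't')]
        paragraphs.append(''.join(pieces))
    return paragraphs
```

Calling docx_paragraphs('/path/to/file/mydocument.docx') should give one string per paragraph, so '\n'.join(...) on the result gets you readable plain text.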


If you want to do more with the document than extract its text, you can use the python-docx module instead.
