From here, using only standard python modules:
But, if you want to mess around in more detail in the document, then we can use the python-docx module.
import zipfile, re
docx = zipfile.ZipFile('/path/to/file/mydocument.docx')
content = docx.read('word/document.xml')
cleaned = re.sub('<(.|\n)*?>','',content)
print cleaned
But, if you want to mess around in more detail in the document, then we can use the python-docx module.
Comments
Post a Comment