blog.farhan.codes

Farhan's Personal and Professional Blog


Convert Docx to Markdown

I needed to convert a Docx file to Markdown, but Pandoc kept giving me this obnoxious error:

$ pandoc test.docx -o test.md
pandoc: Cannot decode byte '\xae': Data.Text.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

As a workaround, you can use the tool unoconv as an intermediate step: convert the file to HTML first, then pipe that into Pandoc to get Markdown.

$ unoconv --stdout -f html test.docx | pandoc -f html -t markdown -o test.md

On Ubuntu (and, I would imagine, other Debian-based systems) you can get unoconv with a simple apt-get install unoconv.
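
If you have more than one file to convert, the same pipeline can be wrapped in a small shell function. This is only a rough sketch: the names md_name and docx_to_md are made up for illustration, and it assumes both unoconv and pandoc are on your PATH.

```shell
#!/bin/sh

# Hypothetical helper: derive the output name by swapping a trailing
# ".docx" for ".md" (e.g. test.docx -> test.md).
md_name() {
    printf '%s.md\n' "${1%.docx}"
}

# Hypothetical helper: convert one file using the same two-step pipeline
# as the one-liner above (DOCX -> HTML via unoconv, HTML -> Markdown via pandoc).
docx_to_md() {
    unoconv --stdout -f html "$1" | pandoc -f html -t markdown -o "$(md_name "$1")"
}
```

With those defined in your shell, something like for f in *.docx; do docx_to_md "$f"; done should convert every Docx file in the current directory, writing each .md file next to its source.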