For me, this issue was not solved until I noticed this little tidbit here:
http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output
Short version:
Read in the file with this command:
>>> from lxml import etree
>>> parser = etree.XMLParser(remove_blank_text=True)
>>> tree = etree.parse(filename, parser)
That will “reset” the already existing indentation by dropping the whitespace-only text between elements, allowing the output to generate its own indentation correctly. Then pretty_print as normal:
>>> tree.write(output_file_name, pretty_print=True)
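
If you want to see the effect without touching any files, here is a minimal sketch that does the same thing on an in-memory string. The sample XML and the variable names (messy, untouched, pretty) are made up for illustration; only the parser option and pretty_print come from the FAQ entry above.

```python
from lxml import etree

# Hypothetical input with leftover, inconsistent whitespace between elements.
messy = b"<root>\n<a>\n      <b>text</b>\n</a>   <c/></root>"

# Default parser: the old whitespace is kept as text nodes, so
# pretty_print has nothing it can safely re-indent.
untouched = etree.tostring(
    etree.fromstring(messy), pretty_print=True, encoding="unicode"
)

# remove_blank_text=True discards whitespace-only text between elements
# while parsing, so the serializer is free to add its own indentation.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(messy, parser)
pretty = etree.tostring(root, pretty_print=True, encoding="unicode")

print(untouched)
print(pretty)
```

The first print should still show the original messy layout, the second a cleanly indented tree. Note that this only removes whitespace-only text, so content like "text" inside <b> is left alone.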