BS4 Breaks HTML Trying to Repair It
BS4 (BeautifulSoup 4) corrects faulty HTML. Usually this is not a problem. I tried parsing, altering, and saving the HTML of this page: ulisses-regelwiki.de/index.php/sonderfertigkeiten.html. In this
Solution 1:
Try the simplified_scrapy library instead:
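# Third-party package: pip install simplified-scrapy
from simplified_scrapy import SimplifiedDoc

# Sample markup with two opening <center> tags and no closing tag
html = '''
<!DOCTYPE html><center>
Some Test content
<!-- A comment --><center>
'''
# Parse the snippet, then print the document's HTML back out
doc = SimplifiedDoc(html)
print(doc.html)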
Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
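If adding a dependency is not an option, one alternative is a minimal sketch using only the standard library's html.parser. This is a different technique from the library suggested above, and it assumes only text nodes need changing: every start tag, comment and declaration is copied back out from the raw source instead of being rebuilt from a tree, so unbalanced markup is not repaired. `PassthroughEditor` and `edit_html_text` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class PassthroughEditor(HTMLParser):
    """Hypothetical helper: re-emit HTML token by token, editing only text.

    Start tags, comments and declarations are copied from the raw source,
    so unbalanced markup (such as an unclosed <center>) is left as-is.
    """

    RAW_TEXT_TAGS = ("script", "style")

    def __init__(self, edit_text):
        # convert_charrefs=False keeps entities like &amp; as separate events
        super().__init__(convert_charrefs=False)
        self.edit_text = edit_text  # assumed callback: str -> str
        self.out = []
        self._in_raw = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.RAW_TEXT_TAGS:
            self._in_raw = True
        self.out.append(self.get_starttag_text())  # exact original text

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.RAW_TEXT_TAGS:
            self._in_raw = False
        # Tag names arrive lowercased, so </CENTER> is written as </center>
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text may arrive in several chunks (e.g. split around entities)
        self.out.append(data if self._in_raw else self.edit_text(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def edit_html_text(source, edit_text):
    """Apply edit_text to text nodes and return the otherwise unchanged HTML."""
    parser = PassthroughEditor(edit_text)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Because only text events pass through `edit_text`, a replacement that spans an entity such as `&amp;` will not match, and end tags are rewritten in lowercase; the sketch accepts those limits in exchange for leaving the rest of the markup untouched.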