html5lib (HTML parser based on the HTML5 specification)

html5lib is an HTML parser designed to follow the HTML5 specification.
It handles all flavours of HTML and parses invalid documents using
well-defined error-handling rules compatible with the behaviour of
major desktop web browsers.

Output is a tree structure; the current release supports output to
DOM, ElementTree and lxml tree formats as well as a simple custom
format.

Optional dependencies: datrie, lxml, and genshi
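As a rough illustration of the tree output described above, the sketch
below parses a deliberately invalid fragment into an ElementTree tree.
It assumes html5lib is installed and uses its parse() function with the
treebuilder and namespaceHTMLElements arguments; the helper name
paragraph_texts is hypothetical and exists only for this example.

```python
import html5lib


def paragraph_texts(markup):
    """Return the text of every <p> element html5lib finds in markup."""
    # treebuilder="etree" requests an xml.etree.ElementTree tree;
    # namespaceHTMLElements=False keeps tag names plain ("p", not "{ns}p").
    root = html5lib.parse(markup, treebuilder="etree",
                          namespaceHTMLElements=False)
    return ["".join(p.itertext()) for p in root.iter("p")]


if __name__ == "__main__":
    # Invalid input: neither <p> is closed and there is no <html> or <body>.
    print(paragraph_texts("<p>Hello<p>World"))
```

Passing treebuilder="dom" or treebuilder="lxml" instead should select
the other tree formats; the lxml one needs the optional lxml package.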