î ðE›U³ã@s‘ddlmZmZddlmZddlZddlZddlZddddgZej dƒZ Gd d „d eƒZ d d „Z dS) é)Ú HTMLParserÚHTMLParseError)Úname2codepointNZaltZhrefÚsrcÚtitlez\s+c@s¬eZdZdd„Zdd„Zdd„Zdd„Zd d „Zd d „Zd d„Z dd„Z dd„Z dd„Z dd„Z dd„Zdd„ZdS)Ú MyHTMLParsercCs5tj|ƒd|_d|_d|_d|_dS)NÚstarttagFÚ)rÚ__init__ÚlastÚin_preÚoutputÚlast_tag)Úself©rúS/Users/gjtorikian/Development/commonmarker/ext/commonmarker/cmark/test/normalize.pyr s     zMyHTMLParser.__init__cCsæ|jdkp|jdk}|o3|j|jƒ}|r]|jdkr]|jdƒ}n|js{tjd|ƒ}n|rÊ|j rÊ|jdkr©|jƒ}qÊ|jdkrÊ|jƒ}qÊn|j|7_d|_dS)NÚendtagrÚbrÚ ú Údata) r Ú is_block_tagrÚlstripr Ú whitespace_reÚsubÚstripr )rrZ after_tagZafter_block_tagrrrÚ handle_datas zMyHTMLParser.handle_datacCsi|dkrd|_n$|j|ƒr<|jjƒ|_n|jd|d7_||_d|_dS)NÚpreFzr)r rr Úrstriprr )rÚtagrrrÚ handle_endtag!s   zMyHTMLParser.handle_endtagcCsæ|dkrd|_n|j|ƒr<|jjƒ|_n|jd|7_|rÁ|jƒx_|D]T\}}|jd|7_|dkrf|jd tj|ddƒd7_qfqfWn|jd7_||_d |_dS) NrTúZcomment)r r )rrrrrÚhandle_comment@szMyHTMLParser.handle_commentcCs$|jd|d7_d|_dS)Nz>> normalize_html("

a \t b

") '

a b

' >>> normalize_html("

a \t\nb

") '

a b

' * Whitespace surrounding block-level tags is removed. >>> normalize_html("

a b

") '

a b

' >>> normalize_html("

a b

") '

a b

' >>> normalize_html("

a b

") '

a b

' >>> normalize_html("\n\t

\n\t\ta b\t\t

\n\t") '

a b

' >>> normalize_html("a b ") 'a b ' * Self-closing tags are converted to open tags. >>> normalize_html("
") '
' * Attributes are sorted and lowercased. >>> normalize_html('x') 'x' * References are converted to unicode, except that '<', '>', '&', and '"' are rendered using entities. >>> normalize_html("∀&><"") '\u2200&><"' z'(\|\<[^>]*\>|[^<]+)rNézs   i