[2021-04-06 10:59 UTC] cmb@php.net
-Status: Open
+Status: Duplicate
-Assigned To:
+Assigned To: cmb
Description:
------------
Dear maintainers,

It is quite possible and feasible to have a more flexible 'html_entity_decode' that properly handles the named character references that, for historical reasons, are allowed not to be terminated with a semicolon [1].

To support my claim, I invite you to examine Html-Cref [2], a project I developed recently that implements several named character reference *parsers* based on tries instead of hash tables.

I brought PHP's function 'resolve_named_entity_html' and an adapted hash table 'ent_ht_html5' into Html-Cref's framework (all according to the patch file [3]; the size of 'ent_ht_html5' was preserved). The resulting standalone binary 'html-cref' is about 25% bigger than the one built with e.g. the 'etrie' parser: 203K vs. 163K.

The measurements (`html-cref-test --cycles') show that the newly added function 'html_cref_php_parse' in 'src/html_cref_php.c' runs about 4% slower than any of the trie-based parsers 'iwtrie', 'ietrie', 'etrie' and 'wtrie' on a 64-bit Intel Core i5-3210M machine.

Sincerely,
Stefan Vargyas.

PS: this post is a slightly changed version of [4]. With it I hope to catch your attention and open a discussion about an improved 'html_entity_decode'.

[1] 12.2 Parsing HTML documents: 12.2.5.73 Named character reference state
    https://html.spec.whatwg.org/#named-character-reference-state
[2] Html-Cref: Fast HTML Character References Decoder
    https://github.com/stvar/html-cref
[3] html-cref-php.patch
    https://gist.github.com/stvar/df320f55d83cedac9fd7261256d20906
[4] https://bugs.php.net/bug.php?id=77769#1557609156
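
To illustrate the idea behind the request, here is a minimal sketch of a trie-based longest-match lookup for named character references, including the legacy forms that need no trailing semicolon. It is not Html-Cref's code and not PHP's implementation: the names 'ENTITIES', 'build_trie', 'match_reference' and 'decode' are hypothetical, the table is a tiny hand-picked subset of the one in the HTML standard, and the extra rule the standard applies inside attribute values is left out.

```python
# Illustrative sketch only: a tiny subset of the HTML named character
# reference table. Legacy names that the HTML standard also lists without
# a trailing semicolon appear in both forms; "notin;" has no legacy form.
ENTITIES = {
    "amp;": "&", "amp": "&",
    "lt;": "<", "lt": "<",
    "gt;": ">", "gt": ">",
    "not;": "\u00ac", "not": "\u00ac",
    "copy;": "\u00a9", "copy": "\u00a9",
    "notin;": "\u2209",
}


def build_trie(table):
    """Build a nested-dict trie; the empty-string key holds a name's value."""
    root = {}
    for name, value in table.items():
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node[""] = value
    return root


TRIE = build_trie(ENTITIES)


def match_reference(text, pos):
    """Find the longest known name starting at text[pos] (just after '&').

    Returns (value, matched_length), or None if no name matches.
    """
    node = TRIE
    best = None
    i = pos
    while i < len(text) and text[i] in node:
        node = node[text[i]]
        i += 1
        if "" in node:  # a complete name ends here; remember it
            best = (node[""], i - pos)
    return best


def decode(text):
    """Replace named references in text, with or without the semicolon."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "&":
            hit = match_reference(text, i + 1)
            if hit is not None:
                value, length = hit
                out.append(value)
                i += 1 + length  # skip '&' plus the matched name
                continue
        out.append(text[i])
        i += 1
    return "".join(out)
```

Under these assumptions, 'decode("a &amp b")' resolves the unterminated reference, and 'decode("&notit;")' matches only the legacy name 'not' and leaves 'it;' as plain text, which is the longest-match behaviour described in [1]. A single walk down the trie finds the longest name directly, whereas a hash table keyed on whole names needs the end of the name to be known before it can be looked up.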