[2012-01-25 15:29 UTC] t dot nickl at exse dot de
Description:
------------
// This code must be run via web.
// This is a string from e.g. some database containing a German umlaut 'ä'. Note the encoding really is ISO-8859-1. It's just assigned here literally to be concise.
$a = "Rechnungsadresse ändern";

// This works (an empty string activates some autodetection):
var_dump(htmlentities($a, ENT_COMPAT | ENT_HTML401, ''));

// This works too (the same output is generated):
var_dump(htmlentities($a, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1'));

// This does NOT work (outputs an empty string):
var_dump(htmlentities($a));

// Reason: PHP changed the charset htmlentities() uses when you do NOT pass one (90% of the code out there):
// determine_charset():
///////////////////////////////////////////////////////
// php-5.2.1/ext/standard/html.c :
//     /* Guarantee default behaviour for backwards compatibility */
//     if (charset_hint == NULL)
//         return cs_8859_1;
/////////////////////////////////////////////////////
// php-5.4.0RC4/ext/standard/html.c :
//     /* Default is now UTF-8 */
//     if (charset_hint == NULL)
//         return cs_utf_8;

// This breaks the meaning of existing German code. For example, TYPO3 outputs an empty string if end users used German umlauts in the rich text editor in the backend.
// Please change determine_charset() back to using cs_8859_1 if the third parameter of htmlentities() is omitted.

Test script:
---------------
See description.

Expected result:
----------------
See description.

Actual result:
--------------
See description.
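For code that cannot wait for a change in the default, one possible workaround is to always pass the charset explicitly, as the second var_dump() in the description already does. The sketch below is not part of the original report. It writes the umlaut as the byte "\xE4" so the example does not depend on how the file itself is saved, and the variable names $asLatin1 and $asUtf8 are only illustrative.

```php
<?php
// "\xE4" is the single ISO-8859-1 byte for 'ä' (assumption: the data really is Latin-1).
$a = "Rechnungsadresse \xE4ndern";

// Explicit charset: the result no longer depends on the PHP version's default.
$asLatin1 = htmlentities($a, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');
var_dump($asLatin1); // the umlaut becomes &auml;

// Same bytes declared as UTF-8: "\xE4" followed by "n" is not a valid UTF-8 sequence,
// so htmlentities() returns an empty string, which matches the reported symptom.
$asUtf8 = htmlentities($a, ENT_COMPAT | ENT_HTML401, 'UTF-8');
var_dump($asUtf8);
```

Naming the charset at every call site is more typing, but it makes the assumption about the data visible instead of leaving it to determine_charset().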
Copyright © 2001-2025 The PHP GroupAll rights reserved. |
Last updated: Sun Nov 23 06:00:02 2025 UTC |
foreach (array(chr(195).chr(132) /*utf8*/, chr(196) /*latin1*/) as $germanUmlaut) {
    $str = 'a'.$germanUmlaut.'de';
    $pos = strpos($str, $germanUmlaut);
    var_dump(htmlentities($str));
    var_dump($pos);
    var_dump(substr($str, $pos, 1));
}
/*
on php5.3.10:
string(12) "a&Atilde;<secondbyteofutf8umlaut>de"
int(1)
string(1) "<firstbyteofutf8umlaut>"
string(9) "a&Auml;de"
int(1)
string(1) "<latin1umlaut>"

on php5.4:
string(9) "a&Auml;de"
int(1)
string(1) "<firstbyteofutf8umlaut>"
string(0) ""
int(1)
string(1) "<latin1umlaut>"
*/

I find it very funny that htmlentities assumes utf8 now, but substr is still thinking in latin1, as it only gives back the first byte of a multibyte character (seen above).
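To make the substr() observation concrete: substr() and strpos() count bytes, so they cut a UTF-8 umlaut in half. The sketch below is not from the original comment and assumes the mbstring extension is loaded. It contrasts them with mb_strpos() and mb_substr(), which take an explicit encoding and count characters.

```php
<?php
// "aÄde" encoded as UTF-8: 'Ä' is the two bytes 0xC3 0x84.
$umlaut = "\xC3\x84";
$str = 'a' . $umlaut . 'de';

// Byte-oriented functions: position 1 is correct, but one "character" is one byte.
$bytePos   = strpos($str, $umlaut);
$firstByte = substr($str, $bytePos, 1);

// Character-oriented functions from mbstring (assumption: the extension is available).
$charPos   = mb_strpos($str, $umlaut, 0, 'UTF-8');
$wholeChar = mb_substr($str, $charPos, 1, 'UTF-8');

var_dump(bin2hex($firstByte)); // only the lead byte of the umlaut
var_dump(bin2hex($wholeChar)); // both bytes of the umlaut
```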