[2010-09-25 11:09 UTC] masteram at gmail dot com
Description:
------------
I have tested this with PHP 5.2.9 and 5.3.3. Some UTF-8 strings are not processed correctly by parse_url. In the given example, the result for strings that contain the chars 'ם' or 'א' is corrupt, whereas the string 'מישהו' (which does not contain those chars) is processed correctly.

The affected characters (in UTF-8) consist of the following bytes:
ם - d7|9d
א - d7|90
These are converted to a char that consists of the following bytes: d7|5f.

In addition to ruining the URL, this char is not safe with preg_replace. Therefore, if we merge the result of parse_url back into a string and then attempt to replace, say, spaces with underscores using preg_replace, we get an empty string.

I believe that this is similar to bug #26391.

Test script:
---------------
$url = 'http://www.mysite.org/he/פרויקטים/ByYear.html';
$url = parse_url($url); //$url['path'] is now corrupt
$url = preg_replace('/\s+/u','_',$url['path']); //$url is now undefined

Expected result:
----------------
The correct portion of the URL.

Actual result:
--------------
Corrupt string (or blank after using preg_replace).
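The byte-level behaviour described in the report can be checked in isolation. The following is a minimal sketch, assuming the corrupt output really is the byte pair d7 5f as reported; the `$corrupt` path is constructed by hand for illustration and is not actual parse_url output.

```php
<?php
// Bytes of the two affected characters, as listed in the report.
$mem  = "\xD7\x9D"; // ם
$alef = "\xD7\x90"; // א
echo bin2hex($mem), ' ', bin2hex($alef), "\n"; // prints "d79d d790"

// Hand-built stand-in for the corrupt path: the continuation byte
// has been replaced by 0x5f ('_'), which is not valid UTF-8.
$corrupt = "/he/\xD7\x5F/ByYear.html";

// With the /u modifier PCRE rejects invalid UTF-8 subjects, so
// preg_replace() returns NULL instead of a string.
$result = preg_replace('/\s+/u', '_', $corrupt);
var_dump($result);                                   // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

Checking preg_last_error() after a NULL result is what distinguishes this failure from a pattern that simply matched nothing.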
This issue seems to be platform dependent. For example, on Windows Vista with PHP 5.3.1, parse_url('http://mydomain.com/path/道') returns $array['path'] = "/path/". However, on a Mac it works correctly and returns "/path/道". A workaround is to URL-encode the string, decode the legal URL characters ("/", ":", "&", and so on) again, run parse_url, and then decode the path. However, a built-in parse_url_utf8 function would be very convenient and probably faster. Thanks.
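One way the workaround could look is sketched below. `parse_url_utf8` is a hypothetical userland helper, not a PHP built-in, and the sketch assumes PHP 5.3 or later (for the closures). Instead of encoding everything and then restoring the legal characters, it percent-encodes only the non-ASCII bytes, so parse_url sees a pure ASCII string.

```php
<?php
// Hypothetical helper (not part of PHP): parse a URL containing raw UTF-8.
function parse_url_utf8($url)
{
    // Percent-encode every byte >= 0x80; ASCII delimiters stay untouched.
    $encoded = preg_replace_callback(
        '/[\x80-\xFF]/',
        function ($m) { return rawurlencode($m[0]); },
        $url
    );

    $parts = parse_url($encoded);
    if ($parts === false) {
        return false;
    }

    foreach ($parts as $key => $value) {
        if (is_string($value)) {
            // Decode only high-byte escapes (%80-%FF). Escapes such as %20
            // or %2F are left alone, but a high-byte escape that was already
            // present in the input is decoded as well.
            $parts[$key] = preg_replace_callback(
                '/%[89A-Fa-f][0-9A-Fa-f]/',
                function ($m) { return rawurldecode($m[0]); },
                $value
            );
        }
    }
    return $parts;
}

$parts = parse_url_utf8('http://mydomain.com/path/道');
var_dump($parts['path']);
```

The script must be saved as UTF-8 for the literal in the usage line to carry the intended bytes.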