Request #31649 urldecode should support %uHHHH Unicode codepoint notation, which is standard
Submitted: 2005-01-21 22:29 UTC
Modified: 2022-04-08 08:19 UTC
Avg. Score: 4.5 ± 0.5
Reproduced: 1 of 2 (50.0%)
Same Version: 0 (0.0%)
Same OS: 1 (100.0%)
From: james at gogo dot co dot nz
Assigned: ilutov
Status: Closed
Package: URL related
PHP Version: *
OS: All
Private report: No
CVE-ID: None


 [2005-01-21 22:29 UTC] james at gogo dot co dot nz
urldecode() does not understand the %uxxxx format for escaping Unicode characters above 0xFF.

This is a very old bug, originally reported as bug #15027 and declared bogus. I believe that decision was erroneous, and here is the reasoning.

In all modern browsers (including Mozilla), JavaScript's escape() function uses %HH for Unicode codepoints below 0x0100, but %uHHHH for codepoints from 0x0100 upward.

From ECMA-262:
For characters whose Unicode encoding is 0xFF or less, a two-digit escape sequence of the form %xx is used in accordance with RFC1738. For characters whose Unicode encoding is greater than 0xFF, a four-digit escape sequence of the form %uxxxx is used.

I believe this is a bug: PHP is unable to urldecode() the valid escape()d values from modern browsers when those strings contain Unicode characters greater than 0xFF.

Declaring it not a bug because it is defined by ECMA rather than in the RFCs is a poor decision.

Reproduce code:
echo urldecode('%u2013');

Expected result:
A string containing the three bytes that encode the Unicode character U+2013 (En Dash) in UTF-8, namely 0xE2, 0x80 and 0x93.

Actual result:
The literal string "%u2013".
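
Until something like this is supported natively, the expected behaviour can be approximated in userland. The following is only a minimal sketch under stated assumptions, not an official API: urldecode_with_u() is a hypothetical helper name, it assumes the desired output encoding is UTF-8, and it assumes that a %uHHHH surrogate pair (as escape() emits for characters outside the Basic Multilingual Plane) should be combined into a single code point. Everything is decoded in one pass so that a decoded "%" is never decoded a second time.

```php
<?php
// Hypothetical helper (not part of PHP): decodes %uHHHH, %HH and "+" in a
// single pass, producing UTF-8 for the %uHHHH escapes.
function urldecode_with_u(string $input): string
{
    // Encode one Unicode code point as UTF-8 bytes (no mbstring required).
    $toUtf8 = static function (int $cp): string {
        if ($cp < 0x80) {
            return chr($cp);
        }
        if ($cp < 0x800) {
            return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
        }
        if ($cp < 0x10000) {
            return chr(0xE0 | ($cp >> 12))
                 . chr(0x80 | (($cp >> 6) & 0x3F))
                 . chr(0x80 | ($cp & 0x3F));
        }
        return chr(0xF0 | ($cp >> 18))
             . chr(0x80 | (($cp >> 12) & 0x3F))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    };

    // Alternatives, in order: surrogate pair, single %uHHHH, %HH, "+".
    $pattern = '/%u(d[89ab][0-9a-f]{2})%u(d[c-f][0-9a-f]{2})'
             . '|%u([0-9a-f]{4})'
             . '|%([0-9a-f]{2})'
             . '|\+/i';

    return preg_replace_callback($pattern, static function (array $m) use ($toUtf8): string {
        if (($m[1] ?? '') !== '') {
            // Combine a high/low surrogate pair into one code point.
            $high = hexdec($m[1]);
            $low  = hexdec($m[2]);
            return $toUtf8(0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00));
        }
        if (($m[3] ?? '') !== '') {
            $cp = hexdec($m[3]);
            // A lone surrogate has no valid UTF-8 form; leave it untouched.
            if ($cp >= 0xD800 && $cp <= 0xDFFF) {
                return $m[0];
            }
            return $toUtf8($cp);
        }
        if (($m[4] ?? '') !== '') {
            // Plain %HH: emit the raw byte, as urldecode() does.
            return chr(hexdec($m[4]));
        }
        // The only remaining alternative is "+", which urldecode() maps to a space.
        return ' ';
    }, $input);
}

echo bin2hex(urldecode_with_u('%u2013')), "\n"; // e28093
```

Note that %HH sequences are still passed through as raw bytes, so input that mixes %uHHHH with Latin-1 style %HH escapes above 0x7F (which is what escape() produces) will not be valid UTF-8 unless those bytes are converted separately.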


 [2005-01-21 22:46 UTC]
PHP doesn't support Unicode in a whole lot of places. Marking this as a feature request instead.
 [2015-01-08 23:26 UTC]
-Package: Feature/Change Request
+Package: URL related
-PHP Version: 4.3.10
+PHP Version: *
 [2015-01-08 23:26 UTC]
This should probably decode to UTF-8, if it decodes to anything.
 [2015-01-08 23:26 UTC]
-Summary: urldecode does not follow ECMA standard or standard browser practice
+Summary: urldecode should support %uHHHH Unicode codepoint notation, which is standard
 [2018-03-26 21:59 UTC]
The %uxxxx encoding is non-standard, and the escape() function
is contained in an annex of ECMA-262 (Edition 6.0) only, which states[1]:

| All of the language features and behaviours specified in this
| annex have one or more undesirable characteristics and in the
| absence of legacy usage would be removed from this specification.

In my opinion, it does not make sense to support %uxxxx encoding in

[1] <>
 [2022-04-08 08:19 UTC]
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: ilutov
 [2022-04-08 08:19 UTC]
As per the comment from @cmb, I'm closing this issue.
Last updated: Thu Dec 08 13:05:53 2022 UTC