Bug #6173 urlencode() doesn't respect locale
Submitted: 2000-08-15 12:21 UTC Modified: 2001-01-04 08:55 UTC
From: mbravo at acm dot org Assigned:
Status: Closed Package: Misbehaving function
PHP Version: 3.0.16 OS: FreeBSD 4.1
Private report: No CVE-ID: None
 [2000-08-15 12:21 UTC] mbravo at acm dot org
By definition, urlencode() should leave alphanumeric characters unencoded. However, in my case it doesn't respect the locale settings: it encodes non-ASCII characters that are alphanumeric according to the locale definition, when it shouldn't. The locale is set up correctly, and Apache has the correct LANG and LC_ALL environment variables in its runtime environment (checked with phpinfo()). I even tried an explicit setlocale() call within a script, but this doesn't change anything (which is probably correct, as the locale is already set system-wide).

I don't know if this problem is peculiar to FreeBSD installations; perhaps someone should check. It might be, since judging by the source code, the system isalnum() is used to determine whether a character should be encoded. However, FreeBSD in general handles locales very responsibly, and that wouldn't be the case if fundamental checks like isalpha() were broken.
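
To make the distinction in this report concrete, here is a minimal sketch in Python (not PHP's actual source) that contrasts a locale-style alphanumeric check with an ASCII-only one. The helper names and the choice of the ISO-8859-1 byte 0xE9 are assumptions made purely for illustration; the report does not say which locale or characters were involved.

```python
def is_ascii_alnum(byte: int) -> bool:
    """True only for the ASCII letters and digits, whatever the locale says."""
    return (0x30 <= byte <= 0x39      # 0-9
            or 0x41 <= byte <= 0x5A   # A-Z
            or 0x61 <= byte <= 0x7A)  # a-z


def is_alnum_in_charset(byte: int, charset: str) -> bool:
    """Hypothetical stand-in for a locale-aware isalnum():
    decode the byte in the given charset, then classify the character."""
    return bytes([byte]).decode(charset).isalnum()


E_ACUTE = 0xE9  # "é" in ISO-8859-1, chosen only as an example

print(is_alnum_in_charset(E_ACUTE, "latin-1"))  # True: a letter in that charset
print(is_ascii_alnum(E_ACUTE))                  # False: outside US-ASCII
print(is_ascii_alnum(ord("a")))                 # True
```

A URL encoder built on the first kind of check would leave such a byte alone, while one built on the second would percent-encode it, which is the behavior described above.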

 [2001-01-04 08:55 UTC] hholzgra@php.net
This is intended behavior.

See RFC 1738: Uniform Resource Locators (URL), Section 2.2:

   [...]
   No corresponding graphic US-ASCII:

   URLs are written only with the graphic printable characters of the
   US-ASCII coded character set. The octets 80-FF hexadecimal are not
   used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
   control characters; these must be encoded.
   [...]
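
As a rough illustration of the rule quoted above, the following Python sketch percent-encodes every octet that is not an ASCII letter or digit, so the octets 80-FF, 00-1F, and 7F are always encoded. The function name `percent_encode` and the constant `KEEP` are hypothetical, and the sketch is deliberately stricter than real encoders: PHP's urlencode(), for example, also leaves `-`, `_`, and `.` alone and turns spaces into `+`.

```python
import string

# Octets that pass through unchanged in this simplified sketch:
# ASCII letters and digits only.
KEEP = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def percent_encode(data: bytes) -> str:
    """Percent-encode every octet that is not an ASCII letter or digit."""
    return "".join(chr(b) if b in KEEP else f"%{b:02X}" for b in data)


print(percent_encode(b"abc123"))        # abc123
print(percent_encode(bytes([0xE9])))    # %E9
print(percent_encode(b"\x00\x1f\x7f"))  # %00%1F%7F
```

Because the decision is made on raw octets, the result does not depend on any locale setting.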

 