Bug #6173 urlencode() doesn't respect locale
Submitted: 2000-08-15 12:21 UTC Modified: 2001-01-04 08:55 UTC
From: mbravo at acm dot org Assigned:
Status: Closed Package: Misbehaving function
PHP Version: 3.0.16 OS: FreeBSD 4.1
Private report: No CVE-ID: None
 [2000-08-15 12:21 UTC] mbravo at acm dot org
By definition, urlencode() should leave alphanumeric characters unencoded. However, in my case it doesn't respect the locale settings: it encodes non-ASCII characters that are alphanumeric according to the locale definition, when it shouldn't. The locale is set up correctly, and Apache has the correct LANG and LC_ALL environment variables in its runtime environment (checked with phpinfo()). I even tried an explicit setlocale() call within a script, but this doesn't change anything (which is probably correct, as the locale is already set system-wide).

I don't know whether this problem is peculiar to FreeBSD installations; perhaps someone should check. It might be, since judging by the source code, the system isalnum() is used to determine whether a character should be encoded. However, FreeBSD in general handles locales very reliably, and that wouldn't be possible if fundamental checks like isalpha() were broken.

 [2001-01-04 08:55 UTC] hholzgra@php.net
This is intended behavior.

See RFC 1738: Uniform Resource Locators (URL), Section 2.2:

   [...]
   No corresponding graphic US-ASCII:

   URLs are written only with the graphic printable characters of the
   US-ASCII coded character set. The octets 80-FF hexadecimal are not
   used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
   control characters; these must be encoded.
   [...]
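
To illustrate the rule quoted above, here is a minimal sketch of a locale-independent encoder. The function name encode_ascii_only() is hypothetical and this is not the actual PHP source; it only assumes that ASCII letters, digits, '-', '_' and '.' are left as-is, that a space becomes '+', and that every other byte (including all octets 80-FF) is percent-encoded regardless of the current locale.

```php
<?php
// Hypothetical helper, not PHP's real implementation: it classifies bytes by
// fixed ASCII ranges instead of asking the C library, so the locale is irrelevant.
function encode_ascii_only(string $input): string
{
    $out = '';
    $len = strlen($input);
    for ($i = 0; $i < $len; $i++) {
        $byte = $input[$i];
        $code = ord($byte);
        // ASCII digits, uppercase letters, lowercase letters only.
        $isAsciiAlnum = ($code >= 0x30 && $code <= 0x39)
            || ($code >= 0x41 && $code <= 0x5A)
            || ($code >= 0x61 && $code <= 0x7A);
        if ($isAsciiAlnum || $byte === '-' || $byte === '_' || $byte === '.') {
            $out .= $byte;                       // safe character, left unencoded
        } elseif ($byte === ' ') {
            $out .= '+';                         // form-style encoding of a space
        } else {
            $out .= sprintf('%%%02X', $code);    // everything else, incl. 80-FF
        }
    }
    return $out;
}

// Two high-bit bytes, as an 8-bit encoding such as KOI8-R would produce.
echo encode_ascii_only("a b\xD0\xD2"), PHP_EOL;  // prints a+b%D0%D2
```

Under these assumptions the high-bit bytes are always encoded, whatever LANG or LC_ALL say, which is the behavior described as intended here.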

 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Wed Apr 24 08:01:29 2024 UTC