Bug #28494 Using utf8_decode with ereg breaks ereg
Submitted: 2004-05-23 01:59 UTC Modified: 2004-06-02 19:16 UTC
From: david at collectfair dot co dot uk Assigned:
Status: Not a bug Package: *Languages/Translation
PHP Version: 4.3.6 OS: Linux
Private report: No CVE-ID: None
 [2004-05-23 01:59 UTC] david at collectfair dot co dot uk
Description:
------------
ereg() wrongly reports that a GET parameter contains only digits if the value is first passed through utf8_decode().

Example: The regular expression should match only if the GET parameter 'tt' contains between 1 and 8 digits.

This behaves correctly:
ereg('^[0-9]{1,8}$', $_GET['tt'])

This behaves wrongly:
ereg('^[0-9]{1,8}$', utf8_decode($_GET['tt']))
---------------------------------------------------
PHP Configure line - './configure' '--with-apxs2=/usr/local/apache/bin/apxs' '--with-mysql=/usr/local/mysql/' '--with-mysql-sock=/tmp/mysql.sock' '--enable-exif' '--with-gd' '--with-jpeg-dir=/usr/lib' '--with-png-dir=/usr/lib' '--with-zlib-dir=/usr/lib' '--with-xpm-dir=/usr/lib' '--with-freetype-dir=/usr/lib' '--with-t1lib=/usr/lib' '--with-freetype-dir=/usr/lib' '--disable-debug' '--with-config-file-path=/etc/httpd' '--with-openssl=/usr/local/ssl' '--enable-memory-limit'
----------------------------------------------------
Apache 2.0.49
core mod_access mod_auth mod_include mod_log_config mod_env mod_headers mod_setenvif mod_ssl prefork http_core mod_mime mod_status mod_autoindex mod_asis mod_cgi mod_negotiation mod_dir mod_imap mod_actions mod_alias mod_so mod_expires mod_rewrite mod_deflate mod_logio sapi_apache2 mod_security

Reproduce code:
---------------
Call this script with the following URL in the browser:

http://localhost/test.php?tt=119%f0

<?php
if(isset($_GET['tt']) && eregi('^[0-9]{1,8}$',utf8_decode($_GET['tt']))){
  //Should only get here if 'tt' is an integer 
  echo 'Integer';
}else{
  echo 'Not an integer';
}
?>

Expected result:
----------------
The script should return "Not an integer" in the browser.

Actual result:
--------------
The script returns "Integer" in the browser, even though the GET string contains other characters.

 [2004-05-23 14:40 UTC] david at collectfair dot co dot uk
Further testing shows that the failure only occurs when the %f0 is at the end of the GET string.

Changing to this Perl-compatible regular expression removes the problem completely.

preg_match('/^\d{1,8}$/', utf8_decode($_GET['tt']));
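
A minimal sketch of that check as a reusable function, validating the raw bytes without calling utf8_decode() first. The helper name isShortDigitString() is hypothetical and not part of the original report. It uses \A and \z instead of ^ and $ because $ also matches just before a trailing newline, and [0-9] to spell out that only ASCII digits are accepted.

```php
<?php
// Hypothetical helper (not from the report): true only when $value consists
// of 1 to 8 ASCII digits and nothing else.
function isShortDigitString(string $value): bool
{
    // No /u modifier: the subject is matched byte by byte, so a stray
    // byte such as 0xF0 simply fails the [0-9] class.
    return preg_match('/\A[0-9]{1,8}\z/', $value) === 1;
}

var_dump(isShortDigitString('119'));       // plain digits
var_dump(isShortDigitString("119\xF0"));   // the value sent by ?tt=119%f0
var_dump(isShortDigitString("119\n"));     // trailing newline
```

With this pattern only the first call should report a match; the other two inputs contain a byte outside 0-9.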
 [2004-05-23 15:37 UTC] moriyoshi@php.net
1. You cannot use utf8_decode() for generic UTF-8
decoding. It was designed specifically for use with
ISO-8859-1 characters. Consider using mb_convert_encoding()
or iconv() instead.

2. ereg() does not support UTF-8 either. Try using
preg_match().

 [2004-06-02 19:16 UTC] moriyoshi@php.net
Side note: apparently you are trying to use utf8_decode()
on a string that may contain a character that is not
part of the ISO-8859-1 character set. So there are two
options: one is to just use preg_match() on the raw
string without decoding it, and the other is to use a
"real" encoding conversion function in place of
utf8_decode() and feed the result to ereg().
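
The second option could look roughly like the sketch below. It assumes the mbstring extension is loaded, and the function names utf8ToLatin1OrNull() and isShortDigitStringAfterConversion() are hypothetical. Because ereg() was removed in PHP 7.0, the sketch feeds the converted string to preg_match() instead. The input is rejected outright when it is not well-formed UTF-8, rather than relying on how the converter treats broken sequences.

```php
<?php
// Hypothetical helper: returns the ISO-8859-1 form of $value, or null when
// $value is not well-formed UTF-8. Assumes the mbstring extension is loaded.
function utf8ToLatin1OrNull(string $value): ?string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        return null; // e.g. a lone 0xF0 lead byte, as in ?tt=119%f0
    }
    return mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');
}

// Hypothetical helper: converts first, then applies the 1-to-8-digit check.
function isShortDigitStringAfterConversion(string $value): bool
{
    $latin1 = utf8ToLatin1OrNull($value);
    if ($latin1 === null) {
        return false; // invalid input is never treated as a number
    }
    return preg_match('/\A[0-9]{1,8}\z/', $latin1) === 1;
}

echo isShortDigitStringAfterConversion("119\xF0") ? 'Integer' : 'Not an integer';
```

For a pure digit check the conversion step adds nothing over matching the raw string; it only matters when the decoded text is needed for something else afterwards.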


 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon Dec 30 14:01:28 2024 UTC