Bug #28494 Using utf8_decode with ereg breaks ereg
Submitted: 2004-05-23 01:59 UTC
Modified: 2004-06-02 19:16 UTC
From: david at collectfair dot co dot uk
Assigned:
Status: Not a bug
Package: *Languages/Translation
PHP Version: 4.3.6
OS: Linux
Private report: No
CVE-ID: None

 [2004-05-23 01:59 UTC] david at collectfair dot co dot uk
Description:
------------
ereg() wrongly reports that a GET parameter contains only digits if the value is first processed with utf8_decode().

Example: the regular expression should return true only if the variable 'tt' in the GET string consists of between 1 and 8 digits.

This behaves correctly:
ereg('^[0-9]{1,8}$', $_GET['tt'])

This behaves wrongly:
ereg('^[0-9]{1,8}$', utf8_decode($_GET['tt']))
---------------------------------------------------
PHP Configure line - './configure' '--with-apxs2=/usr/local/apache/bin/apxs' '--with-mysql=/usr/local/mysql/' '--with-mysql-sock=/tmp/mysql.sock' '--enable-exif' '--with-gd' '--with-jpeg-dir=/usr/lib' '--with-png-dir=/usr/lib' '--with-zlib-dir=/usr/lib' '--with-xpm-dir=/usr/lib' '--with-freetype-dir=/usr/lib' '--with-t1lib=/usr/lib' '--with-freetype-dir=/usr/lib' '--disable-debug' '--with-config-file-path=/etc/httpd' '--with-openssl=/usr/local/ssl' '--enable-memory-limit'
----------------------------------------------------
Apache 2.0.49
core mod_access mod_auth mod_include mod_log_config mod_env mod_headers mod_setenvif mod_ssl prefork http_core mod_mime mod_status mod_autoindex mod_asis mod_cgi mod_negotiation mod_dir mod_imap mod_actions mod_alias mod_so mod_expires mod_rewrite mod_deflate mod_logio sapi_apache2 mod_security

Reproduce code:
---------------
Call this script with the following URL in the browser:

http://localhost/test.php?tt=119%f0

<?php
if(isset($_GET['tt']) && eregi('^[0-9]{1,8}$',utf8_decode($_GET['tt']))){
  //Should only get here if 'tt' is an integer 
  echo 'Integer';
}else{
  echo 'Not an integer';
}
?>
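To see why this input is unusual, it helps to look at the raw bytes. The sketch below assumes a modern PHP 7/8 CLI (ereg() was removed in PHP 7.0, so it cannot reproduce the bug itself); it decodes the same query value the way it arrives in $_GET['tt'] and checks it with the PCRE /u modifier, which makes preg_match() return false when the subject is not well-formed UTF-8. The byte 0xF0 starts a four-byte UTF-8 sequence, so a value that ends right after it is truncated, invalid UTF-8.

```php
<?php
// Decode the query value the same way it would appear in $_GET['tt'].
$tt = urldecode('119%f0');

// Show the raw bytes: "119" followed by the single byte 0xF0.
echo bin2hex($tt), "\n";    // 313139f0

// With the /u modifier, preg_match() returns false (not 0) when the
// subject is not valid UTF-8, so this works as a well-formedness check.
$isValidUtf8 = preg_match('//u', $tt) === 1;
var_dump($isValidUtf8);     // bool(false)
```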

Expected result:
----------------
The script should output "Not an integer" in the browser.

Actual result:
--------------
The script outputs "Integer" in the browser, even though 'tt' contains a non-digit character (%f0).

History

 [2004-05-23 14:40 UTC] david at collectfair dot co dot uk
Further testing shows that the failure only occurs when %f0 is at the end of the GET string.

Switching to the following Perl-compatible regular expression removes the problem completely:

preg_match('/^\d{1,8}$/', utf8_decode($_GET['tt']));
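One detail worth knowing about the PCRE version: in preg_match() a bare $ also matches just before a final newline, so '/^\d{1,8}$/' accepts "119\n". Anchoring with \z (or adding the D modifier) requires the match to end at the very end of the string. A small comparison, assuming PHP 7 or later:

```php
<?php
$inputs = ['119', "119\n", '119x', '123456789'];

foreach ($inputs as $input) {
    // Pattern as written above: $ may also match before a trailing "\n".
    $loose  = preg_match('/^\d{1,8}$/', $input);
    // \z anchors to the absolute end of the subject, so a trailing newline fails.
    $strict = preg_match('/^\d{1,8}\z/', $input);
    printf("%-12s loose=%d strict=%d\n", json_encode($input), $loose, $strict);
}
```

Only "119\n" separates the two patterns: the loose one matches it, the strict one does not.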
 [2004-05-23 15:37 UTC] moriyoshi@php.net
1. You cannot use utf8_decode() for generic UTF-8
decoding. It was designed specifically for use with
ISO-8859-1 characters. Consider using mb_convert_encoding()
or iconv() instead.

2. ereg() does not support UTF-8 either. Try using
preg_match() instead.
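Following the first point, a conversion that reports failure instead of silently mangling the input can be sketched with iconv(). This is a sketch under assumptions: it needs PHP 7.1 or later and the iconv extension (enabled by default in most builds), and the helper name to_latin1() is hypothetical. Malformed UTF-8 such as "119\xF0" is rejected up front with the /u check. How iconv() treats characters that are valid UTF-8 but absent from ISO-8859-1 depends on the underlying iconv implementation, so a false return is handled too. Note that utf8_decode() itself is deprecated as of PHP 8.2.

```php
<?php
/**
 * Hypothetical helper: convert a UTF-8 string to ISO-8859-1,
 * returning null instead of a mangled string when that is not possible.
 */
function to_latin1(string $utf8): ?string
{
    // Reject malformed UTF-8 up front (e.g. a truncated multi-byte sequence).
    if (preg_match('//u', $utf8) !== 1) {
        return null;
    }

    // iconv() returns false on failure; @ suppresses the accompanying notice.
    $latin1 = @iconv('UTF-8', 'ISO-8859-1', $utf8);

    return $latin1 === false ? null : $latin1;
}

var_dump(to_latin1('119'));          // string(3) "119"
var_dump(to_latin1("caf\u{e9}"));    // "caf" plus the single byte 0xE9
var_dump(to_latin1("119\xF0"));      // NULL
```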

 [2004-06-02 19:16 UTC] moriyoshi@php.net
Side note: apparently you are trying to use utf8_decode()
on a string that may contain a character that is not
part of the ISO-8859-1 character set. That leaves two
options: one is to use preg_match() on the raw string
without decoding it, and the other is to use a "real"
encoding conversion function in place of utf8_decode()
and feed the result to ereg().
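The first option, validating the raw parameter with preg_match() and no decoding step, could look roughly like this on PHP 7 or later. The function name is hypothetical, and the is_string() guard is an addition not discussed in the thread: a request such as ?tt[]=1 makes $_GET['tt'] an array. The pattern admits only the ASCII digits 0-9 and is anchored with \z, so any other byte, including 0xF0 or a trailing newline, makes the check fail.

```php
<?php
// Hypothetical helper: true only if $value is a string of 1 to 8 ASCII digits.
function is_short_digit_string($value): bool
{
    // Query parameters like ?tt[]=1 arrive as arrays, not strings.
    if (!is_string($value)) {
        return false;
    }

    // Work on the raw bytes: no utf8_decode(), and \z so a trailing "\n" is rejected.
    return preg_match('/^[0-9]{1,8}\z/', $value) === 1;
}

// Usage in the original script (the ?? operator requires PHP 7.0+).
$tt = $_GET['tt'] ?? null;
echo is_short_digit_string($tt) ? 'Integer' : 'Not an integer';
```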


 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Thu Jan 02 11:01:29 2025 UTC