Bug #25669 eregi() vs. 8-bit chars in regex
Submitted: 2003-09-26 08:20 UTC Modified: 2003-10-01 05:21 UTC
From: svs at ropnet dot ru Assigned:
Status: Closed Package: Regexps related
PHP Version: 4.3.3 OS: FreeBSD 4.8
Private report: No CVE-ID: None

 

 [2003-09-26 08:20 UTC] svs at ropnet dot ru
Description:
------------
Even though the locale is set up correctly, eregi() fails to match international characters case-insensitively. The reason, as far as I understand, is that the code in regex/ passes a negative value to isalpha(). This can be worked around by recompiling regex/regcomp.c manually with -funsigned-char (assuming GCC is the compiler).


Reproduce code:
---------------
<?php
setlocale(LC_ALL, "ru_RU.KOI8-R"); 
echo setlocale(LC_ALL, ""), "\n";
if (eregi("&#1103;", "&#1071;&#1071;")) { echo "ok\n"; } else { echo "bad\n";}
if (preg_match("/&#1103;/i", "&#1071;&#1071;")) { echo "ok\n"; } else { echo "bad\n";}
?>


Expected result:
----------------
ru_RU.KOI8-R
ok
ok


Actual result:
--------------
ru_RU.KOI8-R
bad
ok


 [2003-09-26 09:13 UTC] sniper@php.net
I don't think you meant to use those chars in your example
script..? Can you please add the actual ones here?

 [2003-09-26 09:16 UTC] sniper@php.net
And what was the configure line used to configure PHP?

 [2003-09-26 09:34 UTC] svs at ropnet dot ru
oops, mozilla mangled those characters.

begin 644 l.php
M/#]P:'`*<V5T;&]C86QE*$Q#7T%,3"P@(G)U7U)5+DM/23@M4B(I.R`*96-H
M;R!S971L;V-A;&4H3$-?04Q,+"`B(BDL(")<;B(["FEF("AE<F5G:2@BT2(L
M("+Q\2(I*2![(&5C:&\@(F]K7&XB.R!](&5L<V4@>R!E8VAO(")B861<;B([
M?0II9B`H<')E9U]M871C:"@B+]$O:2(L("+Q\2(I*2![(&5C:&\@(F]K7&XB
=.R!](&5L<V4@>R!E8VAO(")B861<;B([?0H_/@H`
`
end

'./configure' '--without-x' '--disable-debug' '--with-apxs=/usr/local/apache/bin/apxs' '--with-mod_charset' '--enable-dba' '--with-gdbm=/usr/local' '--with-db4=/usr/local' '--enable-dbase' '--enable-ftp' '--enable-sockets' '--enable-inline-optimization' '--enable-memory-limit' '--with-mysql' '--with-gd' '--enable-gd-native-ttf' '--with-zlib=/usr' '--with-jpeg-dir=/usr/local' '--with-png-dir=/usr/local' '--with-freetype-dir=/usr/local' '--enable-exif' '--enable-calendar' '--enable-wddx' '--with-gmp' '--with-openssl=/usr' '--with-iconv=/usr/local' '--with-imap=shared,/usr/local' '--with-curl=/usr/local' '--with-dom=shared,/usr/local' '--with-dom-xslt=shared,/usr/local' '--with-dom-exslt=shared,/usr/local' '--enable-xslt=shared' '--with-xslt-sablot=shared,/usr/local' '--with-iconv-dir=/usr/local' '--with-expat-dir=/usr/local' '--with-zip=/usr/local' '--with-pdflib' '--with-tiff-dir=/usr/local'
 [2003-09-28 20:14 UTC] iliaa@php.net
Try this patch and see if it fixes the problem.
http://bb.prohost.org/reg.txt
 [2003-09-29 06:23 UTC] svs at ropnet dot ru
No, it does not.
 [2003-09-29 08:08 UTC] moriyoshi@php.net
Ilia: your patch doesn't seem to handle this correctly. isalpha() takes an int whose value must be representable as an unsigned char (or be EOF). Where char is signed, a char holds a value from -128 to 127, so if you cast it straight to unsigned int, negative values are sign-extended and for an 8-bit character you never get a value in the range 128 to 255. You should cast it to unsigned char first, and then to int.
 [2003-09-29 11:17 UTC] moriyoshi@php.net
Can you try this one again:
http://www.voltex.jp/patches/regpatch.diff

Note: This problem is known not to reproduce with glibc, and unfortunately I don't have a FreeBSD box at the moment.

 [2003-09-29 12:03 UTC] svs at ropnet dot ru
This one is OK.  Should I write a test case?
 [2003-09-29 12:11 UTC] moriyoshi@php.net
Yup, if possible. The following is just a template (supposed to be put in ext/standard/tests/reg). Please try to avoid using non-ascii characters in the test case. Thanks in advance.

--TEST--
Bug #25669 (eregi() with non-ascii characters)
--SKIPIF--
<?php
setlocale(LC_ALL, "de_DE.ISO8859-1") || die('SKIP de_DE.ISO8859-1 locale not supported by this system');
?>
--FILE--
<?php
setlocale(LC_ALL, "de_DE.ISO8859-1");
var_dump((bool)eregi("\xc4\xcb\xf6", "\xe4\xeb\xd6"));
var_dump((bool)eregi("\xc4", "\xe1"));
var_dump((bool)preg_match("/\xc4\xcb\xf6/i", "\xe4\xeb\xd6"));
var_dump((bool)preg_match("/\xc4/i", "\xe1"));
?>
--EXPECT--
bool(true)
bool(false)
bool(true)
bool(false)

 [2003-10-01 04:41 UTC] svs at ropnet dot ru
This template is a complete test case actually. I did not have to modify it in any way.
 [2003-10-01 05:21 UTC] moriyoshi@php.net
Ok, I'm closing the bug.
Thanks for helping make php better.

(The fix will go to 4.3.4-rc2 or later versions)

 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 16:01:29 2024 UTC