Bug #25669 eregi() vs. 8-bit chars in regex
Submitted: 2003-09-26 08:20 UTC
Modified: 2003-10-01 05:21 UTC
From: svs at ropnet dot ru
Assigned:
Status: Closed
Package: Regexps related
PHP Version: 4.3.3
OS: FreeBSD 4.8
Private report: No
CVE-ID: None


 [2003-09-26 08:20 UTC] svs at ropnet dot ru
Even though the locale is set up correctly, eregi() fails to match international characters case-insensitively. The reason, as far as I understand, is that the code in regex/ passes a negative value to isalpha(). This can be worked around by recompiling regex/regcomp.c manually with -funsigned-char (assuming GCC is the compiler).
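
To illustrate the suspected cause: the <ctype.h> functions accept only EOF or a value representable as unsigned char. The sketch below assumes 8-bit bytes and uses signed char to model a platform where plain char is signed. The helper names are hypothetical and are not taken from regex/regcomp.c.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if v is a legal argument for isalpha()
 * and friends, i.e. EOF or a value in 0..UCHAR_MAX. */
static int is_valid_ctype_arg(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

/* Models the implicit conversion in isalpha(c) when c is a signed char:
 * the value is simply widened to int and keeps its sign. */
static int promote_signed_char(signed char c)
{
    return c;
}

/* Models the effect of -funsigned-char: the same byte is read as an
 * unsigned char, so it widens to a non-negative int. */
static int promote_unsigned_char(signed char c)
{
    return (unsigned char)c;
}
```

With 8-bit bytes, a byte such as 0xD1 (the KOI8-R encoding of the Cyrillic small letter ya) is -47 when held in a signed char. promote_signed_char(-47) stays -47, which is outside the accepted range on any system where EOF is not -47, while promote_unsigned_char(-47) gives 209.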

Reproduce code:
setlocale(LC_ALL, "ru_RU.KOI8-R"); 
echo setlocale(LC_ALL, ""), "\n";
if (eregi("&#1103;", "&#1071;&#1071;")) { echo "ok\n"; } else { echo "bad\n";}
if (preg_match("/&#1103;/i", "&#1071;&#1071;")) { echo "ok\n"; } else { echo "bad\n";}

Expected result:

Actual result:


 [2003-09-26 09:13 UTC]
I don't think you meant to use those chars in your example script..? Can you please add the actual ones here?

 [2003-09-26 09:16 UTC]
And what was the configure line used to configure PHP?

 [2003-09-26 09:34 UTC] svs at ropnet dot ru
Oops, Mozilla mangled those characters.

begin 644 l.php

'./configure' '--without-x' '--disable-debug' '--with-apxs=/usr/local/apache/bin/apxs' '--with-mod_charset' '--enable-dba' '--with-gdbm=/usr/local' '--with-db4=/usr/local' '--enable-dbase' '--enable-ftp' '--enable-sockets' '--enable-inline-optimization' '--enable-memory-limit' '--with-mysql' '--with-gd' '--enable-gd-native-ttf' '--with-zlib=/usr' '--with-jpeg-dir=/usr/local' '--with-png-dir=/usr/local' '--with-freetype-dir=/usr/local' '--enable-exif' '--enable-calendar' '--enable-wddx' '--with-gmp' '--with-openssl=/usr' '--with-iconv=/usr/local' '--with-imap=shared,/usr/local' '--with-curl=/usr/local' '--with-dom=shared,/usr/local' '--with-dom-xslt=shared,/usr/local' '--with-dom-exslt=shared,/usr/local' '--enable-xslt=shared' '--with-xslt-sablot=shared,/usr/local' '--with-iconv-dir=/usr/local' '--with-expat-dir=/usr/local' '--with-zip=/usr/local' '--with-pdflib' '--with-tiff-dir=/usr/local'
 [2003-09-28 20:14 UTC]
Try this patch and see if it fixes the problem.
 [2003-09-29 06:23 UTC] svs at ropnet dot ru
No, it does not.
 [2003-09-29 08:08 UTC]
Ilia: your patch doesn't seem to deal with it correctly. isalpha() does take an int, but the value must be representable as an unsigned char (0 to 255) or be EOF. Where char is signed, a char value can be any number from -128 to 127, so if you cast a negative one directly to unsigned int, it wraps to a very large number and you never get a value in the range 0 to 255. So you should first cast it to unsigned char, and then convert that to int.
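
The cast order described above can be sketched as follows. This is a minimal illustration assuming 8-bit bytes, with signed char standing in for a plain char on a signed-char platform; the function names are hypothetical and this is not the patch that was applied.

```c
#include <limits.h>

/* Casting a negative char straight to unsigned int: the value wraps
 * modulo UINT_MAX + 1, so it lands far outside 0..255. */
static unsigned int cast_direct(signed char c)
{
    return (unsigned int)c;
}

/* Casting to unsigned char first reduces the value modulo UCHAR_MAX + 1;
 * the result then widens to int unchanged and is safe for isalpha(). */
static int cast_via_unsigned_char(signed char c)
{
    return (int)(unsigned char)c;
}
```

For the byte 0xE4, which is -28 as a signed char, cast_direct(-28) yields UINT_MAX - 27, whereas cast_via_unsigned_char(-28) yields 228. Values that are already non-negative pass through both functions unchanged.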
 [2003-09-29 11:17 UTC]
Can you try this one again:

Note: This problem is known not to reproduce with glibc, and unfortunately I don't have a FreeBSD box at the moment.

 [2003-09-29 12:03 UTC] svs at ropnet dot ru
This one is OK.  Should I write a test case?
 [2003-09-29 12:11 UTC]
Yup, if possible. The following is just a template (it is meant to go in ext/standard/tests/reg). Please try to avoid using non-ASCII characters in the test case. Thanks in advance.

Bug #25669 (eregi() with non-ascii characters)
setlocale(LC_ALL, "de_DE.ISO8859-1") || die('SKIP de_DE.ISO8859-1 locale not supported by this system');
setlocale(LC_ALL, "de_DE.ISO8859-1");
var_dump((bool)eregi("\xc4\xcb\xf6", "\xe4\xeb\xd6"));
var_dump((bool)eregi("\xc4", "\xe1"));
var_dump((bool)preg_match("/\xc4\xcb\xf6/i", "\xe4\xeb\xd6"));
var_dump((bool)preg_match("/\xc4/i", "\xe1"));
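
For comparison, the per-byte rule these test cases exercise can be sketched at the C level. This is not what eregi() does internally (it searches with a compiled regex). It only illustrates a case-insensitive byte comparison that never hands a negative value to <ctype.h>. Both helper names are hypothetical, and bytes at or above 0x80 fold case only if a suitable LC_CTYPE locale has been selected with setlocale().

```c
#include <ctype.h>

/* Hypothetical helper: compares two bytes case-insensitively under the
 * current LC_CTYPE locale. Each char is converted to unsigned char before
 * it reaches tolower(), so bytes >= 0x80 are never passed as negatives. */
static int bytes_equal_nocase(char a, char b)
{
    return tolower((unsigned char)a) == tolower((unsigned char)b);
}

/* Hypothetical helper: case-insensitive equality of two NUL-terminated
 * byte strings, built on bytes_equal_nocase(). */
static int strings_equal_nocase(const char *s, const char *t)
{
    while (*s != '\0' && *t != '\0') {
        if (!bytes_equal_nocase(*s, *t))
            return 0;
        s++;
        t++;
    }
    return *s == *t; /* equal only if both strings ended together */
}
```

In the default "C" locale only ASCII letters fold, so strings_equal_nocase("PHP", "php") holds while a pair such as "\xc4" and "\xe4" compares unequal until a Latin-1 locale is in effect.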

 [2003-10-01 04:41 UTC] svs at ropnet dot ru
This template is a complete test case actually. I did not have to modify it in any way.
 [2003-10-01 05:21 UTC]
Ok, I'm closing the bug.
Thanks for helping make php better.

(The fix will go to 4.3.4-rc2 or later versions)

PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sun Jul 14 08:01:28 2024 UTC