Bug #54688 case insensitive search of stripos does not work when searching äöü in utf-8
Submitted: 2011-05-08 17:00 UTC
Modified: 2011-05-09 09:20 UTC
From: g dot huebgen at arcor dot de
Assigned:
Status: Not a bug
Package: Strings related
PHP Version: 5.3.6
OS: Linux
Private report: No
CVE-ID: None
 [2011-05-08 17:00 UTC] g dot huebgen at arcor dot de
Description:
------------
---
From manual page: http://www.php.net/function.stripos#Description
---
If some text is encoded in UTF-8 and I search it with stripos for a string containing a lower-case umlaut (e.g. ü), the function does not find the upper-case umlaut (Ü). That means case-insensitive matching does not work for umlauts when the text is encoded in UTF-8.

Test script:
---------------
File test.txt contains "Übermut" and is encoded UTF-8 without BOM
<?php
$text = file_get_contents("test.txt");
echo $text."<br>";
$str = "über";
if (($pos=stripos($text,$str)) !== false)
	echo $str." gefunden";
else echo $str." nicht gefunden";
?>

Expected result:
----------------
Übermut
über gefunden 

Actual result:
--------------
Übermut
über nicht gefunden 

 [2011-05-08 17:43 UTC] rasmus@php.net
-Status: Open +Status: Bogus
 [2011-05-08 17:43 UTC] rasmus@php.net
This is not a bug. The base string handling functions in PHP do not support 
multibyte character sets. Since UTF-8 is compatible with single-byte charsets at 
the low end, it may appear to work for UTF-8, but it will break as soon as you hit 
an actual mb character. You can use mb_stripos() in this case, or you can use the 
function overloading support in mbstring to make your stripos mb aware. 

See http://de.php.net/manual/en/mbstring.overload.php
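
A minimal sketch of the difference described above, assuming the script file itself is saved as UTF-8 and the mbstring extension is loaded. The variable names are illustrative and not part of the original report. stripos() works on bytes, and the UTF-8 encodings of "Ü" and "ü" differ in their second byte, so only the multibyte-aware call with an explicit encoding is expected to match.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext/mbstring is available.
$haystack = "Übermut";
$needle   = "über";

// "Ü" and "ü" are two bytes each in UTF-8; only the second byte differs.
echo bin2hex("Ü"), " ", bin2hex("ü"), "\n"; // c39c c3bc

// Byte-oriented search: the bytes 0x9c and 0xbc are not a case pair.
$byteResult = stripos($haystack, $needle);

// Multibyte-aware search with the encoding stated explicitly.
$mbResult = mb_stripos($haystack, $needle, 0, "UTF-8");

var_dump($byteResult); // bool(false)
var_dump($mbResult);   // int(0)
```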
 [2011-05-08 20:17 UTC] g dot huebgen at arcor dot de
Hi rasmus.
I have now tried mb_stripos, but the result is the same as with stripos.
The same program but using mb_stripos:
$text = file_get_contents("test-utf8.txt");
$str = "über";
if (($pos=mb_stripos($text,$str)) !== false)
	echo $str." found";
else echo $str." not found";

output is: not found!

If I use utf8_decode for both $text and $str then stripos will work properly.
 [2011-05-08 20:25 UTC] rasmus@php.net
That means your string is not actually in UTF-8. utf8_decode() converts text in 
ISO-8859-1 to UTF-8. You stated initially that you had text encoded in UTF-8.
 [2011-05-09 06:33 UTC] g dot huebgen at arcor dot de
The description of utf8_decode states clearly that this function decodes UTF8 text. The manual says:
"utf8_decode — Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1"

So my text is indeed in UTF-8 and my remark on utf8_decode only confirms what rasmus (comment #1) said.
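
One way to check which encoding a string is really in, sketched under the assumption that the script file is saved as UTF-8 and mbstring is loaded. mb_convert_encoding() is used here as a stand-in for utf8_decode() in the same direction (UTF-8 in, ISO-8859-1 out); the variable names are illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext/mbstring is available.
$utf8 = "Übermut";

// Same direction as utf8_decode(): UTF-8 in, single-byte ISO-8859-1 out.
$latin1 = mb_convert_encoding($utf8, "ISO-8859-1", "UTF-8");

echo strlen($utf8), "\n";   // 8: "Ü" takes two bytes in UTF-8
echo strlen($latin1), "\n"; // 7: "Ü" is the single byte 0xDC in ISO-8859-1

// A quick test of whether a byte string is valid UTF-8 at all.
var_dump(mb_check_encoding($utf8, "UTF-8"));   // bool(true)
var_dump(mb_check_encoding($latin1, "UTF-8")); // bool(false)
```

Running mb_check_encoding() on the contents of the file being searched would show whether the data or the function call is at fault.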
 [2011-05-09 06:44 UTC] rasmus@php.net
Well, somewhere along the way you have messed up your encoding since it works 
fine when both strings are UTF-8:

var_dump(mb_stripos("Übermut","über",0,"UTF-8"));

Are you saying that this doesn't give you int(0) on your platform?
 [2011-05-09 09:20 UTC] g dot huebgen at arcor dot de
You are right. Your mb_stripos works fine. 
My mistake was that I left out the "UTF-8" encoding parameter!
Now everything is clear.
Thank you
Gerhard
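
A short sketch of the resolution, assuming the script file is saved as UTF-8 and mbstring is loaded. When the encoding argument is omitted, mb_stripos() falls back to the internal encoding, which on the reporter's PHP 5.3 setup was presumably not UTF-8 (an assumption about that configuration). The first call imitates such a mismatch by naming ISO-8859-1 explicitly; the variable names are illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext/mbstring is available.
$haystack = "Übermut";
$needle   = "über";

// Treating the UTF-8 bytes as ISO-8859-1 mimics an internal encoding that
// does not match the data: "Ü" and "ü" are no longer seen as one letter.
$asLatin1 = mb_stripos($haystack, $needle, 0, "ISO-8859-1");

// Option 1: pass the encoding on every call.
$explicit = mb_stripos($haystack, $needle, 0, "UTF-8");

// Option 2: set the internal encoding once, then omit the parameter.
mb_internal_encoding("UTF-8");
$viaDefault = mb_stripos($haystack, $needle);

var_dump($asLatin1);   // bool(false)
var_dump($explicit);   // int(0)
var_dump($viaDefault); // int(0)
```

Passing the encoding explicitly keeps each call independent of php.ini settings, while mb_internal_encoding() is convenient when every string in the script is known to be UTF-8.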