Request #36130 UTF-8 DeAccentizer
Submitted: 2006-01-23 08:20 UTC
Modified: 2006-01-23 08:26 UTC
From: mbjr at mbjr dot hu
Assigned:
Status: Closed
Package: Feature/Change Request
PHP Version: 5.1.2
OS: Linux
Private report: No
CVE-ID: None
 [2006-01-23 08:20 UTC] mbjr at mbjr dot hu
Description:
------------
Although UTF-8 is becoming widely supported, many people in the relevant countries still type search strings without accents or special characters, because they got used to the old systems.

At the moment, the only way to produce an accent-free string is a manual strtr() mapping that lists a replacement for every accented character that may occur.
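To illustrate why the manual approach scales poorly, here is a minimal sketch of a hand-written mapping table. It is written in Python rather than PHP, with str.translate() standing in for strtr(); the names ACCENT_MAP and strip_accents_manual are hypothetical, and the table deliberately covers only the Hungarian vowels.

```python
# Hypothetical, deliberately incomplete table: every accented character
# has to be listed by hand, which is the burden this request describes.
ACCENT_MAP = str.maketrans({
    "Á": "A", "á": "a",
    "É": "E", "é": "e",
    "Í": "I", "í": "i",
    "Ó": "O", "ó": "o", "Ö": "O", "ö": "o", "Ő": "O", "ő": "o",
    "Ú": "U", "ú": "u", "Ü": "U", "ü": "u", "Ű": "U", "ű": "u",
})


def strip_accents_manual(text: str) -> str:
    """Replace only the characters that appear in ACCENT_MAP."""
    return text.translate(ACCENT_MAP)


print(strip_accents_manual("Árvíztűrő tükörfúrógép"))
```

Any character missing from the table passes through unchanged, so the table has to grow with every script the application needs to support.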

Reproduce code:
---------------
n/a

Expected result:
----------------
Árvíztűrő tükörfúrógép -> Arvizturo tukorfurogep

All of the characters below should be converted to "o":

Ò = capital letter o with grave
Ó = capital letter o with acute
Ô = capital letter o with circumflex
Õ = capital letter o with tilde
Ö = capital letter o with diaeresis
Ō = capital letter o with macron
Ŏ = capital letter o with breve
Ő = capital letter o with double acute
Ơ = capital letter o with horn
Ǒ = capital letter o with caron
Ǫ = capital letter o with ogonek
Ǭ = capital letter o with ogonek and macron
Ȍ = capital letter o with double grave
Ȏ = capital letter o with inverted breve
Ȫ = capital letter o with diaeresis and macron
Ȭ = capital letter o with tilde and macron
Ȯ = capital letter o with dot above
Ȱ = capital letter o with dot above and macron
Ṍ = capital letter o with tilde and acute
Ṏ = capital letter o with tilde and diaeresis
Ṑ = capital letter o with macron and grave
Ṓ = capital letter o with macron and acute
Ọ = capital letter o with dot below
Ỏ = capital letter o with hook above
Ố = capital letter o with circumflex and acute
Ồ = capital letter o with circumflex and grave
Ổ = capital letter o with circumflex and hook above
Ỗ = capital letter o with circumflex and tilde
Ộ = capital letter o with circumflex and dot below
Ớ = capital letter o with horn and acute
Ờ = capital letter o with horn and grave
Ở = capital letter o with horn and hook above
Ỡ = capital letter o with horn and tilde
Ợ = capital letter o with horn and dot below

The 34 characters above are Latin capital letters, and there are another 34 for their lowercase forms, which means the extended Latin script contains 68 characters that should match an "o".

The same applies to e, u, i, and a.
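One way to cover all of these characters without enumerating them is to rely on Unicode canonical decomposition. The sketch below is an illustration in Python, not the behaviour of any PHP function; the name strip_accents is hypothetical. It decomposes each character into a base letter plus combining marks (NFD) and then drops the marks. Note that it preserves case, so the capital letters listed above come out as "O".

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove diacritics by decomposing to NFD and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents,
    # horns, ogoneks and dots produced by the decomposition.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


capitals = "ÒÓÔÕÖŌŎŐƠǑǪǬȌȎȪȬȮȰṌṎṐṒỌỎỐỒỔỖỘỚỜỞỠỢ"
print(strip_accents(capitals))
print(strip_accents("Árvíztűrő tükörfúrógép"))
```

This only works for letters that have a canonical decomposition. Letters such as "ø" or "ł" do not decompose and would still need an explicit mapping.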

Actual result:
--------------
n/a

 [2006-01-23 08:26 UTC] derick@php.net
There is PECL/transliterate for this already. See http://pecl.php.net/translit and http://derickrethans.nl/translit.php (You need the 'diacritical_remove' filter).
 