Bug #31064 str_word_count and german umlauts and ligatures
Submitted: 2004-12-12 01:57 UTC
Modified: 2004-12-13 21:19 UTC
From: km at control-b dot de
Assigned:
Status: Not a bug
Package: Strings related
PHP Version: 5.0.2
OS: windows
Private report: No
CVE-ID: None
 [2004-12-12 01:57 UTC] km at control-b dot de
Description:
------------
str_word_count() returns the wrong number if the German umlaut "ö" is contained in the word.
It is okay if the umlaut is the first or the last character.

The same goes for the ligature "ß".



Reproduce code:
---------------
echo str_word_count('wäre');         # 1 - okay
echo str_word_count('würde');        # 1 - okay
echo str_word_count('wérk');         # 1 - okay
echo str_word_count('wörk');         # 2 - wrong!!!
echo str_word_count('örk');          # 1 - okay
echo str_word_count('werök');        # 2 - wrong!!!
echo str_word_count('weräk');        # 1 - okay


echo str_word_count('straßenbahnölbehälter'); # 3 words???

Expected result:
----------------
The above code should always return 1.


History

 [2004-12-12 14:54 UTC] derick@php.net
This is not a bug. PHP doesn't know anything about UTF-8 encodings and will split up a word at any character that is not in [A-z].
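
For readers who need multibyte words counted as a whole, one possible workaround is to count runs of Unicode letters with PCRE instead of calling str_word_count(). The following is only a minimal sketch: count_words_utf8() is a hypothetical helper name, not part of PHP, and it assumes the input is valid UTF-8, that PCRE was built with Unicode support, and that PHP 7 or later is available (for the "\u{...}" escapes and type declarations). Unlike str_word_count(), it does not treat "'" or "-" as part of a word.

```php
<?php
// Hypothetical helper (not part of PHP): counts runs of Unicode letters.
// Assumes $text is valid UTF-8 and PCRE has Unicode support.
function count_words_utf8(string $text): int
{
    // \p{L} matches any Unicode letter; the /u modifier makes PCRE treat
    // the pattern and the subject as UTF-8 rather than as single bytes.
    $count = preg_match_all('/\p{L}+/u', $text);

    // preg_match_all() returns false on failure (for example invalid UTF-8).
    return $count === false ? 0 : $count;
}

// "\u{...}" escapes (PHP 7+) yield UTF-8 bytes regardless of file encoding.
echo count_words_utf8("w\u{00F6}rk"), "\n";                                // wörk: prints 1
echo count_words_utf8("stra\u{00DF}enbahn\u{00F6}lbeh\u{00E4}lter"), "\n"; // prints 1
```

Because the pattern works on code points rather than bytes, the result should not depend on which umlaut appears or where it sits in the word.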
 [2004-12-13 21:19 UTC] km at control-b dot de
Well, I think that has nothing to do with UTF-8 encoding. (It is misdecoded in the email!)
I type the text as it is, in German (no encodings or entities are used).

What bothers me is the fact that it works with all German umlauts except "ö" and the "ß".
When it works for ä and ü, but not for ö, it must be a bug!
So the question is whether there are more people like me who have discovered this!
 