Bug #43863 str_word_count and russian chars in locale cp1251
Submitted: 2008-01-16 07:59 UTC Modified: 2008-01-16 08:36 UTC
From: phprus at gmail dot com Assigned: tony2001 (profile)
Status: Closed Package: Strings related
PHP Version: 5.2.5 OS: OpenSuSE 10.2
Private report: No CVE-ID: None
 [2008-01-16 07:59 UTC] phprus at gmail dot com
Description:
------------
str_word_count returns wrong results if a word contains the
character "я".

Problem code (in file ext/standard/string.c):
while (p < e && (isalpha(*p) || (char_list && ch[(unsigned char)*p]) || *p == '\'' || *p == '-')) {

Corrected code:
while (p < e && (isalpha((unsigned char)*p) || (char_list && ch[(unsigned char)*p]) || *p == '\'' || *p == '-')) {
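
Why the cast matters, as a minimal sketch rather than PHP's actual implementation: isalpha() is only defined for values representable as unsigned char, or EOF. In CP1251 the letter "я" is byte 0xFF, and on platforms where plain char is signed that byte reaches isalpha() as a negative value unless it is cast first. The helper names is_word_byte and count_words below are hypothetical, and the word rule is simplified to letters, apostrophes, and hyphens.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not PHP's code: classifies one byte as a word
 * character (letter, apostrophe, or hyphen), as the corrected loop does. */
static int is_word_byte(char c)
{
    /* Cast first: isalpha() expects an unsigned char value or EOF.
     * Where plain char is signed, byte 0xFF would otherwise be negative. */
    return isalpha((unsigned char)c) || c == '\'' || c == '-';
}

/* Counts maximal runs of word bytes in s[0..len). Simplified sketch:
 * no char_list argument and no trimming of leading or trailing hyphens. */
static size_t count_words(const char *s, size_t len)
{
    size_t words = 0;
    const char *p = s;
    const char *e = s + len;

    while (p < e) {
        while (p < e && !is_word_byte(*p)) {
            p++;                      /* skip separators */
        }
        if (p < e) {
            words++;                  /* a word starts here */
        }
        while (p < e && is_word_byte(*p)) {
            p++;                      /* consume the word */
        }
    }
    return words;
}
```

Whether byte 0xFF is then treated as a letter still depends on the active locale: the cast only guarantees that isalpha() receives a valid argument, and a CP1251 locale must be set (as in the reproduce code below) for it to report "я" as alphabetic.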

A description of the fix, in Russian:
http://phpclub.ru/talk/showthread.php?postid=746475#post746475

Reproduce code:
---------------
<?php
setlocale(LC_ALL, 'ru_RU.cp-1251', 'ru_RU.CP1251');
var_dump(str_word_count('русский текст. я тестер. аябаг. яап. авя', 2));
?>

Expected result:
----------------
array(7) {
  [0]=>
  string(7) "русский"
  [8]=>
  string(5) "текст"
  [15]=>
  string(1) "я"
  [17]=>
  string(6) "тестер"
  [25]=>
  string(5) "аябаг"
  [32]=>
  string(3) "яап"
  [37]=>
  string(3) "авя"
}

Actual result:
--------------
array(7) {
  [0]=>
  string(7) "русский"
  [8]=>
  string(5) "текст"
  [17]=>
  string(6) "тестер"
  [25]=>
  string(1) "а"
  [27]=>
  string(3) "баг"
  [33]=>
  string(2) "ап"
  [37]=>
  string(2) "ав"
}

 [2008-01-16 08:36 UTC] tony2001@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

Patch committed, thanks.
 