Bug #57838 Russian stemmer doesn't work if LANG....
Submitted: 2007-09-13 10:20 UTC
Modified: 2008-04-03 07:52 UTC
From: bolk at lixil dot ru
Assigned:
Status: No Feedback
Package: stem (PECL)
PHP Version: Irrelevant
OS: Fedora Core #5
Private report: No
CVE-ID: None
 [2007-09-13 10:20 UTC] bolk at lixil dot ru
Description:
------------
If the LANG environment variable is set to en_US, stem_russian doesn't work properly. When I set it to en_US.UTF8, it works fine.

Reproduce code:
---------------
echo stem_russian('russian word in KOI-8R');

Expected result:
----------------
stemmed word

Actual result:
--------------
word is corrupted

 [2007-09-13 21:28 UTC] jay@php.net
I can't reproduce this on my system (FC6). Can you provide 
a short script to reproduce the problem?

Also, does the stem_russian_unicode() function behave 
similarly?
 [2007-09-14 02:44 UTC] bolk at lixil dot ru
Try running:

LANG=en_US

php -r '
	$word = "Форумов";

	file_put_contents("a.txt", stem_russian($word));
'

"Форумов" must be encoded in KOI8-R.

In file "a.txt" I see "ФОРУМОв" instead of "Форум".

What encoding does stem_russian_unicode use? UCS-2BE?
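
The corrupted output can be examined byte by byte. The sketch below shows one possible explanation that is not confirmed anywhere in this report: if the KOI8-R bytes were lowercased one byte at a time under ISO-8859-1 case rules (an assumption about what an en_US locale might do), the result is exactly the byte sequence that reads as "ФОРУМОв" in KOI8-R. The helper latin1_lower_bytes() is hypothetical and only simulates that mapping; it does not depend on the system locale.

```php
<?php
// Hypothetical helper: lowercase each byte using ISO-8859-1 case rules.
// This simulates an assumed locale-dependent lowercasing step; it is not
// taken from the stem extension's source.
function latin1_lower_bytes(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        $isAsciiUpper  = ($b >= 0x41 && $b <= 0x5A);
        // 0xC0-0xDE are uppercase letters in ISO-8859-1, except 0xD7 (the multiplication sign).
        $isLatin1Upper = ($b >= 0xC0 && $b <= 0xDE && $b !== 0xD7);
        $out .= chr(($isAsciiUpper || $isLatin1Upper) ? $b + 0x20 : $b);
    }
    return $out;
}

// "Форумов" encoded in KOI8-R, written as raw bytes so the file encoding does not matter.
$koi8 = "\xE6\xCF\xD2\xD5\xCD\xCF\xD7";

echo bin2hex($koi8), "\n";                     // e6cfd2d5cdcfd7
echo bin2hex(latin1_lower_bytes($koi8)), "\n"; // e6eff2f5edefd7
```

Read as KOI8-R, the second byte sequence is "ФОРУМОв": the first byte (0xE6) is already lowercase in ISO-8859-1 and the last (0xD7) has no case, so only the middle five bytes change. Running bin2hex() on the contents of a.txt would show whether the actual output matches this pattern.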
 [2007-09-14 02:46 UTC] bolk at lixil dot ru
Oops... the Russian letters were entity-encoded. Try copying my post to an HTML file and opening it in a browser.
 [2007-09-20 21:28 UTC] jay@php.net
The encoding for the unicode stemmers is always UTF-8, while the others use ISO-8859-1, with the exception of the plain Russian stemmer, which uses KOI8-R. We don't do any special conversions with character encodings when we pass things through to the Snowball stemmers. The extension is a pretty straightforward wrapper around Snowball.

To avoid having the characters mangled by entity encoding, could you please email some test scripts that reproduce the problem directly to me at jay at php dot net, and I'll take a look at them when I get a chance. Thanks!
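
Since the plain Russian stemmer expects KOI8-R and the extension does no conversion, a caller holding UTF-8 text would need to convert it first. The following is a minimal sketch of that, assuming the iconv extension is available and this file is saved as UTF-8. The functions utf8_to_koi8r(), koi8r_to_utf8() and stem_russian_utf8() are hypothetical helpers, not part of the stem extension; only stem_russian() comes from it.

```php
<?php
// Hypothetical helper: convert UTF-8 text to KOI8-R bytes.
function utf8_to_koi8r(string $text): string
{
    $converted = iconv('UTF-8', 'KOI8-R', $text);
    if ($converted === false) {
        throw new InvalidArgumentException('Input cannot be represented in KOI8-R');
    }
    return $converted;
}

// Hypothetical helper: convert KOI8-R bytes back to UTF-8 text.
function koi8r_to_utf8(string $bytes): string
{
    $converted = iconv('KOI8-R', 'UTF-8', $bytes);
    if ($converted === false) {
        throw new InvalidArgumentException('Input could not be converted from KOI8-R');
    }
    return $converted;
}

// Hypothetical wrapper: stem a UTF-8 word with the KOI8-R stemmer.
// Returns null when the PECL stem extension is not loaded.
function stem_russian_utf8(string $word): ?string
{
    if (!function_exists('stem_russian')) {
        return null;
    }
    return koi8r_to_utf8(stem_russian(utf8_to_koi8r($word)));
}

// Show the KOI8-R bytes that would be handed to stem_russian().
echo bin2hex(utf8_to_koi8r('Форумов')), "\n"; // e6cfd2d5cdcfd7
```

If stem_russian_unicode() takes UTF-8 as described above, calling it directly on UTF-8 input would avoid the conversion altogether; the wrapper is only useful when the KOI8-R stemmer has to be used.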
 [2007-09-21 04:39 UTC] bolk at lixil dot ru
ok, check your mailbox
 [2008-04-03 07:52 UTC] ohill@php.net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.


 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Nov 12 09:01:28 2024 UTC