[2013-06-19 09:55 UTC] jithin1987 at gmail dot com
Description:
------------
When the locale is not en_US and the input array's keys are not integers in sequential order,
collator_sort()/collator_asort() do not produce correct results.
Test script:
---------------
<?php
$coll = collator_create("ru_RU");
$array = array( "n", "", "d", "nn" );
print_r("Input \n");
print_r($array);
$res_val = collator_asort( $coll, $array);
print_r("Output\n");
print_r($array);
$array2 = array(
'1' => 'п',
'4' => '',
'7' => 'd',
'2' => 'пп' );
print_r("Input\n");
print_r($array2);
$res_val = collator_asort( $coll, $array2);
print_r("Output\n");
print_r($array2);
?>
Expected result:
----------------
I expected both output arrays to be sorted the same way. Specifically, I expected output
2 of the test script above to be:
Output 2
Array
(
[4] =>
[7] => d
[1] => п
[2] => пп
)
Actual result:
--------------
Output 2
Array
(
[4] =>
[1] => п
[2] => пп
[7] => d
)
Running the script under GDB, I found something odd happening with the values being
compared.
GDB output:
Breakpoint 3, collator_regular_compare_function (result=0x7fffffffa1a0, op1=0xa3a978, op2=0xa3ae50) at /home/jithine/sources/libs/icu/branches/yphp_intl-3/source/collator/collator_sort.c:89
89 result->type = IS_LONG;
(gdb) print (*result).value
$96 = {lval = 1, dval = 4.9406564584124654e-324, str = {val = 0x1 <Address 0x1 out of bounds>, len = 0}, ht = 0x1, obj = {handle = 1, handlers = 0x0}}
(gdb) print *str1
$97 = {value = {lval = 10726312, dval = 5.2995022657747129e-317, str = {val = 0xa3aba8 "d", len = 2}, ht = 0xa3aba8, obj = {handle = 10726312, handlers = 0x2}}, refcount__gc = 3, type = 6 '\006', is_ref__gc = 0 '\000'}
(gdb) print *str2
$98 = {value = {lval = 10726960, dval = 5.299822420313218e-317, str = {val = 0xa3ae30 "?\004", len = 2}, ht = 0xa3ae30, obj = {handle = 10726960, handlers = 0x2}}, refcount__gc = 2, type = 6 '\006', is_ref__gc = 0 '\000'}
(gdb) c
Here the value of *str2 should have been 'n', but it is shown as "?\004". The comparison
above should also have identified 'd' as the lesser value, but the opposite
happens.
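The bytes in dumps $97 and $98 can be checked against the encodings of the input characters. The sketch below is illustrative only and not part of the original report: the helper utf8_to_utf16le() is hypothetical, handles BMP characters only, and the sketch assumes the strings being compared have already been converted to UTF-16 with little-endian byte order (as on an x86-64 host). It prints the UTF-8 and UTF-16LE bytes of 'd' and 'п'.

```php
<?php
// Hypothetical helper for illustration: convert a UTF-8 string to UTF-16LE.
// Assumption: input is valid UTF-8 containing only BMP characters (1 to 3 bytes each).
function utf8_to_utf16le(string $s): string
{
    $out = '';
    $n = strlen($s);
    for ($i = 0; $i < $n; ) {
        $b = ord($s[$i]);
        if ($b < 0x80) {                   // 1-byte sequence (ASCII)
            $cp = $b;
            $i += 1;
        } elseif (($b & 0xE0) === 0xC0) { // 2-byte sequence
            $cp = (($b & 0x1F) << 6) | (ord($s[$i + 1]) & 0x3F);
            $i += 2;
        } elseif (($b & 0xF0) === 0xE0) { // 3-byte sequence
            $cp = (($b & 0x0F) << 12)
                | ((ord($s[$i + 1]) & 0x3F) << 6)
                | (ord($s[$i + 2]) & 0x3F);
            $i += 3;
        } else {
            throw new InvalidArgumentException('only BMP characters are supported');
        }
        $out .= pack('v', $cp);            // 'v': unsigned 16-bit, little-endian
    }
    return $out;
}

// "\u{043F}" is CYRILLIC SMALL LETTER PE (п); PHP 7+ emits it as UTF-8 bytes.
foreach (['d', "\u{043F}"] as $value) {
    $u16 = utf8_to_utf16le($value);
    printf("UTF-8 %-6s (%d bytes)  UTF-16LE %-6s (%d bytes)\n",
        bin2hex($value), strlen($value), bin2hex($u16), strlen($u16));
}
```

Under these assumptions, 'd' (U+0064) becomes the bytes 64 00, which GDB prints as "d" with len = 2, and 'п' (U+043F) becomes 3f 04, which GDB prints as "?\004" because 0x3F is the ASCII code for '?'.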
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Wed Oct 29 07:00:01 2025 UTC
I was debugging deeper and found something even stranger:
<?php
$coll = collator_create("ru_RU");
$array2 = array(
'1' => 'n',
'4' => '',
'7' => 'd',
'8' => 'f',
'9' => 'e',
'2' => 'nn' );
$res_val = collator_asort( $coll, $array2);
print_r("Output 1\n");
print_r($array2);
$array2 = array(
'1' => 'п',
'4' => '',
'7' => 'd',
'2' => 'пп' );
$res_val = collator_asort( $coll, $array2);
print_r("Output 2\n");
print_r($array2);
?>
In the test program above, the first array is sorted correctly, while the second is sorted incorrectly. Inspecting the array being sorted inside collator_sort.c:collator_sort_internal(), I found that once collator_convert_hash_from_utf8_to_utf16() is called, the values of n and nn are corrupted for the second array.
Value of nn before collator_convert_hash_from_utf8_to_utf16():
Breakpoint 3, collator_sort_internal (renumber=0, ht=2, return_value=0x7ffff7fbfde0, return_value_ptr=0x0, this_ptr=0x0, return_value_used=1) at /home/jithine/sources/libs/icu/branches/yphp_intl-3/source/collator/collator_sort.c:318
318 collator_convert_hash_from_utf8_to_utf16( hash, COLLATOR_ERROR_CODE_P( co ) );
(gdb) print **(zval **)(*(Bucket *)hash.arBuckets[2]).pData
$78 = {value = {lval = 140737352781048, dval = 6.9533491095755836e-310, str = {val = 0x7ffff7eb4cf8 "пп", len = 4}, ht = 0x7ffff7eb4cf8, obj = {handle = 4159393016, handlers = 0x4}}, refcount__gc = 1, type = 6 '\006', is_ref__gc = 0 '\000'}
Value of nn after collator_convert_hash_from_utf8_to_utf16():
(gdb) n
319 COLLATOR_CHECK_STATUS( co, "Error converting hash from UTF-8 to UTF-16" );
(gdb) print **(zval **)(*(Bucket *)hash.arBuckets[2]).pData
$79 = {value = {lval = 140737353877760, dval = 6.9533491637603558e-310, str = {val = 0x7ffff7fc0900 "?\004?\004", len = 4}, ht = 0x7ffff7fc0900, obj = {handle = 4160489728, handlers = 0x7fff00000004}}, refcount__gc = 1, type = 6 '\006', is_ref__gc = 0 '\000'}
Any idea what is happening?
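The buffer shown in dump $78 (before conversion) and dump $79 (after conversion) can be compared byte for byte. The sketch below is illustrative only; it assumes the bucket holds 'пп' (two copies of U+043F), as dump $78 shows, that the converted buffer stores UTF-16 code units in little-endian order (as on an x86-64 host), and that the '?' GDB printed is the literal byte 0x3F.

```php
<?php
// 'пп' as stored before conversion: "\u{043F}" (PHP 7+) expands to the
// UTF-8 encoding of U+043F (п), so this string holds 4 raw bytes.
$before = "\u{043F}\u{043F}";

// The same text as two UTF-16 code units, little-endian ('v' format).
$utf16le = pack('v*', 0x043F, 0x043F);

// The bytes GDB displayed in dump $79: '?' is 0x3F and \004 is 0x04.
$after = "?\x04?\x04";

printf("before:   %s (%d bytes)\n", bin2hex($before), strlen($before));
printf("UTF-16LE: %s (%d bytes)\n", bin2hex($utf16le), strlen($utf16le));
printf("dump \$79 equals UTF-16LE encoding: %s\n", $after === $utf16le ? 'yes' : 'no');
```

Both buffers are 4 bytes long, matching len = 4 in both dumps.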