Bug #47481 strnatcmp: char instead of unsigned char
Submitted: 2009-02-23 13:40 UTC Modified: 2009-07-16 16:21 UTC
Votes:2
Avg. Score:5.0 ± 0.0
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:2 (100.0%)
From: carsten_sttgt at gmx dot de Assigned:
Status: Closed Package: Strings related
PHP Version: 5.2.8 OS: *
Private report: No CVE-ID: None
 [2009-02-23 13:40 UTC] carsten_sttgt at gmx dot de
Description:
------------
Hello,

Why is nat_char defined as char instead of unsigned char?

A signed char limits us (and correct sorting) to ASCII 0-127. With an unsigned char (byte values 0-255), the sorting is "correct" for all single-byte charsets like ISO-8859-1 (which is the default in PHP).

Internally the function is already doing a cast to unsigned char many times (but not in the main comparison).

In the original header (strnatcmp.h) from the author, the typedef for nat_char is also only a hint.

Regards,
Carsten
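
To illustrate the point about signed versus unsigned bytes, here is a minimal C sketch. It is not the code from PHP's strnatcmp.c; the function names cmp_signed_bytes and cmp_unsigned_bytes are hypothetical, and the test word assumes ISO-8859-1, where "ü" is the single byte 0xFC.

```c
#include <stddef.h>

/* Hypothetical helper: compare two strings byte by byte, reading each
 * byte as a signed char. A byte such as 0xFC (ISO-8859-1 "ü") becomes
 * -4 on common platforms and therefore sorts before every ASCII letter. */
int cmp_signed_bytes(const char *a, const char *b)
{
    for (;; a++, b++) {
        signed char ca = (signed char)*a;
        signed char cb = (signed char)*b;
        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        if (ca == 0)
            return 0; /* both strings ended at the same position */
    }
}

/* Hypothetical helper: the same loop, but reading each byte as an
 * unsigned char, so 0xFC is 252 and sorts after every ASCII letter. */
int cmp_unsigned_bytes(const char *a, const char *b)
{
    for (;; a++, b++) {
        unsigned char ca = (unsigned char)*a;
        unsigned char cb = (unsigned char)*b;
        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        if (ca == 0)
            return 0;
    }
}
```

Under these assumptions, cmp_signed_bytes("s\xFC" "den", "sonne") is negative (the word sorts before "sonne", as in the actual result), while cmp_unsigned_bytes("s\xFC" "den", "sonne") is positive (it sorts after "sonne" and "spielen", as in the expected result). The string literal is split so that the \xFC escape does not absorb the following "de" as hex digits.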


Reproduce code:
---------------
<?php
$daten = array('Süden','spielen','Sonne','Wind','Regen','Meer');
natcasesort($daten);
print_r($daten);
?>


Expected result:
----------------
Array
(
    [5] => Meer
    [4] => Regen
    [2] => Sonne
    [1] => spielen
    [0] => Süden
    [3] => Wind
)


Actual result:
--------------
Array
(
    [5] => Meer
    [4] => Regen
    [0] => Süden
    [2] => Sonne
    [1] => spielen
    [3] => Wind
)


History

 [2009-03-03 05:06 UTC] hradtke@php.net
The strnatcmp function uses the zend_parse_parameters function to parse its parameters. The zend_parse_parameters function converts the string from PHP space into a char. Since this is a core function, I doubt this will be fixed soon.

I may be completely off base, so I will leave this bug open in case someone else wants to comment.
 [2009-03-10 11:13 UTC] carsten_sttgt at gmx dot de
> The strnatcmp uses the zend_parse_parameters function to parse
> the function parameters.

Ah, ok, I'm not familiar with the PHP/Zend internals (or C...).

Just a question about the difference between natsort() and asort(). Should they not work in the same way if you have an array without numbers in the key values? And if I look into array.c, PHP_FUNCTION(asort) is also using zend_parse_parameters.

E.g., IMHO this script should produce the same output twice:
<?php
$datensort = $datennat = $daten = array(
    'Süden','spielen','Sonne','Wind','Regen','Meer'
);

natsort($datennat);
print_r($datennat);

asort($datensort);
print_r($datensort);

?>

Regards,
Carsten
 [2009-03-31 03:25 UTC] hradtke@php.net
Hi Carsten,

I have no idea why I thought zend_parse_parameters was a problem.

I see no reason why strnatcmp_ex() couldn't use unsigned chars rather than plain chars.  I suspect the type casting is done to make sure the character is properly promoted for the is*() calls.

Test case - http://www.hermanradtke.com/patches/bug47481.phpt
Patch - http://www.hermanradtke.com/patches/php-47481-natcasesort-extended-ascii.patch
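
As a rough illustration of the cast mentioned above for the is*() calls: the <ctype.h> functions take an int whose value must be representable as an unsigned char (or be EOF), so a negative char has to be cast first. The helpers below are hypothetical and are not taken from the patch or from strnatcmp.c.

```c
#include <ctype.h>

/* Hypothetical helper: report whether a byte is a decimal digit.
 * Passing a negative char straight to isdigit() is undefined behavior,
 * so the byte is cast to unsigned char before the call. */
int byte_is_digit(char c)
{
    return isdigit((unsigned char)c) != 0;
}

/* Hypothetical helper: upper-case a byte with the same cast.
 * In the default "C" locale, bytes above 127 are returned unchanged. */
unsigned char byte_to_upper(char c)
{
    return (unsigned char)toupper((unsigned char)c);
}
```

These casts only keep the classification calls well defined. They do not change the result of a direct comparison such as a < b on two plain chars, which is the part this report is about.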


 [2009-07-16 16:21 UTC] jani@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

