Bug #47481 strnatcmp: char instead of unsigned char
Submitted: 2009-02-23 13:40 UTC Modified: 2009-07-16 16:21 UTC
Votes:2
Avg. Score:5.0 ± 0.0
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:2 (100.0%)
From: carsten_sttgt at gmx dot de Assigned:
Status: Closed Package: Strings related
PHP Version: 5.2.8 OS: *
Private report: No CVE-ID: None

 

 [2009-02-23 13:40 UTC] carsten_sttgt at gmx dot de
Description:
------------
Hello,

Why is nat_char defined as char instead of unsigned char?

Using char limits us (and correct sorting) to ASCII 0-127. With an unsigned char (0-255) the sorting is "correct" for all single-byte charsets such as ISO-8859-1 (which is the default in PHP).

Internally the function is already doing a cast to unsigned char many times (but not in the main comparison).

In the original header (strnatcmp.h) from the author, the typedef for nat_char is also only a hint.
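To illustrate the point above, here is a minimal sketch of how the same byte compares differently depending on signedness. The two helper names are hypothetical and are not taken from PHP's strnatcmp.c; the signed result also assumes the usual two's-complement platforms, where the byte 0xFC (ü in ISO-8859-1) reads as -4.

```c
#include <stddef.h>

/*
 * Illustrative helpers only; these are NOT the functions in PHP's strnatcmp.c.
 * Both walk two NUL-terminated strings byte by byte and return -1, 0 or 1.
 */

/* Compare the first differing byte as a signed value. On common platforms
 * 0xFC becomes -4, so it sorts before every ASCII letter. */
int bytes_cmp_signed(const char *a, const char *b)
{
    size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i]) {
        i++;
    }
    signed char ca = (signed char)a[i];
    signed char cb = (signed char)b[i];
    return (ca > cb) - (ca < cb);
}

/* Compare the first differing byte as an unsigned value. 0xFC is 252,
 * so it sorts after every ASCII letter. */
int bytes_cmp_unsigned(const char *a, const char *b)
{
    size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i]) {
        i++;
    }
    unsigned char ca = (unsigned char)a[i];
    unsigned char cb = (unsigned char)b[i];
    return (ca > cb) - (ca < cb);
}
```

With the ISO-8859-1 bytes of the first array entry written as `"S\xFC" "den"` (the literal is split so the hex escape does not swallow the following `d`), the signed variant places it before `"Sonne"` and the unsigned variant places it after, which matches the difference between the actual and expected results.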

Regards,
Carsten


Reproduce code:
---------------
<?php
$daten = array('Süden','spielen','Sonne','Wind','Regen','Meer');
natcasesort($daten);
print_r($daten);
?>


Expected result:
----------------
Array
(
    [5] => Meer
    [4] => Regen
    [2] => Sonne
    [1] => spielen
    [0] => Süden
    [3] => Wind
)


Actual result:
--------------
Array
(
    [5] => Meer
    [4] => Regen
    [0] => Süden
    [2] => Sonne
    [1] => spielen
    [3] => Wind
)


 [2009-03-03 05:06 UTC] hradtke@php.net
The strnatcmp uses the zend_parse_parameters function to parse the function parameters.  The zend_parse_parameters function converts the string from PHP space into a char.  Since this is a core function, I doubt this will be fixed soon.

I may be completely off base, so I will leave this bug open in case someone else wants to comment.
 [2009-03-10 11:13 UTC] carsten_sttgt at gmx dot de
> The strnatcmp uses the zend_parse_parameters function to parse
> the function parameters.

Ah, OK, I'm not familiar with the PHP/Zend internals (or C...).

Just a question about the difference between natsort() and asort(). Should they not work in the same way if you have an array without numbers in the key values? And if I look into array.c, PHP_FUNCTION(asort) is also using zend_parse_parameters.

e.g. IMHO this script should result in 2 times the same output:
<?php
$datensort = $datennat = $daten = array(
    'Süden','spielen','Sonne','Wind','Regen','Meer'
);

natsort($datennat);
print_r($datennat);

asort($datensort);
print_r($datensort);

?>

Regards,
Carsten
 [2009-03-31 03:25 UTC] hradtke@php.net
Hi Carsten,

I have no idea why I thought zend_parse_parameters was a problem.

I see no reason why strnatcmp_ex() couldn't use unsigned chars rather than plain chars.  I suspect the existing type casting is done to make sure the character is properly promoted for the is*() calls.
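As a sketch of why such casts matter for the is*() calls: the <ctype.h> functions take an int whose value must be EOF or representable as unsigned char, so passing a plain char that holds a byte above 127 can hand them a negative value. The wrapper name below is hypothetical and is not part of the PHP source.

```c
#include <ctype.h>

/* Hypothetical wrapper, not a PHP function. */
int is_digit_byte(char c)
{
    /* A plain char holding 0xFC may be negative where char is signed.
     * Casting to unsigned char first keeps the argument in the range
     * that isdigit() is defined for. */
    return isdigit((unsigned char)c) != 0;
}
```

The cast only protects the classification call; it does not change how two bytes are ordered in the main comparison, which is the part this report is about.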

Test case - http://www.hermanradtke.com/patches/bug47481.phpt
Patch - http://www.hermanradtke.com/patches/php-47481-natcasesort-extended-ascii.patch


 [2009-07-16 16:21 UTC] jani@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 