Bug #50189 [PATCH] - unicode byte order difference between SPARC and x86
Submitted: 2009-11-16 12:20 UTC Modified: 2011-03-08 18:39 UTC
From: yoarvi at gmail dot com Assigned:
Status: Not a bug Package: Unicode Engine related
PHP Version: 6SVN-2009-11-16 (SVN) OS: Solaris 10 (SPARC)
Private report: No CVE-ID: None
 [2009-11-16 12:20 UTC] yoarvi at gmail dot com
Description:
------------
zspprintf() represents strings/chars incorrectly as Unicode characters on Solaris (SPARC).

The byte order of the Unicode (UTF-16) representation differs between x86 (little endian) and SPARC (big endian).

For example, the Unicode representation of '/tmp' on x86 (grouped here in sets of 2 chars) is
'/''\0' 't''\0' 'm''\0' 'p''\0'

and on SPARC it is
'\0''/' '\0''t' '\0''m' '\0''p'
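
To make the two layouts above concrete, here is a minimal C sketch that serializes the same UTF-16 code units in both byte orders. The helper names utf16_to_le_bytes and utf16_to_be_bytes are hypothetical and are not part of PHP; they only illustrate what the in-memory bytes look like on each platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PHP): write n UTF-16 code units to out
 * in little-endian byte order, i.e. the layout seen on x86.
 * out must have room for 2 * n bytes. */
static void utf16_to_le_bytes(const uint16_t *units, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)(units[i] & 0xFF); /* low byte first */
        out[2 * i + 1] = (unsigned char)(units[i] >> 8);   /* high byte second */
    }
}

/* Hypothetical helper (not part of PHP): same code units in big-endian
 * byte order, i.e. the layout seen on SPARC. */
static void utf16_to_be_bytes(const uint16_t *units, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)(units[i] >> 8);   /* high byte first */
        out[2 * i + 1] = (unsigned char)(units[i] & 0xFF); /* low byte second */
    }
}
```

For the code units '/', 't', 'm', 'p', the first helper yields the x86 byte sequence shown above and the second yields the SPARC one.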

http://marc.info/?l=php-internals&m=125811990106419&w=2 has some more details.

The problem seems to be in the smart_str_append2c macro, which zspprintf()/xbuf_format_converter end up using.

The following patch fixes the problem:
Index: ext/standard/php_smart_str.h
===================================================================
--- ext/standard/php_smart_str.h        (revision 290471)
+++ ext/standard/php_smart_str.h        (working copy)
@@ -86,10 +86,17 @@
        smart_str_appendc_ex((dest), (c), 0)

 /* appending of a single UTF-16 code unit (2 byte)*/
+#if (defined(i386) || defined(__i386__) || defined(_X86_))
 #define smart_str_append2c(dest, c) do {       \
        smart_str_appendc_ex((dest), (c&0xFF), 0);      \
        smart_str_appendc_ex((dest), (c&0xFF00 ? c>>8 : '\0'), 0);      \
 } while (0)
+#else
+#define smart_str_append2c(dest, c) do {       \
+       smart_str_appendc_ex((dest), (c&0xFF00 ? c>>8 : '\0'), 0);      \
+       smart_str_appendc_ex((dest), (c&0xFF), 0);      \
+} while (0)
+#endif

 #define smart_str_free(s) \
        smart_str_free_ex((s), 0)



Reproduce code:
---------------
% sapi/cli/php ext/spl/tests/DirectoryIterator_getBasename_basic_test.php

Expected result:
----------------
getBasename_test

Actual result:
--------------
PHP goes into an infinite loop.

 [2009-11-16 13:07 UTC] tokul at users dot sourceforge dot net
It is not "#if (defined(i386) || defined(__i386__) || defined(_X86_))" vs. others.

It is little endian vs. big endian. I suspect the code should not assume that all other archs are big endian.
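
As an illustration of keying on byte order rather than on a list of CPU macros, the following C sketch probes the host's endianness at run time. The function name host_is_big_endian is hypothetical; a build-time check (for example a configure-generated macro) would normally be preferred in a header, so treat this only as a way to see the distinction.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of PHP): returns 1 on a big-endian host
 * and 0 on a little-endian one, by inspecting which byte of a 16-bit
 * value is stored first in memory. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1); /* copy the byte at the lowest address */
    return first == 0x01;      /* most significant byte first => big endian */
}
```

On x86 this returns 0 and on SPARC it returns 1, regardless of which architecture macros the compiler happens to define.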
 [2009-11-16 16:35 UTC] yoarvi at gmail dot com
ext/sqlite3/libsqlite/sqlite3.c uses
#if defined(i386) || defined(__i386__) || defined(_M_IX86)\
                             || defined(__x86_64) || defined(__x86_64__)


Is that better?
 [2009-11-17 10:51 UTC] yoarvi at gmail dot com
Updated patch using WORDS_BIGENDIAN (suggested by christopher dot jones at oracle dot com)

Index: ext/standard/php_smart_str.h
===================================================================
--- ext/standard/php_smart_str.h        (revision 290471)
+++ ext/standard/php_smart_str.h        (working copy)
@@ -86,10 +86,17 @@
        smart_str_appendc_ex((dest), (c), 0)

 /* appending of a single UTF-16 code unit (2 byte)*/
+#ifndef WORDS_BIGENDIAN
 #define smart_str_append2c(dest, c) do {       \
        smart_str_appendc_ex((dest), (c&0xFF), 0);      \
        smart_str_appendc_ex((dest), (c&0xFF00 ? c>>8 : '\0'), 0);      \
 } while (0)
+#else
+#define smart_str_append2c(dest, c) do {       \
+       smart_str_appendc_ex((dest), (c&0xFF00 ? c>>8 : '\0'), 0);      \
+       smart_str_appendc_ex((dest), (c&0xFF), 0);      \
+} while (0)
+#endif

 #define smart_str_free(s) \
        smart_str_free_ex((s), 0)
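
An alternative that avoids the #ifdef altogether is to copy the code unit's native bytes into the buffer, so the result follows the host byte order automatically. The sketch below assumes the buffer is later read back as native 16-bit code units, which is what the byte layouts in this report suggest. It uses a hypothetical byte_buf type and buf_append_utf16_unit function as stand-ins; it is not PHP's smart_str API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size byte buffer; a stand-in for illustration only,
 * not PHP's smart_str. */
typedef struct {
    unsigned char data[64];
    size_t len;
} byte_buf;

/* Append one UTF-16 code unit in host byte order.
 * Returns 0 on success, -1 if the buffer has no room left. */
static int buf_append_utf16_unit(byte_buf *b, uint16_t unit)
{
    if (b->len + sizeof unit > sizeof b->data) {
        return -1;
    }
    /* memcpy copies the bytes exactly as the host stores them, so no
     * endianness conditional is needed. */
    memcpy(b->data + b->len, &unit, sizeof unit);
    b->len += sizeof unit;
    return 0;
}
```

Appending '/', 't', 'm', 'p' this way produces a buffer whose bytes match a native uint16_t array holding the same values on either platform.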
 [2010-12-17 14:33 UTC] jani@php.net
-Package: Unicode Function Upgrades relate +Package: Unicode Engine related
 [2011-03-08 18:39 UTC] felipe@php.net
-Status: Open +Status: Bogus
 [2011-03-08 18:39 UTC] felipe@php.net
php6 code is dead.
 [2019-07-28 08:05 UTC] chen dot yufeng at m2k dot com dot tw
The following pull request has been associated:

Patch Name: Fix $revisions number
On GitHub:  https://github.com/php/doc-en/pull/6
Patch:      https://github.com/php/doc-en/pull/6.patch
 