Bug #53350 Cannot use paths that contain certain SJIS chars
Submitted: 2010-11-19 04:44 UTC
Modified: 2010-11-19 05:03 UTC
From: php at madlon-kay dot com
Assigned:
Status: Duplicate
Package: Filesystem function related
PHP Version: 5.3.3
OS: Windows XP SP3 Japanese
Private report: No
CVE-ID: None
 [2010-11-19 04:44 UTC] php at madlon-kay dot com
Description:
------------
On Japanese Windows XP, all paths are handled in Shift-JIS ('SJIS-win' for mb_convert_encoding(), etc.). Shift-JIS contains a number of commonly used characters whose second byte is 0x5C (e.g. 表, 0x955C). 0x5C is also the code for the backslash \, which is both the escape character and the Windows path separator.
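
To make the byte overlap concrete, here is a minimal sketch. It assumes nothing beyond core PHP string functions: the Shift-JIS bytes of 表 are written as hex escapes so the result does not depend on how the source file is saved. The file name is only an illustration.

```php
<?php
// Raw Shift-JIS (CP932) bytes for 表: lead byte 0x95, trail byte 0x5C.
$hyou = "\x95\x5C";

// Illustrative file name only: 表.txt as Shift-JIS bytes.
$path = $hyou . '.txt';

// A byte-oriented search reports a backslash at offset 1, although the
// string contains no separator or escape character at all.
$offset = strpos($path, '\\');

echo bin2hex($hyou), "\n"; // prints 955c
var_dump($offset);         // int(1)
```

Any code that scans the path one byte at a time will see the trail byte of 表 as a backslash, which is consistent with the symptoms described in this report.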

When a path contains one of these characters, many filesystem-related functions fail to interpret it correctly: they either do not work at all or behave unexpectedly, for example by creating or reading files that were not requested (see the provided test script for details).

Test script:
---------------
<?php

// Assume file '表.txt' exists in the cwd

mb_internal_encoding('SJIS');

echo filesize('表.txt'); // This works ok

echo file_get_contents('表.txt'); // This fails: 'failed to open stream: No such file or directory'
// Furthermore, if "表表.txt" exists then its contents will be shown by the previous command (!?!?)

file_put_contents('表.txt', 'blahblahblah'); // A new file '表表.txt' is created (!?!?)

?>

Expected result:
----------------
file_get_contents() should read the correct file and not fail.
file_put_contents() should not create unrequested files.

Actual result:
--------------
file_get_contents(), is_file(), etc. fail to correctly interpret any path containing 表 and other SJIS chars with a second byte of 0x5C.
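
To check whether a given path is affected, one option is to walk the string as Shift-JIS and look for double-byte characters whose trail byte is 0x5C. The helper below is a hypothetical sketch, not part of PHP: the name sjisBackslashTrailOffsets() is made up for illustration, and it assumes the input is valid CP932 with lead bytes in 0x81-0x9F and 0xE0-0xFC.

```php
<?php
/**
 * Hypothetical helper: returns the byte offsets of double-byte characters
 * whose trail byte is 0x5C. Assumes $bytes is valid Shift-JIS (CP932).
 */
function sjisBackslashTrailOffsets($bytes)
{
    $offsets = array();
    $length = strlen($bytes);

    for ($i = 0; $i < $length; $i++) {
        $byte = ord($bytes[$i]);
        $isLead = ($byte >= 0x81 && $byte <= 0x9F)
               || ($byte >= 0xE0 && $byte <= 0xFC);

        if (!$isLead) {
            continue; // single-byte character, including a real backslash
        }
        if ($i + 1 < $length && ord($bytes[$i + 1]) === 0x5C) {
            $offsets[] = $i; // offset of the lead byte
        }
        $i++; // skip the trail byte so it is never read as a lead byte
    }

    return $offsets;
}

$single = sjisBackslashTrailOffsets("\x95\x5C.txt");          // 表.txt
$double = sjisBackslashTrailOffsets("\x95\x5C\x95\x5C.txt");  // 表表.txt
$ascii  = sjisBackslashTrailOffsets("C:\\tmp\\a.txt");        // real separators only
```

Here $single is array(0), $double is array(0, 2) and $ascii is empty, because genuine backslashes are single-byte characters and are skipped. The sketch only identifies affected paths; it does not work around the failure itself.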

History

 [2010-11-19 05:03 UTC] pajoye@php.net
-Status: Open
+Status: Duplicate
 [2010-11-19 05:03 UTC] pajoye@php.net
We already have a feature request for Unicode support in the file system functions on Windows.
 