Bug #53350 Cannot use paths that contain certain SJIS chars
Submitted: 2010-11-19 04:44 UTC
Modified: 2010-11-19 05:03 UTC
From: php at madlon-kay dot com
Assigned:
Status: Duplicate
Package: Filesystem function related
PHP Version: 5.3.3
OS: Windows XP SP3 Japanese
Private report: No
CVE-ID: None

 [2010-11-19 04:44 UTC] php at madlon-kay dot com
Description:
------------
Paths in Japanese WinXP are all handled in Shift-JIS encoding ('SJIS-win' for mb_convert_encoding(), etc.). Shift-JIS contains a number of commonly-used characters for which the second byte is 0x5C (e.g. 表, 0x955C). 0x5C happens to be the encoding for backslash \, the escape character.
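
The byte overlap described above can be checked directly. This is a minimal sketch that assumes only the byte value given in this report (表 = 0x955C in Shift-JIS); the variable names are illustrative and not part of the original test script.

```php
<?php
// Raw Shift-JIS bytes for 表 (0x95 0x5C), as cited in the report.
// Written as hex escapes so the example does not depend on the
// encoding in which this source file is saved.
$hyo = "\x95\x5C";

// Hex dump of the two bytes.
$hex = bin2hex($hyo);

// The second (trail) byte is byte-for-byte identical to an ASCII backslash.
$trailIsBackslash = ($hyo[1] === "\\");

echo $hex, PHP_EOL;
var_dump($trailIsBackslash);
```

Any code that scans the path byte by byte, without tracking Shift-JIS lead bytes, would see a backslash in the middle of this character.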

When one of these characters is contained within a path, a large number of filesystem-related functions will fail to interpret the path correctly, and will simply not work, or will do weird things such as create or read unrequested files (see the provided test script for details).

Test script:
---------------
<?php

// Assume file '表.txt' exists in the cwd

mb_internal_encoding('SJIS');

echo filesize('表.txt'); // This works ok

echo file_get_contents('表.txt'); // This fails: 'failed to open stream: No such file or directory'
// Furthermore, if "表表.txt" exists then its contents will be shown by the previous command (!?!?)

file_put_contents('表.txt', 'blahblahblah'); // A new file '表表.txt' is created (!?!?)

?>

Expected result:
----------------
file_get_contents() should read the correct file and not fail.
file_put_contents() should not create unrequested files.

Actual result:
--------------
file_get_contents(), is_file(), etc. fail to correctly interpret any path containing 表 and other SJIS chars with a second byte of 0x5C.
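
To check whether a given path is likely to be affected, one option is to scan its Shift-JIS bytes for a double-byte character whose trail byte is 0x5C. The following is a rough sketch, not a fix: the function name sjis_has_backslash_trail_byte() is hypothetical, and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are an assumption based on the usual Windows code page 932 layout.

```php
<?php
/**
 * Hypothetical helper: returns true if the Shift-JIS byte string contains
 * a double-byte character whose second byte is 0x5C (ASCII backslash).
 * Assumes CP932-style lead bytes: 0x81-0x9F and 0xE0-0xFC.
 */
function sjis_has_backslash_trail_byte(string $sjisBytes): bool
{
    $len = strlen($sjisBytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($sjisBytes[$i]);
        $isLead = ($b >= 0x81 && $b <= 0x9F) || ($b >= 0xE0 && $b <= 0xFC);
        if ($isLead && $i + 1 < $len) {
            if (ord($sjisBytes[$i + 1]) === 0x5C) {
                return true; // e.g. 表 (0x95 0x5C)
            }
            $i++; // skip the trail byte of an unaffected double-byte char
        }
    }
    return false; // a plain single-byte backslash does not count
}

// "\x95\x5C.txt" is 表.txt in Shift-JIS, per the byte value in this report.
var_dump(sjis_has_backslash_trail_byte("\x95\x5C.txt"));
// An ordinary path separator is a single-byte backslash, not a trail byte.
var_dump(sjis_has_backslash_trail_byte("C:\\temp\\a.txt"));
```

The helper only identifies paths of the kind described in this report; it does not change how the filesystem functions interpret them.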

History

 [2010-11-19 05:03 UTC] pajoye@php.net
-Status: Open +Status: Duplicate
 [2010-11-19 05:03 UTC] pajoye@php.net
We already have a feature request for unicode support on Windows, for the file system functions.
 
Last updated: Sun Nov 19 01:31:42 2017 UTC