Bug #55774 Array index limitation
Submitted: 2011-09-24 18:41 UTC Modified: 2015-01-18 04:22 UTC
From: inge at upandforward dot com Assigned:
Status: No Feedback Package: Scripting Engine problem
PHP Version: 5.3.8 OS: php 5.3.5-1ubuntu7.2
Private report: No CVE-ID: None

 [2011-09-24 18:41 UTC] inge at upandforward dot com
Description:
------------
Array keys, although they can be strings, are not allowed to contain national characters, even when encoded as UTF-8.
Thus, a key like "øvinger" becomes "vinger", and "Æsop" becomes "sop".
This is very unfortunate.

My example builds an array keyed by each file's name only (without path or extension), in lower case.
Each entry of the array contains the complete filename of the corresponding file.
This serves as a fast and relatively safe way to find the correct path to a file, regardless of case.

I tested using, among others, a file named "php/Øvinger.php".
DS is the directory separator (/),
$filetypes is an array containing the file types to search for (such as "php"), and
lowercase and name_only should be self-explanatory.
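The report does not show the lowercase() helper. Below is a minimal sketch of what it needs to do, assuming UTF-8 file names and a loaded mbstring extension. strtolower() works byte by byte and does not map multibyte letters such as "Ø" to "ø", whereas mb_strtolower() with an explicit encoding does. The function body is hypothetical, not the reporter's code.

```php
<?php
// Hypothetical lowercase() helper (the reporter's version is not shown).
// Assumes UTF-8 file names and the mbstring extension.
function lowercase(string $s): string
{
    // mb_strtolower() applies Unicode case mapping, so "Ø" becomes "ø";
    // strtolower() would leave the two bytes of "Ø" unchanged or corrupt them.
    return mb_strtolower($s, 'UTF-8');
}

echo lowercase('Øvinger'), PHP_EOL; // øvinger
echo lowercase('Æsop'), PHP_EOL;    // æsop
```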

Test script:
---------------
//	First time only: for each file type in $filetypes, find all files
//	with that extension in the directory of the same name
//	(e.g. php/*.php, inc/*.inc, txt/*.txt).

if (!isset ($_SESSION['long']))
{	$_SESSION['long'] = array();
	foreach ($filetypes as $dir)
	{	$files = glob ($dir.DS."*.$dir");
		// Save the full path of each file, keyed by its lower-case name,
		// so that we won't have to search any more.
		foreach ($files as $file)
		{	$name = lowercase(name_only ($file));
			$_SESSION['long'][$name] = $file;
		}
	}
}


Expected result:
----------------
$_SESSION['long']['øvinger'] contains "php/Øvinger.php"




Actual result:
--------------
$_SESSION['long']['vinger'] contains "php/Øvinger.php"


History

 [2011-09-24 18:58 UTC] aharvey@php.net
-Status: Open +Status: Feedback -Type: Feature/Change Request +Type: Bug -Package: *General Issues +Package: Scripting Engine problem
 [2011-09-24 18:58 UTC] aharvey@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.

I can't reproduce this at all in a standalone script:
http://codepad.viper-7.com/pZswtF shows an example of a UTF-8 encoded
array key being set properly. (I wrote another test that persists a
similar array key across multiple pages via $_SESSION, and that worked
as expected too.)
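The linked codepad script is not reproduced in this report. Below is a minimal standalone check in the same spirit, a sketch rather than the original test. PHP array keys are plain byte strings, so a UTF-8 key is stored and looked up unchanged.

```php
<?php
// Standalone check that a UTF-8 encoded string works unchanged as an array key.
// Assumes this file is saved as UTF-8.
$key  = 'øvinger';
$data = [$key => 'php/Øvinger.php'];

var_dump(array_keys($data)[0] === $key); // bool(true)
var_dump(isset($data['øvinger']));       // bool(true)
echo $data['øvinger'], PHP_EOL;          // php/Øvinger.php
```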
 [2011-09-25 07:41 UTC] inge at upandforward dot com
-Status: Feedback +Status: Open
 [2011-09-25 07:41 UTC] inge at upandforward dot com
You are right. This was a case of jumping to conclusions.
The value used as a key was already wrong, and I should have checked that.
The problem appears to be related to the basename() function, but I have not been able to reproduce it in a standalone script either, so it must be a side effect of something else. I need to investigate further.

Sorry to have bothered you. I really thought I had done enough testing! :)
 [2011-09-25 08:37 UTC] inge at upandforward dot com
I think I have found the problem, but not a solution. The following test script works correctly when run stand-alone, but basename() fails when it is run under Apache 2.0.
Are you able to reproduce the error?

<?php
define ('PT' , '.');					// (Decimal) point
define ('LF' , PHP_EOL);				// File newline

$charset = 'utf-8';						// NB! Select charset here!
define ('CHARSET',$charset);			// NB! Define it as a constant.
setlocale(LC_ALL, 'nb_NO');

mb_internal_encoding (CHARSET);
putenv ("LANG=nb_NO.".CHARSET);

$filenames = array ("php/Treffer.php","php/Øvinger.php","php/Æsop.php");
$destinations = array();
// Save all info for each file, so that we won't have to search any more.
foreach ($filenames as $file)
{	$key = basename($file,".php");
	$destinations[$key] = $file;
}
// print_r() with true as its second argument returns the output instead of printing it.
echo "After: ", print_r($destinations, true);
?>
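When the same script behaves differently under the CLI and under Apache, it helps to record whether setlocale() actually succeeded: it returns false when the requested locale is not installed, and the previous locale stays in effect. The PHP manual describes basename() as locale aware, so its output is worth comparing between the two environments. Below is a sketch using the same three file names. bin2hex() makes any missing leading bytes visible regardless of how the browser or terminal decodes the output.

```php
<?php
// Sketch: the reporter's basename() loop plus diagnostics, to be run under
// both the CLI and Apache and compared. Assumes this file is saved as UTF-8.

// setlocale() returns the name of the locale it set, or false on failure.
$requested = setlocale(LC_ALL, 'nb_NO');
echo 'setlocale(LC_ALL, "nb_NO"): ', var_export($requested, true), PHP_EOL;

foreach (['php/Treffer.php', 'php/Øvinger.php', 'php/Æsop.php'] as $file) {
    $key = basename($file, '.php');
    // Hex output makes dropped leading bytes visible ("c398" is "Ø" in UTF-8).
    echo $file, ' => ', bin2hex($key), PHP_EOL;
}
```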
 [2011-09-25 09:00 UTC] inge at upandforward dot com
For now I have written a workaround using the following function, which works in all cases:

function name_only($file)
{	// Remove the extension and the path.
	$from = array();
	$from[] = '.'.substr(strrchr($file,'.'),1);	// ".ext"
	$from[] = dirname($file).'/';			// "dir/"
	return str_replace ($from,'',$file);
}
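str_replace() removes every occurrence of each search string, not only the trailing extension and the leading directory. With this workaround, "php/my.php.notes.php" therefore becomes "my.notes" rather than "my.php.notes". Below is a minimal locale-independent alternative, assuming "/" is the only directory separator. It cuts at the last "/" and the last "." using byte offsets only.

```php
<?php
// Locale-independent sketch of name_only(), as an alternative to the workaround.
// Assumption: '/' is the only directory separator in the paths passed in.
// strrpos()/substr() work on bytes, and the bytes '/' and '.' never occur
// inside a multibyte UTF-8 sequence, so names like "Øvinger" stay intact.
function name_only(string $file): string
{
    // Drop everything up to and including the last '/'.
    $slash = strrpos($file, '/');
    $name = ($slash === false) ? $file : substr($file, $slash + 1);

    // Drop the last extension, but keep dot-files such as ".htaccess" whole.
    $dot = strrpos($name, '.');
    return ($dot === false || $dot === 0) ? $name : substr($name, 0, $dot);
}

echo name_only('php/Øvinger.php'), PHP_EOL;      // Øvinger
echo name_only('php/my.php.notes.php'), PHP_EOL; // my.php.notes
```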
 [2011-09-25 09:05 UTC] inge at upandforward dot com
The original code, using basename(), fails only on my local server, NOT when executed on my web host.

Does that mean this is not a PHP bug?
 [2015-01-05 21:48 UTC] danack@php.net
-Status: Open +Status: Feedback
 [2015-01-05 21:48 UTC] danack@php.net
Hi Inge,

It does mean that something on your local server differs from the expected behaviour. If this is still occurring, please provide some information about how your local server differs from your web server.
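Below is a sketch of a script that could collect this information. Run it once under the local Apache setup and once on the web host, then compare the results. The selection of values reflects an assumption about what is most likely to differ. All functions and constants used are standard PHP.

```php
<?php
// Print environment details that can influence locale-aware functions
// such as basename(). Run in each environment and compare the output.
$info = [
    'PHP_VERSION'          => PHP_VERSION,
    'PHP_SAPI'             => PHP_SAPI,
    'PHP_OS'               => PHP_OS,
    'LC_ALL'               => setlocale(LC_ALL, 0),   // "0" queries without changing
    'LC_CTYPE'             => setlocale(LC_CTYPE, 0),
    'LANG (environment)'   => getenv('LANG'),
    'mb_internal_encoding' => function_exists('mb_internal_encoding')
        ? mb_internal_encoding()
        : 'mbstring not loaded',
];

foreach ($info as $name => $value) {
    echo str_pad($name, 22), ': ', var_export($value, true), PHP_EOL;
}
```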
 [2015-01-18 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sun Oct 06 20:01:27 2024 UTC