[2000-09-21 22:55 UTC] swenson at heronetwork dot com
On the current build I am trying to parse an external tab- and space-separated file (MIME types). None of the tokenizing functions appears to perform the regex parsing correctly, and all of them behave very oddly. Most interesting is that the first item does not even make it into the arrays (or token lists).
Example input:
# Simple mime file
text/html htm HTM html HTML shtml SHTML
text/plain java JAVA c C cc CC cpp CPP h H txt TXT
text/rtf rtf RTF
Script:
<?php
$mimeFile="/tmp/mime.txt";
function get_mime_type ( $ext ) {
global $mimeFile;
if( !$ext || strlen(trim($ext)) == 0 ) {
return "application/binary";
}
$fp = fopen ($mimeFile, "r");
while(!feof($fp)) {
$next = fgetss($fp, 300);
$next = trim($next);
echo("<p>Line in: $next<br>");
if( !$next ) continue;
// try it with [\t\s] or [:space:] or " " etc.
// it just has problems, same with split and strtok
$mime = explode("[ ]",$next);
if( substr($mime[0], 0, 1) == "#" ) continue;
$len = count($mime);
echo("Line has: $len tokens - ");
for( $x = 1 ; $x <= $len ; $x++ ) {
if( !$mime[$x] ) continue;
if( substr($mime[$x] ,0 , 1) == "#" ) break;
echo("$x - $mime[$x], ");
if( $ext == $mime[$x] ) return $mime[0];
}
}
return "application/binary";
}
echo("<br><H1>Mime type for .H is " . get_mime_type(".H") . "</H1>");
?>
Sample output:
Line in: # Simple mime file
Line in:
Line in: text/html htm HTM html HTML shtml SHTML
Line has: 6 tokens - 1 - HTM, 2 - html, 3 - HTML, 4 - shtml, 5 - SHTML,
Line in: text/plain java JAVA c C cc CC cpp CPP h H txt TXT
Line has: 12 tokens - 1 - JAVA, 2 - c, 3 - C, 4 - cc, 5 - CC, 6 - cpp, 7 - CPP, 8 - h, 9 - H, 10 - txt, 11 - TXT,
Line in: text/rtf rtf RTF
Line has: 2 tokens - 1 - RTF,
Line in:
Mime type for .H is application/binary
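For comparison, here is a minimal sketch of the lookup the script above is attempting, written for a current PHP release (7.0 or later) rather than the build in this report. It is not the reporter's code. The name lookup_mime_type() and the in-memory $lines array are hypothetical stand-ins for get_mime_type() and /tmp/mime.txt, and the tab after each MIME type is an assumption about the file layout. The sketch tokenizes each line with preg_split() on runs of spaces and tabs:

```php
<?php
// Hypothetical in-memory copy of the sample input, used instead of /tmp/mime.txt.
// The tab after each MIME type is an assumption about the file layout.
$lines = [
    "# Simple mime file",
    "",
    "text/html\thtm HTM html HTML shtml SHTML",
    "text/plain\tjava JAVA c C cc CC cpp CPP h H txt TXT",
    "text/rtf\trtf RTF",
];

// lookup_mime_type() is a hypothetical helper name, not a PHP built-in.
function lookup_mime_type(string $ext, array $lines): string
{
    $ext = trim($ext);
    if ($ext === '') {
        return "application/binary";
    }
    foreach ($lines as $line) {
        $line = trim($line);
        // Skip blank lines and comment lines.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // Split on runs of spaces/tabs; the first token is the MIME type,
        // the remaining tokens are the extensions mapped to it.
        $tokens = preg_split('/[ \t]+/', $line, -1, PREG_SPLIT_NO_EMPTY);
        $type = array_shift($tokens);
        if (in_array($ext, $tokens, true)) {
            return $type;
        }
    }
    return "application/binary";
}

echo lookup_mime_type("H", $lines), "\n";   // text/plain
echo lookup_mime_type("xyz", $lines), "\n"; // application/binary
```

The extension is passed without a leading dot ("H", not ".H"), since the sample file lists bare extensions.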
Sorry, I spotted a bug in the sample call. It should be:
echo("<br><H1>Mime type for .H is " . get_mime_type("H") . "</H1>");
The real problem is this. Take one line of the sample input:
"text/html htm HTM html HTML shtml SHTML"
In regular expression land it could look like this:
text/html\t\shtm\sHTM\shtml\sHTML\sshtml\sSHTML\n
When parsed with explode("[\t\s]"), or the same without the regex escapes (as in the current example script), an array of 7 elements should be created:
0=>"text/html" 1=>"htm" 2=>"HTM" 3=>"html" 4=>"HTML" 5=>"shtml" 6=>"SHTML"
But instead (as shown in the sample output) an array of 6 elements is being created:
0=>"htm" 1=>"HTM" 2=>"html" 3=>"HTML" 4=>"shtml" 5=>"SHTML"
Additionally, ALL of the regular expression character classes ([:space:]) and escapes ('\t\n\s') are not functioning correctly. I have tried rebuilding PHP with Perl regex, system regex and PHP regex support, and none of them works. Having written a few regex engines myself, I know how hard it is to get it right. But this is really a huge bug in explode and strtok.

I'm a little confused by your examples. First, explode does not accept a regular expression argument; it requires a string separator argument. Second, split, which does expect a regular expression, does not know the \s escape. That is a Perl escape, and you should use preg_split for it or use the [:space:] character class. When I tried split("[\t ]",...) or preg_split("/[\t\s]/", ...) it worked flawlessly for me. Please provide an example of non-working code along the following lines:
===example start
<?
$str = "MY String";
$array = split("pattern",$str);
var_dump($array);
?>
Should be: [0]=>"MY",[1]=>"string", but prints [0]=>"Y s",[1]=>"tring"
===example end

OK, sorry for giving too large a sample program. Try this:
<?
$str = " MY String is hosed ";
$array = split("[ \t]*",$str);
$len = count($array);
echo("String has: $len tokens\n");
for( $x = 0 ; $x < $len ; $x++ ) {
print " \$array[$x] => \"$array[$x]\"\n";
}
?>
It returns this:
Warning: bad regular expression for split() in split.php on line 3
String has: 1 tokens
$array[0] => ""
When it should return:
String has: 4 tokens
$array[0] => "MY"
$array[1] => "String"
$array[2] => "is"
$array[3] => "hosed"
Replacing split("[ \t]*",$str); with split("[[:space:]]*",$str); returns the same error.
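For readers trying this on a current PHP release: split() was deprecated in PHP 5.3 and removed in PHP 7.0, so the test above can no longer be run as written. The following is a minimal sketch of the same tokenization using preg_split(). It is an illustration only, not a statement about what the build in this report did. It assumes PHP 7.0 or later, and it uses `+` instead of `*` so that the pattern never matches an empty string. The explode() call is included only to show that its first argument is a literal separator.

```php
<?php
$str = " MY String is hosed ";

// explode() treats its first argument as a literal string, not a pattern.
// Splitting on a single space keeps the empty pieces produced by the
// leading and trailing spaces.
$literal = explode(" ", $str);
echo "explode() gives ", count($literal), " elements\n";

// preg_split() takes a PCRE pattern. PREG_SPLIT_NO_EMPTY drops the empty
// pieces that the leading and trailing whitespace would otherwise produce.
$tokens = preg_split('/[ \t]+/', $str, -1, PREG_SPLIT_NO_EMPTY);

echo "String has: ", count($tokens), " tokens\n";
foreach ($tokens as $i => $token) {
    echo " \$tokens[$i] => \"$token\"\n";
}
```

With this input, explode() yields 6 elements (two of them empty), while preg_split() yields the 4 tokens listed as the expected result above.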