Bug #17154 Trailing garbage
Submitted: 2002-05-11 10:33 UTC Modified: 2004-10-27 18:24 UTC
Votes:10
Avg. Score:5.0 ± 0.0
Reproduced:9 of 9 (100.0%)
Same Version:3 (33.3%)
Same OS:4 (44.4%)
From: k dot joe at freemail dot hu Assigned:
Status: Not a bug Package: Recode related
PHP Version: 4.3.3-dev OS: Linux2.2.19/Debian
Private report: No CVE-ID: None
 [2002-05-11 10:33 UTC] k dot joe at freemail dot hu
The recode function somehow fails to calculate the length of the result string, which causes (mostly) random segfaults. In this example, the FOR loop stops at a different iteration count depending on the running mode (Apache module, CGI from the shell, CGI under gdb) and on the operations performed on the string before recode is called.

The recoded result written to the file is very strange: in several places the lengths of the two strings are not equal (it looks like a buffer overflow problem). PHP versions 4.0.6-4.1.2 (with recode 3.6) are all affected (the command-line recode works well).

<?php
  $fp = fopen("ideni","w");

  for ($i = 0; $i < 10240; $i++)
  {
    echo "$i\n";
    $str = str_repeat("a",$i);

    if (strlen($str) !=
        strlen(recode("utf8..latin2",$str)))
    {
      $fstr = "\n$i: $str";
      $rstr = "\n$i: " . recode("utf8..latin2",$str);

      fwrite($fp,$fstr);
      fwrite($fp,$rstr);
    }
  }

  fclose($fp);
?>

This backtrace was made from the CGI binary under gdb:

#0  0x4024ed28 in free () from /lib/libc.so.6
#1  0x4024ea0a in malloc () from /lib/libc.so.6
#2  0x4024e1e4 in malloc () from /lib/libc.so.6
#3  0x080f5a8f in _emalloc (size=6828, __zend_filename=0x81309c2 "recode.c", __zend_lineno=142, __zend_orig_filename=0x0,
    __zend_orig_lineno=0) at zend_alloc.c:165
#4  0x080f61ed in _estrndup (s=0x81d64a8 'a' <repeats 200 times>..., length=6827, __zend_filename=0x81309c2 "recode.c", __zend_lineno=142,
    __zend_orig_filename=0x0, __zend_orig_lineno=0) at zend_alloc.c:356
#5  0x0807d88a in zif_recode_string (ht=2, return_value=0x81d2384, this_ptr=0x0, return_value_used=1) at recode.c:142
#6  0x0812594a in execute (op_array=0x81cddbc) at ./zend_execute.c:1590
#7  0x08107309 in zend_execute_scripts (type=8, retval=0x0, file_count=3) at zend.c:814
#8  0x0805f411 in php_execute_script (primary_file=0xbffffd04) at main.c:1307
#9  0x0805cc8c in main (argc=3, argv=0xbffffd94) at cgi_main.c:738
#10 0x401f96cf in __libc_start_main () from /lib/libc.so.6
(gdb) frame 6
#6  0x0812594a in execute (op_array=0x81cddbc) at ./zend_execute.c:1590
1590                                                    ((zend_internal_function *) function_state.function)->handler(opline->extended_value,
Ts[opline->result.u.var].var.ptr, object.ptr, return_value_used TSRMLS_CC);

Good luck!

 [2002-06-04 04:25 UTC] mfischer@php.net
Thank you for taking the time to report a problem with PHP.
Unfortunately your version of PHP is too old -- the problem
might already be fixed. Please download a new PHP
version from http://www.php.net/downloads.php

If you are able to reproduce the bug with one of the latest
versions of PHP, please change the PHP version on this bug report
to the version you tested and change the status back to "Open".
Again, thank you for your continued support of PHP.


 [2002-06-06 18:03 UTC] k dot joe at freemail dot hu
Tests with PHP 4.3.0-dev and PHP 4.2.1 give the same (wrong) result: the length of the recoded string is not equal to the length of the original string. Simply recoding a 36-character string results in a 40-byte string, so the return value carries 4 additional 0x00 bytes of garbage at the end:

recode ("utf8..latin2", "0123456789012345678901234567890123456");

The error is reproducible at several string lengths: 36-39, 96-99, 186-189, 321-324, 523-526, 826-829, 1281-1284, 1963-1966, 2986-2989, 4521-4524, 6823-6826, and so on... :)
(Operations on the result string cause random crashes.)
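The length pattern above can be found mechanically. The following is a minimal sketch, assuming PHP 7 or later: `find_length_mismatches()` is a hypothetical helper (not part of PHP) that feeds pure-ASCII strings of every length up to a limit through a converter and records each length whose output has a different byte count. For "utf8..latin2", ASCII input should come back unchanged, so any entry points at trailing garbage. The usage part only runs if the recode extension's `recode_string()` (which `recode()` aliases) is available, which is not the case in many current PHP builds.

```php
<?php
// Hypothetical helper: scan input lengths 0..$maxLen and report every
// length at which converting a pure-ASCII string changes its byte length.
// Returns an array of input length => output length.
function find_length_mismatches(callable $convert, int $maxLen): array
{
    $bad = [];
    for ($len = 0; $len <= $maxLen; $len++) {
        $in  = str_repeat('a', $len);
        $out = $convert($in);
        // ASCII is identical in UTF-8 and Latin-2, so lengths must match.
        if (strlen($out) !== $len) {
            $bad[$len] = strlen($out);
        }
    }
    return $bad;
}

// Usage sketch: only runs when the recode extension is loaded.
if (function_exists('recode_string')) {
    $toLatin2 = function (string $s) {
        return recode_string('utf8..latin2', $s);
    };
    foreach (find_length_mismatches($toLatin2, 7000) as $in => $out) {
        echo "$in -> $out bytes\n";
    }
}
```

Passing the converter as a callable keeps the scan independent of recode itself, so the same loop can be pointed at a command-line wrapper or another conversion function for comparison.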

Please try the examples above and report whether they work correctly. (Sorry if the previous description was confusing ;)

Thx
 [2002-06-24 12:08 UTC] chregu@php.net
Same problem here, with the same string lengths causing errors.

recode on the command line does it perfectly right.
PHP 4.2 adds trailing garbage
PHP 4.3-dev segfaults

chregu
 [2002-06-24 12:10 UTC] chregu@php.net
What I said was not exactly true:

4.3.0-dev does not always segfault (it mostly does with a string length of 96...), and otherwise it seems to behave like 4.2.

chregu 
 [2002-09-18 17:42 UTC] luka at mail dot ljudmila dot org
This bug is for real!
I just stumbled into it while writing a mail script.
recode _does_ stubbornly add somewhat random trailing garbage to strings on my system. I made a test script to figure it out, so I might as well post it here.

My PHP is 4.2.3 and the system is Debian. I also got some segfaults from my mail script, but they were rare and might or might not be connected to the trailing-garbage bug.



sample output first (wrong, clearly):

SNIP>
bash-2.05b$ php4 recodetest.php
X-Powered-By: PHP/4.2.3
Content-type: text/html

testing recode request ISO-8859-1..UTF-8

INPUT: "Some Hacker <user@host.ljudmila.org>"
OUTPUT:
"Some Hacker <user@host.ljudmila.org>"
"Some Hacker <user@host.ljudmila.org>"
"Some Hacker <user@host.ljudmila.org>&"
"Some Hacker <user@host.ljudmila.org>"
"Some Hacker <user@host.ljudmila.org>"
"Some Hacker <user@host.ljudmila.org>"
"Some Hacker <user@host.ljudmila.org>@"
"Some Hacker <user@host.ljudmila.org"
"Some Hacker <user@host.ljudmila.org>0u"

INPUT: "Some Hacker <user@host.ljudmila.org "
OUTPUT:
"Some Hacker <user@host.ljudmila.org 0u"

INPUT: "Some Hacker  user@host.ljudmila.org>"
OUTPUT:
"Some Hacker  user@host.ljudmila.org>0u"

INPUT: "Some Hacker  <user@host.ljudmila.org>"
OUTPUT:
"Some Hacker  <user@host.ljudmila.org>u"

INPUT: "Some Hacker  <user@host.ljudmila.org "
OUTPUT:
"Some Hacker  <user@host.ljudmila.org u"

INPUT: "Some Hacker   user@host.ljudmila.org>"
OUTPUT:
"Some Hacker   user@host.ljudmila.org>u"

INPUT: "Some Hacker <user@host.ljudmila.org>  "
OUTPUT:
"Some Hacker <user@host.ljudmila.org>  "

INPUT: "Some Hacker <user@host.ljudmila.org   "
OUTPUT:
"Some Hacker <user@host.ljudmila.org   "

INPUT: "Some Hacker  user@host.ljudmila.org>  "
OUTPUT:
"Some Hacker  user@host.ljudmila.org>  "

INPUT: "&#65533;  B "
OUTPUT:
"&#65533;&#65533;  B "

INPUT: "MAKE MONEY REALLY REALLY REALLY FAST"
OUTPUT:
"MAKE MONEY REALLY REALLY REALLY FASTY"
"MAKE MONEY REALLY REALLY REALLY FAST"


Tried 200 loops on 11 test(s).

<SNIP

and the code, so you can try too!

<?php

#try different encodings


$from='ISO-8859-1';
#$from='ascii';

$to='UTF-8'; 
#$to='HTML';
#$to='flat';

echo "testing recode request $from..$to\n";

$tests=array
(
 'Some Hacker <user@host.ljudmila.org>',
 'Some Hacker <user@host.ljudmila.org ',
 'Some Hacker  user@host.ljudmila.org>',

 'Some Hacker  <user@host.ljudmila.org>',
 'Some Hacker  <user@host.ljudmila.org ',
 'Some Hacker   user@host.ljudmila.org>',

 'Some Hacker <user@host.ljudmila.org>  ',
 'Some Hacker <user@host.ljudmila.org   ',
 'Some Hacker  user@host.ljudmila.org>  ',

 "\xA0 \x10 \x42 \x00",
 'MAKE MONEY REALLY REALLY REALLY FAST',
);


$tries=200;

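foreach ($tests as $t) {

  print "\nINPUT: \"$t\"\nOUTPUT:\n";
  $old = null; // reset per test so the first result is always printed
  for ($i=0;$i<$tries;$i++) {
    $output=recode("$from..$to",$t);
    // print only results that differ from the previous call (strict compare)
    if ($output!==$old) {
      print "\"$output\"\n";
      $old=$output;
    }
  }
}

echo "\n\nTried $tries loops on ".count($tests)." test(s).\n";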
?>

Hopefully this will give someone a chance to test on the latest sources, or at least a clue about the cause of the bug.
 [2003-07-07 07:52 UTC] derick@php.net
I had a look at this, but it really looks correct from the PHP side. For some reason the recode library returns a string that is too long, with random characters appended. It's not a bug in PHP; everything is done the way the recode documentation says it should be done. I used recode 3.6 for my tests, and it definitely doesn't behave as it should.
 [2004-10-27 18:24 UTC] skettler@php.net
This is definitely a bug in recode-3.6. Please see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=156635 for a patch against recode-3.6.

Maybe we should check for this bug when configuring PHP --with-recode.

Debian maintainers have also renamed internal symbols that conflicted with imap and mysql (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=131080), so it might be wise to explicitly check against those symbols before denying configuration with PHP 5 as well.
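A configure-time check would have to be written against the C library, but the same idea can be sketched as a runtime probe in PHP. This is a hypothetical sketch, not an existing PHP or recode API: `recode_probe_is_buggy()` sends pure-ASCII strings of the first few lengths reported in this bug (36-39, 96-99, 186-189) through a converter several times, since the garbage was observed to vary between calls, and reports a problem as soon as any output differs from its input. A failed conversion (`recode_string()` returning false) is also treated as a failure.

```php
<?php
// Hypothetical runtime self-test modelled on the suggested configure check.
// Pure-ASCII input must survive "utf8..latin2" unchanged, so any difference
// (extra bytes, changed bytes, or a false return) counts as the bug.
function recode_probe_is_buggy(callable $convert, int $tries = 20): bool
{
    // Lengths taken from the ranges reported earlier in this bug.
    $lengths = array_merge(range(36, 39), range(96, 99), range(186, 189));
    foreach ($lengths as $len) {
        $in = str_repeat('a', $len);
        // Repeat each call: the trailing garbage was not deterministic.
        for ($i = 0; $i < $tries; $i++) {
            if ($convert($in) !== $in) {
                return true;
            }
        }
    }
    return false;
}

// Usage sketch: only runs when the recode extension is loaded.
if (function_exists('recode_string')) {
    $probe = function (string $s) {
        return recode_string('utf8..latin2', $s);
    };
    if (recode_probe_is_buggy($probe)) {
        echo "The linked recode library appears to add trailing garbage\n";
    }
}
```

Probing only the reported lengths keeps the check cheap enough to run once at startup; a broader scan over all lengths would catch unreported cases at a higher cost.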
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Mar 19 05:01:29 2024 UTC