Bug #76950 trim not working
Submitted: 2018-09-29 20:36 UTC Modified: 2018-10-11 11:41 UTC
From: tobias at tromm dot no-ip dot org Assigned:
Status: Not a bug Package: *General Issues
PHP Version: 7.2.10 OS: Windows Server 2016
Private report: No CVE-ID: None
 [2018-09-29 20:36 UTC] tobias at tromm dot no-ip dot org

The function rtrim does not work correctly if a user uses characters from the ASCII table to cheat the system.

Please check the full example I am giving, with a fix (there is probably a better way to fix it, but it works).

Test script:

Expected result:
Result String: "David"

This is the result with my fixed script.

Actual result:
Given String: "           David           "

Given String with the use of php trim function: "          David          "


 [2018-09-29 20:46 UTC]
-Status: Open +Status: Feedback
 [2018-09-29 20:46 UTC]
If you have a repro script then include it in the report. Don't link us a .zip to download and open.
 [2018-09-29 20:48 UTC] tobias at tromm dot no-ip dot org
-Summary: rtrim not working +Summary: trim not working -Status: Feedback +Status: Open
 [2018-09-29 20:48 UTC] tobias at tromm dot no-ip dot org
fix title
 [2018-09-29 20:52 UTC] tobias at tromm dot no-ip dot org
I am afraid that if I copy the script here the characters will be converted and the result will change.

That's why I provided a zip.
 [2018-09-29 20:53 UTC]

<?php
   // The padding is really UTF-8 non-breaking spaces (bytes 194 and 160),
   // written as octal escapes so the bytes survive copy and paste.
   $apelido = " \302\240 \302\240 \302\240 \302\240 \302\240 David \302\240 \302\240 \302\240 \302\240 \302\240 ";

   echo "Given String: \"".$apelido."\"<br><br>";
   echo "Given String with the use of php trim function: \"".trim($apelido)."\"<br><br>";

   echo "Now we will check the ASCII table to see which characters are being used:<br><br>";

   // Print the byte value of every position in the string
   for ($cont=0; $cont <strlen($apelido); $cont++){
      echo "ORD: ".ord($apelido[$cont])."<br>";
   }

   $fazer = 1;

   // Repeat until neither end is a space (32) or an NBSP byte (194 or 160)
   do {
      $fazer = 0;

      //Check the beginning of the string (skipped once the string is empty)
      if ($apelido !== "" AND ($apelido[0] == chr(32) OR $apelido[0] == chr(194) OR $apelido[0] == chr(160))) {
         $fazer = 1;
         $apelido = ltrim($apelido, chr(32));
         $apelido = ltrim($apelido, chr(194));
         $apelido = ltrim($apelido, chr(160));
      }

      //Check the end of the string (skipped once the string is empty)
      if ($apelido !== "" AND ($apelido[strlen($apelido)-1] == chr(32) OR $apelido[strlen($apelido)-1] == chr(194) OR $apelido[strlen($apelido)-1] == chr(160))) {
         $fazer = 1;
         $apelido = rtrim($apelido, chr(32));
         $apelido = rtrim($apelido, chr(194));
         $apelido = rtrim($apelido, chr(160));
      }

   } while ($fazer == 1);

   echo "<br>Result String: \"".$apelido."\"<br><br>";

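The loop above strips the bytes 194 and 160 one at a time rather than the two-byte sequence, so it can also cut into other UTF-8 characters at the edge of the string: "à" is encoded as \303\240, and its trailing byte 160 would be removed. A minimal sketch of that side effect, using a hypothetical sample word, next to a /u regex that works on whole characters:

```php
<?php
// Hypothetical sample value: "voilà" ends in "à", encoded in UTF-8 as \xC3\xA0.
$word = "voil\u{00E0}";

// Byte-wise trimming, as in the loop above: strips the 160 (\xA0) that
// belongs to "à" and leaves a dangling \xC3 byte behind.
$byteWise = rtrim($word, chr(32) . chr(194) . chr(160));

var_dump(strlen($word), strlen($byteWise));  // int(6) int(5)

// preg_match() with /u returns false when the subject is not valid UTF-8.
var_dump(preg_match('//u', $byteWise));      // bool(false)

// Character-aware alternative: with /u, \s is matched against whole characters.
$charWise = preg_replace('/\s+$/u', '', $word);
var_dump($charWise === $word);               // bool(true)
```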
 [2018-09-29 20:56 UTC] tobias at tromm dot no-ip dot org

I just copied it and pasted it into my PHP editor, and the result changes.

Use the zip file from the link I provided.

Otherwise, the space characters will be converted!
 [2018-09-29 20:59 UTC]
-Status: Open +Status: Not a bug
 [2018-09-29 20:59 UTC]
Yes, it was converted. But for next time, familiarize yourself with functions like addcslashes so that code CAN be copied and pasted.

$apelido = " \302\240 \302\240 \302\240 \302\240 \302\240 David \302\240 \302\240 \302\240 \302\240 \302\240 ";

The documentation for rtrim explicitly states what characters are trimmed by default. If you don't like that list then provide your own.
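As a sketch of providing your own list, assuming the string is UTF-8 and using the escaped value above: trim()'s documented default list is " \t\n\r\0\x0B", and adding the two bytes \xC2 and \xA0 to it lets trim() reach "David". The list is still a set of single bytes, not the sequence \xC2\xA0, so it carries the same risk of clipping a character such as "à" (\xC3\xA0) at the edge of the string.

```php
<?php
// The string from the report, with the non-breaking spaces written as octal
// escapes (\302\240 = \xC2\xA0) so the bytes survive copy and paste.
$apelido = " \302\240 \302\240 \302\240 \302\240 \302\240 David \302\240 \302\240 \302\240 \302\240 \302\240 ";

// Default list: only the outermost ASCII spaces are removed, because trim()
// stops at the first byte not in the list (\302 on the left, \240 on the right).
$default = trim($apelido);

// Your own list: the default characters plus the bytes \xC2 and \xA0.
// trim() reads the list byte by byte, not as the sequence \xC2\xA0.
$custom = trim($apelido, " \t\n\r\0\x0B\xC2\xA0");

var_dump(strlen($default));  // int(35)
var_dump($custom);           // string(5) "David"
```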
 [2018-09-29 21:29 UTC]
You probably need the longer explanation.

Welcome to the world of character encodings. Nearly all of PHP's normal functions work on bytes. You are thinking about characters. Since PHP doesn't internally manage Unicode characters, the only reasonable way PHP can currently convert between the two is to look at the byte range that all ("all") character encodings can agree upon: the 0-127 range (\x00-\x7F). That means functions like trim will only deal with the bytes \n\r\t\v and space, and \0 for the fun of it, and they will not cover anything above \x7F.
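To make the bytes-versus-characters point concrete, here is a small sketch using only core functions (the \u{...} string escape needs PHP 7.0 or later). It prints the bytes in trim()'s default list, shows that one NBSP character is two bytes, and shows that trim() leaves those bytes alone.

```php
<?php
// trim()'s default character list, as documented in the PHP manual:
// space, tab, newline, carriage return, NUL and vertical tab (\x0B).
$defaultList = " \t\n\r\0\x0B";

// Every byte in that list is plain ASCII (below \x80).
foreach (str_split($defaultList) as $byte) {
    printf("0x%02X\n", ord($byte));
}

// strlen() counts bytes: U+00A0 NO-BREAK SPACE is two bytes in UTF-8.
$nbsp = "\u{00A0}";
echo strlen($nbsp), "\n";   // 2
echo bin2hex($nbsp), "\n";  // c2a0

// Neither byte is in the default list, so trim() returns the string unchanged.
$padded = $nbsp . "David" . $nbsp;
echo bin2hex(trim($padded)), "\n";  // c2a04461766964c2a0
```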

Those bytes blocking trim from reducing the entire string to "David" are above \x7F. The exact interpretation of what characters those bytes are depends on the character encoding. In Latin1, \xA0 (\240) is a non-breaking space, but in UTF-8 it is not a character at all - instead it is part of a 2-4 byte sequence that represents a character (and the sequence for a non-breaking space is \xC2\xA0 or \302\240).
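A small sketch of the Latin1/UTF-8 difference, using only core PHP: the UTF-8 form of U+00A0 is \xC2\xA0, while a lone \xA0 (a complete NBSP in Latin1) is not valid UTF-8 on its own. The empty pattern with /u is used here purely as a validity check, since preg_match() returns false for invalid UTF-8 subjects.

```php
<?php
// In UTF-8, U+00A0 NO-BREAK SPACE is the two-byte sequence \xC2\xA0 (\302\240).
var_dump("\u{00A0}" === "\xC2\xA0");    // bool(true)

// A lone \xA0 is only a continuation byte in UTF-8, so the /u check fails.
var_dump(preg_match('//u', "\xA0"));     // bool(false)

// The full two-byte sequence is valid UTF-8, so the empty pattern matches.
var_dump(preg_match('//u', "\xC2\xA0")); // int(1)
```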

There is no mb_trim function, but you can use preg_replace with \s and the /u option.
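A minimal sketch of the preg_replace approach, assuming UTF-8 input. The helper name utf8_trim is hypothetical, not a PHP built-in. With the /u modifier PHP also enables Unicode properties in PCRE, so \s matches U+00A0 and other Unicode spaces, not just ASCII whitespace.

```php
<?php
// Hypothetical helper (not a PHP built-in): trims Unicode whitespace from
// both ends of a UTF-8 string.
function utf8_trim(string $s): string
{
    $result = preg_replace('/^\s+|\s+$/u', '', $s);
    // preg_replace() returns null if $s is not valid UTF-8; fall back to trim().
    return $result ?? trim($s);
}

$apelido = " \302\240 \302\240 \302\240 \302\240 \302\240 David \302\240 \302\240 \302\240 \302\240 \302\240 ";
var_dump(utf8_trim($apelido));  // string(5) "David"
```

The fallback is one design choice among several; throwing or logging on invalid input may suit a form handler better than silently trimming byte-wise.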
 [2018-09-29 21:40 UTC] tobias at tromm dot no-ip dot org
Thank you a lot, @requinix, for the explanation.

Maybe in the future PHP could have a function to remove all non-breaking spaces, if that's possible.

Thank you again.
 [2018-09-29 23:54 UTC] a at b dot c dot de
For what it's worth, 

preg_replace('/(^[\t\n\r\000\v\pZ]+)|([\t\n\r\000\v\pZ]+$)/u', '', $apelido);

Removes leading/trailing instances of everything that trim() removes and also everything that Unicode considers a "whitespace" character (assumes UTF-8 encoding).
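A sketch applying that pattern, wrapped in a hypothetical helper named trim_unicode, to the string from the report and to a second made-up sample using U+3000 IDEOGRAPHIC SPACE (category Zs). The pattern is copied unchanged; only the wrapper and samples are illustrative.

```php
<?php
// Hypothetical wrapper around the pattern above; returns null on invalid UTF-8.
function trim_unicode(string $s): ?string
{
    // \pZ = Unicode separators (Zs, Zl, Zp), e.g. U+0020, U+00A0, U+3000.
    // \t \n \r \000 cover trim()'s control characters; in PCRE, \v inside a
    // class means vertical whitespace, which includes \x0B.
    return preg_replace('/(^[\t\n\r\000\v\pZ]+)|([\t\n\r\000\v\pZ]+$)/u', '', $s);
}

$samples = [
    " \302\240 \302\240 \302\240 \302\240 \302\240 David \302\240 \302\240 \302\240 \302\240 \302\240 ",
    "\t\u{3000}David\u{3000}\n",  // made-up sample with U+3000 IDEOGRAPHIC SPACE
];

foreach ($samples as $s) {
    var_dump(trim_unicode($s));   // string(5) "David"
}
```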
 [2018-10-11 11:41 UTC] tobias at tromm dot no-ip dot org
Unfortunately, some user is still finding a way to insert a database value with spaces at the end of the string.

If anyone has any idea why, I would be grateful.

I also tried preg_replace('/(^[\t\n\r\000\v\pZ]+)|([\t\n\r\000\v\pZ]+$)/u', '', $apelido);

In my tests everything works, but the user is somehow getting around it, with the following ord sequence at the end:

ORD: 32
ORD: 32
ORD: 32
ORD: 32
ORD: 32
ORD: 32
ORD: 32
ORD: 32
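Byte 32 is the ordinary ASCII space, which is in trim()'s default list and is also matched by \pZ, so a value ending in the eight bytes listed above should come out of either trim() or the pattern above without them. A minimal check, using a hypothetical value built to end in those bytes and a hypothetical dump_tail helper that mirrors the ORD dump from the original script:

```php
<?php
// Hypothetical value ending in the eight bytes listed above (ORD 32).
$value = "David" . str_repeat(chr(32), 8);

$pattern = '/(^[\t\n\r\000\v\pZ]+)|([\t\n\r\000\v\pZ]+$)/u';

// Hypothetical helper: print the byte values of the last $n bytes of $s.
function dump_tail(string $s, int $n = 8): void
{
    foreach (str_split(substr($s, -$n)) as $byte) {
        echo "ORD: ", ord($byte), "\n";
    }
}

dump_tail($value);                              // eight lines of "ORD: 32"
dump_tail(trim($value));                        // only the bytes of "David"
dump_tail(preg_replace($pattern, '', $value));  // only the bytes of "David"
```

If the stored value still ends in these bytes, running the same dump on the string just before it is written and again on the value read back is one way to narrow down where they are added.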
PHP Copyright © 2001-2020 The PHP Group
All rights reserved.
Last updated: Mon Jul 13 11:01:24 2020 UTC