Request #38766 Nonbreaking whitespace breaks parsing
Submitted: 2006-09-10 02:19 UTC Modified: 2007-01-09 04:26 UTC
Votes:8
Avg. Score:4.2 ± 1.0
Reproduced:7 of 7 (100.0%)
Same Version:3 (42.9%)
Same OS:2 (28.6%)
From: a at b dot c dot de Assigned:
Status: Wont fix Package: Feature/Change Request
PHP Version: 5.1.6 OS: Windows XP SP2
Private report: No CVE-ID: None

 [2006-09-10 02:19 UTC] a at b dot c dot de
Description:
------------
In most (Windows) fonts, character 0xa0 (the non-breaking space in ISO-8859-1 and Windows-1252) renders as a blank space. Some text editors, for whatever reason, put 0xa0 at random locations instead of ordinary spaces. The result is code that _looks_ correct but, when run, can trigger a parse error for no apparent reason.

In zend_language_scanner.l, 0xa0 is recognised as a LABEL character. Among other things, this means you could have a variable called "$ " (a dollar sign followed by 0xa0). Assuming no one is perverse enough to do that, it should be safe to reassign 0xa0 to WHITESPACE (although would this cause grief in other character sets?).
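To see why the stray byte produces "unexpected T_ECHO", here is a minimal Python sketch of just the two character classes involved. It assumes the PHP 5-era definitions `LABEL [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*` and `WHITESPACE [ \n\r\t]+`; the helper `classify` is hypothetical and ignores keywords, strings and everything else the real scanner handles.

```python
import re

# Assumed PHP 5.x scanner definitions (zend_language_scanner.l):
#   LABEL      [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*
#   WHITESPACE [ \n\r\t]+
LABEL = re.compile(rb"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*")
WHITESPACE = re.compile(rb"[ \n\r\t]+")


def classify(src: bytes) -> list[tuple[str, bytes]]:
    """Split src into WHITESPACE, LABEL and single-byte OTHER chunks.

    Hypothetical helper: keywords such as `echo` come out as plain
    LABELs here, unlike the real scanner, which matches them first.
    """
    chunks = []
    pos = 0
    while pos < len(src):
        for name, pattern in (("WHITESPACE", WHITESPACE), ("LABEL", LABEL)):
            m = pattern.match(src, pos)
            if m:
                chunks.append((name, m.group()))
                pos = m.end()
                break
        else:
            # Anything else ({, ;, $, ...) is emitted one byte at a time.
            chunks.append(("OTHER", src[pos:pos + 1]))
            pos += 1
    return chunks


# The brace from the reproducer, followed by 0xa0, a newline and `echo`.
for chunk in classify(b"{\xa0\necho"):
    print(chunk)
```

Under these assumptions the 0xa0 byte comes out as a one-byte LABEL, i.e. a bare constant name between `{` and `echo`. With no `;` after that constant, `echo` is the first token the parser cannot accept, which matches the error below. Moving 0xa0 into WHITESPACE, as proposed, would turn that chunk into whitespace instead.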


Reproduce code:
---------------
<?php
//There is an 0xa0 character immediately following the
// brace on the next line
for($i=0; $i<5; $i++){ 
echo  "Hello, World\n";
}
?>


Expected result:
----------------
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World


Actual result:
--------------
Parse error: parse error, unexpected T_ECHO in C:\test.php on line 5

 [2006-09-10 09:38 UTC] derick@php.net
Can't do that, as people using UTF-8 in their scripts might use this character as part of an identifier.
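The scanner works on bytes, not characters, and in UTF-8 the byte 0xa0 appears in more than the non-breaking space itself (U+00A0 is encoded `C2 A0`): it is also a continuation byte inside other characters, such as `à` (U+00E0, `C3 A0`) and Cyrillic `Р` (U+0420, `D0 A0`). A short Python check of this; the sample identifier `$voilà` is made up for illustration:

```python
# Characters whose UTF-8 encoding does or does not contain the byte 0xA0.
samples = ["\u00a0", "\u00e0", "\u0420", "\u00e1"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} contains 0xA0: {0xA0 in encoded}")

# Hypothetical UTF-8 identifier: treating every 0xA0 byte as whitespace
# would cut it after a dangling 0xC3 lead byte.
name = "$voil\u00e0".encode("utf-8")
print(name.split(b"\xa0"))
```

So reclassifying the single byte would break identifiers that never contain a non-breaking space at all.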
 [2007-01-09 04:26 UTC] a at b dot c dot de
I guessed as much. Personally, I blame stupid text editors for inserting the character at the ends of lines (which is where I've found them).
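Since the fix won't happen in the scanner, the bytes can at least be found mechanically. Below is a minimal Python sketch, assuming the source files are read as raw bytes; `find_a0` is a hypothetical helper name. It reports every 0xa0 byte with a 1-based line and byte column, so in a UTF-8 file it will also flag legitimate characters whose encoding contains 0xa0, and hits should be reviewed by hand.

```python
import sys


def find_a0(data: bytes) -> list[tuple[int, int]]:
    """Return 1-based (line, column) positions of every 0xA0 byte.

    Columns count bytes, not characters. In a UTF-8 file this also flags
    legitimate characters whose encoding contains 0xA0 (e.g. U+00E0).
    """
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        col = line.find(b"\xa0")
        while col != -1:
            hits.append((lineno, col + 1))
            col = line.find(b"\xa0", col + 1)
    return hits


if __name__ == "__main__":
    # Usage: python find_a0.py file1.php file2.php ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for lineno, col in find_a0(f.read()):
                print(f"{path}:{lineno}:{col}: 0xA0 byte")
```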
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Sat Jun 21 14:01:33 2025 UTC