Bug #39244 PHP can't read UTF-8 encoded PHP code
Submitted: 2006-10-24 08:55 UTC Modified: 2007-07-11 22:40 UTC
Votes:4
Avg. Score:5.0 ± 0.0
Reproduced:4 of 4 (100.0%)
Same Version:2 (50.0%)
Same OS:4 (100.0%)
From: j dot hakvoort at publiceren dot net Assigned:
Status: Closed Package: PCRE related
PHP Version: 4.4.4 OS: Linux
Private report: No CVE-ID: None
 [2006-10-24 08:55 UTC] j dot hakvoort at publiceren dot net
Description:
------------
Hi!

I've been working on the encoding issue to make a site compatible with any language, but the problem is that when you print special characters in PHP they come out malformed.

So the advice I received is to encode the PHP code as UTF-8, but when I do this the script fails, because PHP doesn't read UTF-8 encoded PHP code!

Is this a bug? I am using 4.4.4

Best Regards,
Jan Jaap Hakvoort

Reproduce code:
---------------
<?php
$str = '????? ?';
$match = preg_match('?[ ]+?',$str);
?>



Expected result:
----------------
$match will be set to true.

Actual result:
--------------
PHP error, unexpected character [block]...

 [2006-10-24 09:55 UTC] tony2001@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.


 [2006-10-25 12:08 UTC] j dot hakvoort at publiceren dot net
OK, I found out that it's due to the following bug in PHP...

UTF-8 encoded documents saved by some editors have 3 bytes at the top of the document which mark it as UTF-8; this is called the BOM (byte order mark).

These characters might be needed, but to get PHP working it would be required to remove these characters.

The only solution to remove these characters I've found is by using special editors. This will take a huge amount of time!

Is there no other solution for this?? Why doesn't PHP read UTF-8 encoded files???

Best Regards,
Jan Jaap Hakvoort
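
For readers hitting the same problem: the BOM described above is the byte sequence EF BB BF, and it can be removed with a short script instead of a special editor. The following is only a minimal sketch, assuming PHP 5 or later (for `file_put_contents`); the function names `strip_utf8_bom` and `strip_bom_from_file` are hypothetical helpers, not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns $contents without a
// leading UTF-8 BOM (bytes EF BB BF). Input without a BOM is returned as-is.
function strip_utf8_bom($contents)
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($contents, $bom, 3) === 0) {
        // Cast guards against substr() returning false on older PHP
        // versions when the input consists of the BOM alone.
        return (string) substr($contents, 3);
    }
    return $contents;
}

// Hypothetical helper: rewrites the file in place only when a BOM is present.
// Returns true if the file was changed, false otherwise.
function strip_bom_from_file($path)
{
    $contents = @file_get_contents($path);
    if ($contents === false) {
        return false; // unreadable file: leave it alone
    }
    $stripped = strip_utf8_bom($contents);
    if ($stripped === $contents) {
        return false; // no BOM, nothing to do
    }
    return file_put_contents($path, $stripped) !== false;
}
```

Calling `strip_bom_from_file()` on each script in a project would be one way to clean up many files at once; keeping a backup first is advisable, since the file is overwritten.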
 [2006-10-25 12:11 UTC] tony2001@php.net
PCRE functions in PHP are just wrappers for PCRElib.
If PCRElib is unable to read Unicode texts with BOM, then it's PCRElib's fault.
But I guess you shouldn't be using Notepad in the first place.
 [2006-10-25 12:26 UTC] j dot hakvoort at publiceren dot net
Hi!

I don't know PCRElib, but I am not expecting that it has anything to do with the issue, as you mention "functions" reading Unicode.

It's not about that; the problem is that a PHP script can't run properly when the script file is encoded in UTF-8 format.

The BOM causes 3 characters to be printed before the script starts, so that sending headers isn't possible anymore...

Also, PHP doesn't recognize UTF-8 characters in functions, but this is not the main issue I am referring to with the BOM of UTF-8 encoded documents.

Best Regards,
Jan Jaap Hakvoort
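
To find out which script files would emit those 3 bytes before any header is sent, the start of each file can be checked without modifying it. This is a rough sketch only; `file_has_utf8_bom` is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): reports whether the file at
// $path begins with the UTF-8 BOM (bytes EF BB BF). Only the first three
// bytes are read, and the file is never written to.
function file_has_utf8_bom($path)
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false; // unreadable files are treated as "no BOM"
    }
    $prefix = fread($handle, 3);
    fclose($handle);
    return $prefix === "\xEF\xBB\xBF";
}
```

A file for which this returns true sends the BOM as output as soon as it is loaded, which is why later `header()` calls fail with a "headers already sent" warning.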
 [2006-10-25 12:30 UTC] tony2001@php.net
Unicode support is on its way and will appear in PHP6.
You have to wait until then.
 [2007-07-11 22:40 UTC] j dot hakvoort at publiceren dot net
This bug can be closed.