Bug #39244 PHP can't read UTF-8 encoded PHP code
Submitted: 2006-10-24 08:55 UTC Modified: 2007-07-11 22:40 UTC
Avg. Score:5.0 ± 0.0
Reproduced:4 of 4 (100.0%)
Same Version:2 (50.0%)
Same OS:4 (100.0%)
From: j dot hakvoort at publiceren dot net Assigned:
Status: Closed Package: PCRE related
PHP Version: 4.4.4 OS: Linux
Private report: No CVE-ID: None


 [2006-10-24 08:55 UTC] j dot hakvoort at publiceren dot net

I've been working on an encoding issue to make a site work with any language. The problem is that special characters printed from PHP come out malformed.

The advice I received was to save the PHP code as UTF-8, but when I do this the script fails, because PHP doesn't read UTF-8 encoded PHP code!

Is this a bug? I am using PHP 4.4.4.

Best Regards,
Jan Jaap Hakvoort

Reproduce code:
$str = '????? ?';
$match = preg_match('?[ ]+?',$str);

Expected result:
$match will be set to true.

Actual result:
PHP error, unexpected character [block]...


 [2006-10-24 09:55 UTC]
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.

 [2006-10-25 12:08 UTC] j dot hakvoort at publiceren dot net
Ok, I found out that it's due to the following bug in PHP...

UTF-8 encoded documents can start with three bytes that mark the file as UTF-8; this is called the BOM (byte order mark).

These bytes might be needed elsewhere, but to get PHP working they have to be removed.

The only way I've found to remove them is with special editors. This will take a huge amount of time!

Is there no other solution for this? Why doesn't PHP read UTF-8 encoded files?

Best Regards,
Jan Jaap Hakvoort
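
Editorial sketch: the BOM described in the comment above is the byte sequence EF BB BF, so it can also be stripped with a short script rather than an editor. The helper names `strip_utf8_bom` and `strip_bom_from_file` below are hypothetical, not part of PHP, and the sketch assumes PHP 5 or later, because `file_put_contents` is not available in the 4.4.4 release named in the report.

```php
<?php
// Hypothetical helper (not part of PHP): drop a leading UTF-8 BOM, if any.
function strip_utf8_bom($contents)
{
    $bom = "\xEF\xBB\xBF"; // the three BOM bytes
    if (strncmp($contents, $bom, 3) === 0) {
        // The cast keeps the result a string even when nothing follows the BOM.
        return (string) substr($contents, 3);
    }
    return $contents;
}

// Hypothetical helper: rewrite $path only when it starts with a BOM.
// Returns true if the file was changed, false otherwise.
function strip_bom_from_file($path)
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        return false; // unreadable file
    }
    $stripped = strip_utf8_bom($contents);
    if ($stripped === $contents) {
        return false; // no BOM found, nothing written
    }
    return file_put_contents($path, $stripped) !== false;
}
```

One way to use it would be to loop over `glob('*.php')` and call `strip_bom_from_file` on each path, ideally on a backup copy first, since the files are rewritten in place.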
 [2006-10-25 12:11 UTC]
PCRE functions in PHP are just wrappers for PCRElib.
If PCRElib is unable to read Unicode text with a BOM, then it's PCRElib's fault.
But I guess you shouldn't be using Notepad in the first place.
 [2006-10-25 12:26 UTC] j dot hakvoort at publiceren dot net

I don't know PCRElib, but I don't expect it has anything to do with this issue, since you mention functions reading Unicode.

That's not the point. The point is that a PHP script can't run properly when the script file itself is saved as UTF-8.

The BOM causes 3 characters to be printed before the script runs, so sending headers is no longer possible.

Also, PHP doesn't recognize UTF-8 characters in functions, but that is not the main issue I am referring to, which is the BOM of UTF-8 encoded documents.

Best Regards,
Jan Jaap Hakvoort
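
Editorial sketch: on the secondary point about UTF-8 characters in functions, the PCRE functions accept a `u` pattern modifier that makes them treat both pattern and subject as UTF-8. The sample string below is an assumption, since the characters in the original reproduce code were lost, and the sketch uses plain ASCII delimiters. It assumes a PCRE build with UTF-8 support.

```php
<?php
// Sample UTF-8 text ("héllo wörld" written as escaped bytes); an assumption,
// not the string from the original report.
$str = "h\xC3\xA9llo w\xC3\xB6rld";

// ASCII "/" delimiters plus the u modifier: pattern and subject are UTF-8.
$matched = preg_match('/\s+/u', $str);

// With u, the dot matches one whole UTF-8 character rather than one byte,
// so the two-byte "é" is captured as a single unit.
preg_match('/^h(.)llo/u', $str, $m);
```

Note that `preg_match` returns an integer match count, not a boolean, so a successful match here yields 1.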
 [2006-10-25 12:30 UTC]
Unicode support is on its way and will appear in PHP6.
I have to wait until then.
 [2007-07-11 22:40 UTC] j dot hakvoort at publiceren dot net
This bug can be closed.