PHP :: Doc Bug #64260 :: Documentation misleading

Doc Bug #64260: Documentation misleading
Submitted: 2013-02-21 00:48 UTC
Modified: 2013-02-27 10:33 UTC
From: adam at acdinternet dot com
Assigned:
Status: Wont fix
Package: Documentation problem
PHP Version: 5.3.21
OS: Ubuntu 10.4
Private report: No
CVE-ID: None


[2013-02-21 00:48 UTC] adam at acdinternet dot com

Description:
------------
---
From manual page: http://www.php.net/regexp.reference.unicode
---

This page states "Those that are not part of an identified script are lumped together as Common."

That implies that 'Common' and the scripts listed below it in the docs are mutually exclusive. In practice this appears not to be true.

The character 'm' is in both 'Common' and 'Ogham' according to http://www.pagecolumn.com/tool/pregtest.htm

Evidence..

Try preg pattern: 

/[\p{Ogham}]/iu

with Testing subject

moo

Then try 

/[\p{Common}]/iu

...and you get matches both times. Is the documentation therefore logically incorrect?

The character 'o' also seems to be in \p{Tifinagh}.

This has caused me real problems, so it bears correcting if I'm right.
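
To rule out a misconfigured online tester, the patterns above can be run locally. The following is a minimal sketch, assuming PHP 7.0 or later and a PCRE build with Unicode property support; `script_hits()` is a hypothetical helper name, not a PHP function.

```php
<?php
// Hypothetical helper (not part of PHP): returns every character of $subject
// that matches \p{<script>} under the same /iu flags used in the report.
function script_hits(string $script, string $subject): array
{
    $pattern = '/[\p{' . $script . '}]/iu';
    $result = preg_match_all($pattern, $subject, $matches);
    if ($result === false) {
        // Compilation or matching failed, e.g. no Unicode property support.
        throw new RuntimeException('preg error code: ' . preg_last_error());
    }
    return $matches[0];
}

// Print the matched characters for each script named in the report,
// plus Latin for comparison.
foreach (['Ogham', 'Common', 'Tifinagh', 'Latin'] as $script) {
    printf("%-8s => %s\n", $script, json_encode(script_hits($script, 'moo')));
}
```

On a PCRE build with correct Unicode tables, only the Latin row should list any characters for "moo". Any other result points at the tester or the library rather than at the manual.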

History


[2013-02-27 04:37 UTC] frozenfire@php.net

This seems more like a data source bug than a documentation bug. The text "Those 
that are not part of an identified script are lumped together as Common." is 
lifted directly from the pcre manpage (http://pcre.org/pcre.txt). I think that 
the documentation is intended to be correct in theory, but upstream may have a 
bug that causes it not to be correct.

I would recommend filing a bug upstream, if the issue is one that is likely to 
cause others problems in the future.

[2013-02-27 04:37 UTC] frozenfire@php.net

-Status: Open
+Status: Wont fix

[2013-02-27 10:33 UTC] adam at acdinternet dot com

Yes OK.

Further research made me doubt my original assertions. It's never clear from the web regex testers whether they have the PCRE modules installed/enabled, so I need to check it all on my own system.
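
One way to check a given system is to ask PHP which PCRE library it was built against and whether a Unicode property pattern compiles at all. This is a rough sketch, assuming PHP 7.0 or later; `pcre_supports_unicode_properties()` is a hypothetical helper name, while `PCRE_VERSION` is the constant exposed by PHP's pcre extension.

```php
<?php
// Hypothetical helper: true if \p{...} patterns compile and match in /u mode.
// preg_match() returns false (with a warning, suppressed here) when the
// pattern cannot be compiled, e.g. on a PCRE build without Unicode properties.
function pcre_supports_unicode_properties(): bool
{
    return @preg_match('/\p{L}/u', 'a') === 1;
}

echo 'PCRE library: ', PCRE_VERSION, "\n";
echo 'Unicode properties: ',
    pcre_supports_unicode_properties() ? 'available' : 'not available', "\n";
```

Running the same check on a server behind an online tester is usually not possible, which is why results from such tools are hard to interpret.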

I still think there's some basic ambiguity, which would be helped by a list somewhere of exactly which Unicode character ranges fall within [Common] - and the others. Otherwise it's all down to trial and error?

But thanks very much for looking into this.

[2013-10-07 01:31 UTC] a at b dot c dot de

Certainly "m" shouldn't count as an Ogham rune!

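The definitive list is maintained by Unicode (www.unicode.org) as the "Scripts" document (Scripts.txt) in the Unicode Character Database (UCD); the same data can be extracted from the database's XML files. The Common characters are those with a script identifier of Zyyy, the ISO 15924 code for Common.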
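
The Scripts data lists one code point or code point range per line, followed by the script name, so a lookup is easy to sketch. The example below assumes PHP 7.1 or later; `script_of()` is a hypothetical helper, and `$sample` holds only a few illustrative lines in the file's format, so a real lookup would need the full file downloaded from Unicode.

```php
<?php
// A few illustrative lines in the Scripts.txt format; the real file is far longer.
$sample = <<<'TXT'
0020          ; Common # Zs       SPACE
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
1680          ; Ogham # Zs       OGHAM SPACE MARK
1681..169A    ; Ogham # Lo  [26] OGHAM LETTER BEITH..OGHAM LETTER PEITH
TXT;

// Hypothetical helper: returns the script name for a code point,
// or null if no line of the supplied data covers it.
function script_of(int $codepoint, string $scriptsTxt): ?string
{
    foreach (preg_split('/\R/', $scriptsTxt) as $line) {
        // Match "XXXX ; Script" or "XXXX..YYYY ; Script"; skip comments and blanks.
        if (!preg_match('/^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)/', $line, $m)) {
            continue;
        }
        $start = hexdec($m[1]);
        $end = $m[2] !== '' ? hexdec($m[2]) : $start;
        if ($codepoint >= $start && $codepoint <= $end) {
            return $m[3];
        }
    }
    return null;
}

// 0x6D is "m", 0x20 is a space, 0x1680 is the Ogham space mark.
foreach ([0x6D, 0x20, 0x1680] as $cp) {
    printf("U+%04X => %s\n", $cp, script_of($cp, $sample) ?? 'not listed');
}
```

With the full file loaded in place of `$sample`, the same function gives the exact list of ranges per script that the earlier comment asked for, without trial and error.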



Copyright © 2001-2025 The PHP Group All rights reserved.	Last updated: Wed Jul 16 22:01:34 2025 UTC