Doc Bug #64260 Documentation misleading
Submitted: 2013-02-21 00:48 UTC
Modified: 2013-02-27 10:33 UTC
From: adam at acdinternet dot com
Assigned:
Status: Wont fix
Package: Documentation problem
PHP Version: 5.3.21
OS: Ubuntu 10.4
Private report: No
CVE-ID: None

 [2013-02-21 00:48 UTC] adam at acdinternet dot com
Description:
------------
---
From manual page: http://www.php.net/regexp.reference.unicode
---

This page states "Those that are not part of an identified script are lumped together as Common."

That implies mutual exclusivity between 'Common' and the scripts listed below it in the docs. In practice this appears not to be true.

The character 'm' is in both 'Common' and 'Ogham' according to http://www.pagecolumn.com/tool/pregtest.htm

Evidence:

Try preg pattern: 

/[\p{Ogham}]/iu

with Testing subject

moo

Then try 

/[\p{Common}]/iu

You get matches both times. Is the documentation therefore logically incorrect?

The character 'o' seems also to be in \p{Tifinagh}

This has caused me real problems, so bears correcting if I'm right.


History

 [2013-02-27 04:37 UTC] frozenfire@php.net
This seems more like a data source bug than a documentation bug. The text "Those 
that are not part of an identified script are lumped together as Common." is 
lifted directly from the pcre manpage (http://pcre.org/pcre.txt). I think that 
the documentation is intended to be correct in theory, but upstream may have a 
bug that causes it not to be correct.

I would recommend filing a bug upstream, if the issue is one that is likely to 
cause others problems in the future.
 [2013-02-27 04:37 UTC] frozenfire@php.net
-Status: Open
+Status: Wont fix
 [2013-02-27 10:33 UTC] adam at acdinternet dot com
Yes OK.

Further research made me doubt my original assertions. It's never clear from the web regex testers whether they have the PCRE modules installed/enabled, so I need to check it all on my own system.
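
A local check along these lines might look like the sketch below. It runs the two patterns from the report against "moo" on whatever PCRE build the local PHP is linked against. The helper name scripts_matching() is hypothetical (not part of PHP), and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not a PHP built-in): for each script name, report
// whether at least one character of $subject matches \p{Script} on the
// PCRE build this PHP is linked against.
function scripts_matching(string $subject, array $scripts, string $flags = 'u'): array
{
    $result = [];
    foreach ($scripts as $script) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        $result[$script] = preg_match('/[\p{' . $script . '}]/' . $flags, $subject);
    }
    return $result;
}

// Show which PCRE version is actually in use, since results may depend on it.
echo 'PCRE version: ', PCRE_VERSION, PHP_EOL;

// Try the subject from the report with and without the /i flag.
foreach (['u', 'iu'] as $flags) {
    echo "flags=$flags: ";
    var_export(scripts_matching('moo', ['Ogham', 'Common', 'Tifinagh', 'Latin'], $flags));
    echo PHP_EOL;
}
```

If the build follows the Unicode script data, I would expect only Latin to match "moo", since m and o are Latin letters. That expectation is mine, so it is worth confirming against the printed output rather than taking it on trust.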

I still think there's some basic ambiguity, which would be helped by a list somewhere of exactly which Unicode character ranges fall within [Common] - and the others. Otherwise it's all down to trial and error?

But thanks very much for looking into this.
 [2013-10-07 01:31 UTC] a at b dot c dot de
Certainly "m" shouldn't count as an Ogham rune!

The definitive list is maintained by Unicode (www.unicode.org) as the "Scripts" UCD document, extracted from the database's XML files (the Common characters are those with a script identifier of Zyyy).
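
For a quick local answer to "which characters does my PCRE build put in Common?", something like the following sketch could enumerate a code point range. Both helper names, utf8_chr() and code_points_in_script(), are made up for illustration. Note that this reports what the installed PCRE thinks, which may lag behind the current Unicode "Scripts" data.

```php
<?php
// Hypothetical helper: encode a Unicode code point as UTF-8 by hand,
// so the sketch does not depend on the mbstring or intl extensions.
function utf8_chr(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: list the code points in [$from, $to] that the
// local PCRE build assigns to the given script property.
function code_points_in_script(string $script, int $from, int $to): array
{
    $hits = [];
    for ($cp = $from; $cp <= $to; $cp++) {
        if ($cp >= 0xD800 && $cp <= 0xDFFF) {
            continue; // surrogates cannot be encoded as valid UTF-8
        }
        if (preg_match('/\A\p{' . $script . '}\z/u', utf8_chr($cp)) === 1) {
            $hits[] = $cp;
        }
    }
    return $hits;
}

// Example: print the Common characters in the ASCII range as U+XXXX.
foreach (code_points_in_script('Common', 0x00, 0x7F) as $cp) {
    printf("U+%04X\n", $cp);
}
```

The same call with 'Ogham' and the range 0x1680 to 0x169F would show what the local build counts as Ogham, which gives a direct way to check whether "m" (U+006D) ever appears there.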
 