Bug #72547 \p{Ll} fails to match accented lowercase characters.
Submitted: 2016-07-05 13:14 UTC
Modified: 2016-07-05 15:57 UTC
From: sprunka at gmail dot com
Assigned: cmb
Status: Not a bug
Package: PCRE related
PHP Version: Irrelevant
OS: All
Private report: No
CVE-ID: None
 [2016-07-05 13:14 UTC] sprunka at gmail dot com
Description:
------------
While testing code for password-complexity matching, I noticed that \p{Lu} matches all uppercase characters correctly, but \p{Ll} fails to match accented characters. Whether the German "double s" character (ß) should pass or fail is debatable. It is definitely a lowercase character, but it has no uppercase variant. Therefore, by definition of \p{Ll}, it should not match, but every PCRE example I can find for it does have it match \p{Ll}. I have included it as passing as a lowercase character in the "expected" results below.

Test script:
---------------
<?php
// SHOULD PASS. Passes all 5.0.5 and thereafter.
$passwordTestArray[] = 'Aa';
// SHOULD PASS. Passes all 5.0.5 and thereafter.
$passwordTestArray[] = 'Æa';
// SHOULD PASS. Fails after 5.2.10. Passes 5.0.5 - 5.2.9
$passwordTestArray[] = 'Aæ';
// SHOULD PASS. Fails after 5.2.10. Passes 5.0.5 - 5.2.9
$passwordTestArray[] = 'Ææ';
// SHOULD FAIL because ß does not have an uppercase equiv. But it passes anyway in 5.0.5 - 5.2.9
$passwordTestArray[] = 'Aß';


// SHOULD FAIL because there is no uppercase ß.
// It fails for the wrong reasons in 5.0.5 - 5.2.9
// It passes in 5.2.10 and beyond.
$passwordTestArray[] = 'aß';


class PCREFail
{
    static public function verifyComplexity($plainTextPassword)
    {
        $re = "/(?=.*[\\p{Ll}])(?=.*[\\p{Lu}]).{2,}/";
        $found = preg_match_all($re, $plainTextPassword, $matches);
        if ($found) {
            return $matches;
        }
        return $found;
    }

}

foreach ($passwordTestArray as $passwordToTest) {
    var_dump(PCREFail::verifyComplexity($passwordToTest));
}

Expected result:
----------------
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(2) "Aa"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "Æa"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "Aæ"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(4) "Ææ"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "Aß"
  }
}
int(0)

Actual result:
--------------
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(2) "Aa"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "Æa"
  }
}
int(0)
int(0)
int(0)
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "aß"
  }
}

 [2016-07-05 13:16 UTC] sprunka at gmail dot com
Here is the example code running in 3v4l.org:
https://3v4l.org/hCcqj
 [2016-07-05 15:10 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2016-07-05 15:10 UTC] cmb@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

When working with UTF-8, you should use the u modifier, see
<https://3v4l.org/iOJl3>.
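
As a minimal sketch of that advice, the complexity check from the report can be rerun with the u modifier appended to the pattern. The helper name hasUpperAndLower is hypothetical (it does not appear in the report), preg_match is swapped in for preg_match_all because only a yes/no answer is needed, and the script file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper (name not from the report): true when the UTF-8
// string contains at least one lowercase and one uppercase letter.
function hasUpperAndLower(string $password): bool
{
    // The trailing "u" makes PCRE treat pattern and subject as UTF-8,
    // so \p{Ll} and \p{Lu} are tested per code point, not per byte.
    return preg_match('/(?=.*\p{Ll})(?=.*\p{Lu}).{2,}/u', $password) === 1;
}

// The six passwords from the test script in the report.
foreach (['Aa', 'Æa', 'Aæ', 'Ææ', 'Aß', 'aß'] as $candidate) {
    var_dump(hasUpperAndLower($candidate));
}
```

With the modifier in place, the first five candidates should report true and 'aß' should report false, since it contains no uppercase letter. That lines up with the "Expected result" section of the report.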
 [2016-07-05 15:21 UTC] sprunka at gmail dot com
That is fantastic. Thank you. However, if it's not a bug, why does \p{Lu} seem to work without the u modifier, but \p{Ll} does not? Surely one or the other must be a bug, given the inconsistency.
 [2016-07-05 15:57 UTC] cmb@php.net
> However, if it's not a bug, why does \p{Lu} seem to work without
> the u modifier, but \p{Ll} does not?

It isn't supposed to (otherwise file a new ticket), e.g.

  preg_match('/^\\p{Lu}$/', "\xc3\x9c") // => 0

Without the $ anchor it would match (depending on the locale), because
\xc3 is Ã in ISO-8859-1.
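
The byte-level effect can be checked directly. The sketch below is only an illustration, assuming a current PHP build whose PCRE has Unicode property support; the variable names are made up for the example. Without the u modifier each byte of a UTF-8 sequence is examined as its own character, so the lead byte \xc3 on its own is enough to satisfy \p{Lu}.

```php
<?php
$ue = "\xc3\x9c"; // "Ü" encoded as UTF-8: two bytes

// Byte mode: ^...$ leaves room for one character, but the subject has two bytes.
var_dump(preg_match('/^\p{Lu}$/', $ue));        // int(0)

// Byte mode without the $ anchor: the lead byte \xc3 alone satisfies \p{Lu}.
var_dump(preg_match('/^\p{Lu}/', $ue, $m));     // int(1)
var_dump(bin2hex($m[0]));                       // string(2) "c3"

// UTF-8 mode: both bytes decode to the single code point U+00DC.
var_dump(preg_match('/^\p{Lu}$/u', $ue));       // int(1)

// "aß" from the report: in byte mode the lead byte of "ß" counts as uppercase.
var_dump(preg_match('/\p{Lu}/', "a\xc3\x9f"));  // int(1)
var_dump(preg_match('/\p{Lu}/u', "a\xc3\x9f")); // int(0)
```

This would also account for the pattern in the "Actual result" section: 'Æa' and 'aß' each contain a plain ASCII lowercase letter plus a \xc3 byte, while 'Aæ', 'Ææ' and 'Aß' contain no byte that counts as a lowercase letter in byte mode.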

For a list of more appropriate places to ask for help using PHP,
please visit http://www.php.net/support.php as this bug system is
not the appropriate forum for asking support questions. :)
 