Bug #72547 \p{Ll} fails to match accented lowercase characters.
Submitted: 2016-07-05 13:14 UTC
Modified: 2016-07-05 15:57 UTC
From: sprunka at gmail dot com
Assigned: cmb
Status: Not a bug
Package: PCRE related
PHP Version: Irrelevant
OS: All
Private report: No
CVE-ID: None

 [2016-07-05 13:14 UTC] sprunka at gmail dot com
Description:
------------
While testing code for matching password complexity, I noticed that \p{Lu} matches all uppercase characters correctly, but \p{Ll} fails to match accented characters. Whether the German "double s" character (ß) should pass or fail is debatable. It is definitely a lowercase character, but it has no uppercase variant. Therefore, by the definition of \p{Ll}, it should not match, but in every other PCRE example I can find it does match \p{Ll}. I have included it as passing as a lowercase character in the "expected" results below.

Test script:
---------------
<?php
// SHOULD PASS. Passes all 5.0.5 and thereafter.
$passwordTestArray[] = 'Aa';
// SHOULD PASS. Passes all 5.0.5 and thereafter.
$passwordTestArray[] = 'Æa';
// SHOULD PASS. Fails after 5.2.10. Passes 5.0.5 - 5.2.9
$passwordTestArray[] = 'Aæ';
// SHOULD PASS. Fails after 5.2.10. Passes 5.0.5 - 5.2.9
$passwordTestArray[] = 'Ææ';
// SHOULD FAIL because ß does not have an uppercase equiv. But it passes anyway in 5.0.5 - 5.2.9
$passwordTestArray[] = 'Aß';


// SHOULD FAIL because there is no uppercase ß.
// It fails for the wrong reasons in 5.0.5 - 5.2.9
// It passes in 5.2.10 and beyond.
$passwordTestArray[] = 'aß';


class PCREFail
{
    static public function verifyComplexity($plainTextPassword)
    {
        $re = "/(?=.*[\\p{Ll}])(?=.*[\\p{Lu}]).{2,}/";
        $found = preg_match_all($re, $plainTextPassword, $matches);
        if ($found) {
            return $matches;
        }
        return $found;
    }

}

foreach ($passwordTestArray as $passwordToTest) {
    var_dump(PCREFail::verifyComplexity($passwordToTest));
}

Expected result:
----------------
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(2) "Aa"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "Æa"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "Aæ"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(4) "Ææ"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "Aß"
  }
}
int(0)

Actual result:
--------------
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(2) "Aa"
  }
}
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "Æa"
  }
}
int(0)
int(0)
int(0)
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(3) "aß"
  }
}

 [2016-07-05 13:16 UTC] sprunka at gmail dot com
Here is the example code running in 3v4l.org:
https://3v4l.org/hCcqj
 [2016-07-05 15:10 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2016-07-05 15:10 UTC] cmb@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

When working with UTF-8, you should use the u modifier, see
<https://3v4l.org/iOJl3>.
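
For illustration, here is a minimal sketch of the reporter's check with the u modifier added, so that PCRE decodes the pattern and subject as UTF-8 instead of matching byte by byte. The function name hasUpperAndLower is hypothetical and not part of the report, and the sketch assumes the source file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper: true when the password contains at least one
// lowercase letter and one uppercase letter. The trailing "u" tells PCRE
// to treat the pattern and the subject as UTF-8.
function hasUpperAndLower(string $password): bool
{
    return preg_match('/(?=.*\p{Ll})(?=.*\p{Lu}).{2,}/u', $password) === 1;
}

// The six test strings from the report.
foreach (['Aa', 'Æa', 'Aæ', 'Ææ', 'Aß', 'aß'] as $candidate) {
    var_dump(hasUpperAndLower($candidate));
}
```

Under these assumptions the first five strings should yield true and 'aß' should yield false: Unicode classifies ß (U+00DF) as a lowercase letter (Ll), so 'aß' lacks only an uppercase letter.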
 [2016-07-05 15:21 UTC] sprunka at gmail dot com
That is fantastic. Thank you. However, if it's not a bug, why does \p{Lu} seem to work without the u modifier, but \p{Ll} does not? Surely one or the other must be a bug, given the inconsistency.
 [2016-07-05 15:57 UTC] cmb@php.net
> However, if it's not a bug, why does \p{Lu} seem to work without
> the u modifier, but \p{Ll} does not?

It isn't supposed to (otherwise file a new ticket), e.g.

  preg_match('/^\\p{Lu}$/', "\xc3\x9c") // => 0

Without the $ anchor it would (depending on the locale), because
the first byte, \xc3, is Ã in ISO-8859-1.
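
The byte-level behaviour can be sketched as follows. This is an illustration, not part of the original reply. It assumes a PCRE build with Unicode property support, and it writes Ü as the escaped bytes \xc3\x9c so the result does not depend on the file's encoding. The variable names are hypothetical.

```php
<?php
$subject = "\xc3\x9c"; // "Ü" encoded as UTF-8: two bytes

// Without "u", PCRE sees two one-byte characters. Anchoring both ends
// demands exactly one uppercase character, so this does not match.
$anchoredBytes = preg_match('/^\p{Lu}$/', $subject);

// Without the "$" anchor, the first byte (\xc3) can match on its own.
$unanchoredBytes = preg_match('/^\p{Lu}/', $subject);

// With "u", both bytes are decoded as the single code point U+00DC.
$anchoredUtf8 = preg_match('/^\p{Lu}$/u', $subject);

var_dump($anchoredBytes, $unanchoredBytes, $anchoredUtf8);
```

This also suggests why the reporter's 'Æa' and 'aß' cases appeared to pass without the modifier: both Æ and ß begin with the byte \xc3 in UTF-8, which on its own reads as an uppercase letter.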

For a list of more appropriate places to ask for help using PHP,
please visit http://www.php.net/support.php as this bug system is
not the appropriate forum for asking support questions. :)
 