Doc Bug #64090 Normalizer::normalize silently eats data when there is an error
Submitted: 2013-01-28 23:44 UTC Modified: 2021-11-11 11:16 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: T dot J dot Hunt at open dot ac dot uk Assigned:
Status: Open Package: I18N and L10N related
PHP Version: 5.3.21 OS:
Private report: No CVE-ID: None
 [2013-01-28 23:44 UTC] T dot J dot Hunt at open dot ac dot uk
Description:
------------
---
From manual page: http://www.php.net/normalizer.normalize#refsect1-normalizer.normalize-returnvalues
---

"Return Values: The normalized string or NULL if an error occurred."

There are several issues here:

1. What errors might occur? Could the documentation please explain?

2. If an error occurs, is there any way to get an error message? I can't see 
one.

3. It is very tempting to write code like 
$string = normalizer_normalize($string, Normalizer::FORM_C); 
and imagine that it cannot possibly do any harm. Of course, that would be 
naive. One must save the return value into another variable, and then 
explicitly test for NULL. I consider myself a fairly experienced PHP 
developer, and I managed to screw this up: 
https://moodle.org/mod/forum/discuss.php?d=220722

Anyway, these issues make this API a real pain to work with. Could we have a 
version of this function that either returns the normalised string, or throws an 
exception?

Test script:
---------------
$string = "Well, I don't know actually. I have no idea what inputs may cause an error. The documentation does not say"; // Change this line as necessary to trigger the problem.

$string = normalizer_normalize($string, Normalizer::FORM_C);

if (!$string) {
    throw new Exception("Ha! Ha! All your data has been lost.");
}
echo $string;

Expected result:
----------------
Input string is output.

Actual result:
--------------
Well, if you can find some suitable problematic input, then the actual result of 
the test script will be an exception. However, I cannot trigger it myself. One 
user of Moodle seems to have triggered it with the input string "July". See 
https://moodle.org/mod/forum/discuss.php?d=220722.

 [2013-01-29 12:01 UTC] cataphract@php.net
As far as I can see, the only error that can realistically happen is that the input may be bad UTF-8. In that case, the function returns FALSE, not NULL, as is usual in the intl extension.

There's one code path in the function that indeed returns null (probably an oversight), but it's a path that should never be reached.

As for the exceptions, you can turn them on in intl PECLv3/PHP 5.5 (to use it in PHP 5.3, remove your current intl extension and install the PECL version with pecl install intl-beta):

$ php -d intl.use_exceptions=1 -r 'var_dump(normalizer_normalize("\x80", Normalizer::FORM_C));'
PHP Fatal error:  Uncaught exception 'IntlException' with message 'Error converting input string to UTF-16' in Command line code:1
Stack trace:
#0 Command line code(1): normalizer_normalize('?', 4)
#1 {main}
  thrown in Command line code on line 1
$ php -d intl.use_exceptions=1 -r 'var_dump(normalizer_normalize("", -99));'
PHP Fatal error:  Uncaught exception 'IntlException' with message 'normalizer_normalize: illegal normalization form' in Command line code:1
Stack trace:
#0 Command line code(1): normalizer_normalize('', -99)
#1 {main}
  thrown in Command line code on line 1
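
The same setting can also be switched on from inside a script. The following is only a minimal sketch, assuming an intl build that supports intl.use_exceptions (PECL v3 / PHP 5.5 or later); the variable names and the fallback of keeping the original string are illustrative choices, not something the extension prescribes:

```php
<?php
// Assumption: ext/intl is loaded and supports intl.use_exceptions.
ini_set('intl.use_exceptions', '1');

$input = "\x80"; // a lone continuation byte, i.e. invalid UTF-8

try {
    $normalized = normalizer_normalize($input, Normalizer::FORM_C);
    $outcome = 'ok';
} catch (IntlException $e) {
    // Keep the original data rather than overwriting it with a failure value.
    $normalized = $input;
    $outcome = 'error: ' . $e->getMessage();
}

echo $outcome, "\n";
```

Because the result is only assigned when no exception is thrown, the input cannot be silently replaced by an error value.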

For older versions, you can check the errors with intl_get_error_code()/intl_get_error_message(). You can also make the errors throw notices/warnings/whatever:

$ php -d intl.error_level=E_WARNING -r 'var_dump(normalizer_normalize("", -99));'
PHP Warning:  normalizer_normalize(): normalizer_normalize: illegal normalization form in Command line code on line 1
bool(false)
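
For code that has to run without exceptions enabled, the check can be wrapped in a small helper. This is a sketch only: normalize_or_throw() is a hypothetical name, not part of the intl extension, and it treats both FALSE and NULL as failure since both have been mentioned in this report:

```php
<?php
// Hypothetical helper (not part of ext/intl): return the normalized string
// or throw, so a failure value can never overwrite the caller's data.
function normalize_or_throw($string, $form = Normalizer::FORM_C)
{
    $result = normalizer_normalize($string, $form);

    // Strict comparison: an empty string or "0" is a valid result.
    if ($result === false || $result === null) {
        throw new RuntimeException(
            'Normalization failed: ' . intl_get_error_message()
            . ' (code ' . intl_get_error_code() . ')'
        );
    }

    return $result;
}

$original = "July";
$normalized = normalize_or_throw($original);
echo $normalized, "\n";
```

Note the strict comparisons: a loose test such as if (!$string) would also reject legitimate results like an empty string or "0". If intl.use_exceptions is enabled, the extension throws its own IntlException before the helper's check is reached.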
 [2013-01-29 12:08 UTC] T dot J dot Hunt at open dot ac dot uk
Thanks for the quick reply.

So probably all that needs to be done here is to fix the documentation. For example, it was not at all clear to me that the way to get an error message is to use the mechanisms provided by the intl extension. Nor is the FALSE return value documented.
 [2018-03-09 11:47 UTC] cmb@php.net
-Package: Unicode Engine related +Package: intl
 [2021-11-11 11:16 UTC] nikic@php.net
-Package: intl +Package: I18N and L10N related
 