Bug #72506 idn_to_ascii for UTS #46 incorrect for long domain names
Submitted: 2016-06-27 14:57 UTC
Modified: 2021-11-11 11:17 UTC
Votes: 5
Avg. Score: 3.4 ± 1.0
Reproduced: 3 of 3 (100.0%)
Same Version: 1 (33.3%)
Same OS: 1 (33.3%)
From: kulikovdn at gmail dot com
Assigned:
Status: Re-Opened
Package: I18N and L10N related
PHP Version: 7.1.0alpha1
OS: Linux Ubuntu 14.04
Private report: No
CVE-ID: None
 [2016-06-27 14:57 UTC] kulikovdn at gmail dot com
Description:
------------
The function "idn_to_ascii" from the intl extension works incorrectly for very long domain names when the UTS #46 algorithm is used.
Whether the bug appears depends on the string length. For ASCII strings shorter than 255 characters the function works correctly; exactly 255 characters cause a fatal error; more than 255 characters lead to an incorrect result, but without a fatal error.

This bug cannot be easily worked around by checking the string length: a domain name may contain non-ASCII characters, in which case the bug occurs at smaller string lengths, and it is not possible to predict from the input length when it will occur.
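
To make the length growth concrete, here is a rough arithmetic sketch in PHP. It assumes that the Punycode form of the one-letter label "ф" is "xn--t1a" (hard-coded here rather than computed), and the helper name is made up for illustration. It does not call idn_to_ascii, so it should be safe to run on affected versions.

```php
<?php
// Hypothetical helper: length of the ASCII form of a name made of
// $count copies of a one-letter Cyrillic label followed by an ASCII tail.
// Assumption: each "ф" label becomes "xn--t1a" (7 bytes) after conversion.
function estimateAsciiLength(int $count, string $asciiTail): int
{
    $encodedLabel = 'xn--t1a';
    // Every repeated label is followed by one dot separator.
    return $count * (strlen($encodedLabel) + 1) + strlen($asciiTail);
}

// Same input as a name with 31 Cyrillic labels and a short ASCII tail.
$input = str_repeat('ф.', 31) . 's.s.s.s';

// Number of characters in the input (the source file is assumed to be UTF-8).
$inputChars = preg_match_all('/./su', $input);

// Estimated number of ASCII characters after conversion.
$asciiLength = estimateAsciiLength(31, 's.s.s.s');

var_dump($inputChars, $asciiLength);
```

Under that assumption, the 69-character input works out to 255 ASCII characters, so a check on the input length alone would not catch it.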

The bug appears to affect all PHP versions.

Execute test scripts:
https://3v4l.org/tuM9l
https://3v4l.org/eX6aA
https://3v4l.org/EUSlJ

Related pages in docs:
http://php.net/manual/en/function.idn-to-ascii.php
http://php.net/manual/en/intl.constants.php

Test script:
---------------
// First test.
// 255 characters; the maximum allowed for an FQDN is 253 (the trailing dot is ignored)
$value = str_repeat('a.', 126) . 'aaa';
// Fatal error
$result = idn_to_ascii($value, 0, INTL_IDNA_VARIANT_UTS46, $idnaInfo);

// Second test.
// 256 characters, one character added
$value = str_repeat('a.', 126) . 'aaaa';
$result = idn_to_ascii($value, 0, INTL_IDNA_VARIANT_UTS46, $idnaInfo);
var_dump($result);
// $idnaInfo is an empty array, which is incorrect
// $idnaInfo['errors'] should equal 4, the value of IDNA_ERROR_DOMAIN_NAME_TOO_LONG
var_dump($idnaInfo);

// Third test.
// "ф" is a Cyrillic letter; the input is only 69 characters, but it is much longer after conversion to ASCII
$value = str_repeat('ф.', 31) . 's.s.s.s';
// Fatal error
$result = idn_to_ascii($value, 0, INTL_IDNA_VARIANT_UTS46, $idnaInfo);


Expected result:
----------------
A fatal error should not occur.
The variable $idnaInfo should contain an array like this:
[
  'result' => 'VERY LONG STRING ...',
  'isTransitionalDifferent' => false,
  'errors' => 4,
]

See correct result for 254 characters:
https://3v4l.org/CijFV

Actual result:
--------------
A fatal error occurs.
$idnaInfo contains an empty array.
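
For callers, the difference between the expected and the actual behaviour can be sketched as a small classifier over the return value and $idnaInfo. This is only an illustration: the function name and the returned labels are made up, and the fallback value 4 for IDNA_ERROR_DOMAIN_NAME_TOO_LONG is the one quoted in the test script, used only when the intl constant is not defined.

```php
<?php
// Hypothetical helper, not part of intl: classifies what idn_to_ascii() reported.
function classifyIdnOutcome($result, $idnaInfo): string
{
    // Value 4 is taken from this report; prefer the real constant when intl is loaded.
    $tooLong = defined('IDNA_ERROR_DOMAIN_NAME_TOO_LONG')
        ? IDNA_ERROR_DOMAIN_NAME_TOO_LONG
        : 4;

    if (!is_array($idnaInfo) || !array_key_exists('errors', $idnaInfo)) {
        // The behaviour described under "Actual result": no diagnostics at all.
        return 'no-diagnostics';
    }
    if (($idnaInfo['errors'] & $tooLong) !== 0) {
        // The behaviour described under "Expected result".
        return 'domain-too-long';
    }
    if ($idnaInfo['errors'] !== 0 || $result === false) {
        return 'other-error';
    }
    return 'ok';
}

// Usage with hand-written values shaped like the ones in this report.
var_dump(classifyIdnOutcome(false, []));
var_dump(classifyIdnOutcome('VERY LONG STRING ...', [
    'result' => 'VERY LONG STRING ...',
    'isTransitionalDifferent' => false,
    'errors' => 4,
]));
```

The first call models the empty $idnaInfo reported here; the second models the array the report expects.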

History

 [2016-06-30 06:06 UTC] kulikovdn at gmail dot com
Text of fatal error:
"Fatal error: idn_to_ascii(): ICU returned an unexpected length"
 [2016-07-11 23:18 UTC] cmb@php.net
-Status: Open
+Status: Verified
-Assigned To:
+Assigned To: cmb
 [2016-07-11 23:18 UTC] cmb@php.net
Indeed, idn_to_ascii() appears to be overly restrictive. If the
resulting domain is 255 bytes long, it raises an error[1], if it
is longer, it returns FALSE and doesn't fill $idna_info[2]. This
inconsistency is definitely a bug.

Not filling $idna_info when the resulting domain is longer than
254 bytes is also a bug according to the documentation[3] which
states:

| In that case, it will be filled with an array with the keys
| 'result', the possibly illegal result of the transformation, […]

The issue here is that we have to allocate a buffer in advance
(i.e. before the length of the resulting domain is known)[4]. Of
course, it would be possible to allocate a larger buffer, but that
might still not be big enough, and appears to be useless anyway,
because the domain name would be invalid. Therefore I suggest to
set $idna_info['result'] to NULL for excessive resulting domains,
and to only do this as of PHP 7.1, due to the (minor) BC break.
For current versions, the documentation should be fixed.

[1] <https://github.com/php/php-src/blob/php-5.6.23/ext/intl/idn/idn.c#L167-L169>
[2] <https://github.com/php/php-src/blob/php-5.6.23/ext/intl/idn/idn.c#L161-L166>
[3] <http://php.net/manual/en/function.idn-to-ascii.php>
[4] <https://github.com/php/php-src/blob/php-5.6.23/ext/intl/idn/idn.c#L142-L143>
 [2016-07-12 12:28 UTC] cmb@php.net
-Summary: idn_to_ascii with INTL_IDNA_VARIANT_UTS46 works incorrect for long domain names
+Summary: idn_to_ascii for UTS #46 incorrect for long domain names
 [2016-07-12 12:57 UTC] cmb@php.net
Automatic comment on behalf of cmb
Revision: http://git.php.net/?p=php-src.git;a=commit;h=76e249d31c51d0b4f8f11507c550ca1eec1dd38a
Log: Partially fix #72506: idn_to_ascii for UTS #46 incorrect for long domain names
 [2016-07-12 12:57 UTC] cmb@php.net
-Status: Verified
+Status: Closed
 [2016-07-12 12:58 UTC] cmb@php.net
-Status: Closed
+Status: Re-Opened
-Assigned To: cmb
+Assigned To:
 [2016-10-17 10:11 UTC] bwoebi@php.net
Automatic comment on behalf of cmb
Revision: http://git.php.net/?p=php-src.git;a=commit;h=76e249d31c51d0b4f8f11507c550ca1eec1dd38a
Log: Partially fix #72506: idn_to_ascii for UTS #46 incorrect for long domain names
 [2016-10-17 10:11 UTC] bwoebi@php.net
-Status: Re-Opened
+Status: Closed
 [2016-11-22 13:00 UTC] cmb@php.net
-Status: Closed
+Status: Re-Opened
 [2016-11-22 13:00 UTC] cmb@php.net
This has been inadvertently closed – reopening.
 [2016-11-23 13:22 UTC] cmb@php.net
-Assigned To:
+Assigned To: cmb
 [2017-06-21 08:36 UTC] cmb@php.net
-Assigned To: cmb
+Assigned To:
 [2021-11-11 11:17 UTC] nikic@php.net
-Package: intl
+Package: I18N and L10N related
 [2023-01-24 12:05 UTC] jules dot bernable at gmail dot com
The 255 octets limit is mentioned in several specifications:

RFC 1034 section 3.1 - Name space specifications and terminology:

> To simplify implementations, the total number of octets that represent
> a domain name (i.e., the sum of all label octets and label lengths) is
> limited to 255.


RFC 1035 section 2.3.4 - Size limits:
> Various objects and parameters in the DNS have size limits.
> They are listed below.
> Some could be easily changed, others are more fundamental.

> labels          63 octets or less
> names           255 octets or less
> TTL             positive values of a signed 32 bit number.
> UDP messages    512 octets or less


RFC 2181 section 11 - Name syntax:

> The DNS itself places only one restriction on the particular labels
> that can be used to identify resource records.
> That one restriction relates to the length of the label and the full name.
> The length of any one label is limited to between 1 and 63 octets.
> A full domain name is limited to 255 octets (including the separators)


So it seems that the behaviour of the IDNA functions is correct WRT the aforementioned specifications.
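
As a rough check of how the 255-octet limit quoted above relates to the 253-character figure in the test script, the wire-format size of a name can be computed by hand. The sketch below is an illustration only: the function name is made up, and it assumes an already-ASCII name with "." as the only separator.

```php
<?php
// Hypothetical helper: octets a name occupies in DNS wire format,
// i.e. label octets + one length octet per label + the zero-length root label.
function dnsWireLength(string $asciiName): int
{
    // A single trailing dot only marks the root and adds no label.
    if ($asciiName !== '' && $asciiName[strlen($asciiName) - 1] === '.') {
        $asciiName = (string) substr($asciiName, 0, -1);
    }
    if ($asciiName === '') {
        return 1; // just the root label
    }
    $total = 1; // length octet of the root label
    foreach (explode('.', $asciiName) as $label) {
        $total += 1 + strlen($label); // length octet + label octets
    }
    return $total;
}

var_dump(dnsWireLength(str_repeat('a.', 126) . 'a'));   // 253-character name
var_dump(dnsWireLength(str_repeat('a.', 126) . 'aaa')); // 255-character name
```

With this arithmetic, a 253-character name occupies exactly 255 octets on the wire, while the 255-character name from the first test comes out at 257.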
 
Last updated: Thu Nov 21 09:01:32 2024 UTC