Doc Bug #80922 Domain names are not case sensitive and should be handled accordingly
Submitted: 2021-03-31 01:23 UTC Modified: 2021-11-11 11:59 UTC
Votes:1
Avg. Score:4.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: JunkYardMail1 at Frontier dot com Assigned:
Status: Verified Package: I18N and L10N related
PHP Version: 7.4.16 OS: FreeBSD
Private report: No CVE-ID: None
 [2021-03-31 01:23 UTC] JunkYardMail1 at Frontier dot com
Description:
------------
---
From manual page: https://php.net/function.idn-to-ascii
---
"This function converts a Unicode domain name to an IDNA ASCII-compatible format."

'My.Domain.Example.com' is clearly not a "Unicode domain name", so it should not be altered. Yet it is altered: it is forced to lower case.

Both idn_to_ascii() and idn_to_utf8() force domain names to lower case, even domain names that are not IDNs.

Domain names are not case sensitive, so there is no reason for these functions to alter the case of what is passed to them. Case should be retained exactly as provided.


Test script:
---------------
<?php
echo 'Non IDN:' . "\n";
echo 'idn_to_ascii(\'My.Domain.Example.com\');' . "\n";
echo '  Expected result: My.Domain.Example.com' . "\n";
echo '    Actual result: ' . idn_to_ascii('My.Domain.Example.com') . "\n\n";

echo 'idn_to_utf8(\'My.Domain.Example.com\');' . "\n";
echo '  Expected result: My.Domain.Example.com' . "\n";
echo '    Actual result: ' . idn_to_utf8('My.Domain.Example.com') . "\n\n";

echo 'IDN:' . "\n";
echo 'idn_to_ascii(\'Täst.de\');' . "\n";
echo '  Expected result: xn--Tst-qla.de' . "\n";
echo '    Actual result: ' . idn_to_ascii('Täst.de') . "\n\n";

echo 'idn_to_utf8(\'xn--Tst-qla.de\');' . "\n";
echo '  Expected result: Täst.de' . "\n";
echo '    Actual result: ' . idn_to_utf8('xn--Tst-qla.de') . "\n\n";
?>

Expected result:
----------------
# Non IDN:
My.Domain.Example.com
My.Domain.Example.com

# IDN:
xn--Tst-qla.de
Täst.de


Actual result:
--------------
# Non IDN:
my.domain.example.com
my.domain.example.com

# IDN:
xn--tst-qla.de
täst.de


History

 [2021-04-06 10:33 UTC] cmb@php.net
-Status: Open
+Status: Verified
-Type: Feature/Change Request
+Type: Documentation Problem
 [2021-04-06 10:33 UTC] cmb@php.net
idn_to_ascii() is a relatively simple wrapper of
uidna_nameToASCII_UTF8(), which is documented[1] as:

| Converts a whole domain name into its ASCII form for DNS lookup.

So the PHP documentation isn't quite right.  The fact that the
function returns the lower-cased domain name is just part of the
normalization.

I don't think it makes sense to change that behavior; if you don't
want the lower casing for ASCII domain names, just don't pass
these to idn_to_ascii().

[1] <https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uidna_8h.html#aac0ee59298161bde6220f59927249f21>
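The workaround suggested above can be sketched as a small wrapper that calls idn_to_ascii() only when a name contains non-ASCII bytes, so pure-ASCII names keep their original case. The helper name `to_dns_name()` is hypothetical, and treating any byte above 0x7F as a sign that IDNA conversion is needed is an assumption. Unicode names are still lower-cased, because that happens inside the IDNA normalization.

```php
<?php
/**
 * Hypothetical helper: convert a domain name for DNS use, but leave
 * pure-ASCII names (and therefore their case) untouched.
 *
 * @return string|false false if idn_to_ascii() rejects the name
 */
function to_dns_name(string $name)
{
    // Assumption: a byte outside 0x00-0x7F means the name needs IDNA conversion.
    if (preg_match('/[^\x00-\x7F]/', $name) !== 1) {
        return $name;
    }

    // Requires the intl extension. Unicode names are still lower-cased here,
    // because that is part of the IDNA normalization.
    return idn_to_ascii($name, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
}

echo to_dns_name('My.Domain.Example.com'), "\n"; // My.Domain.Example.com
```

With the intl extension loaded, `to_dns_name('Täst.de')` still returns `xn--tst-qla.de`, the same lower-cased result shown in the report.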
 [2021-04-06 19:31 UTC] JunkYardMail1 at Frontier dot com
The issue is broader than ASCII domain names, so simply not passing those to idn_to_ascii() does not solve it.

It also affects Unicode domain names. These must be converted to ASCII IDNs for DNS use, but they should retain their original case, since that is what the person entering them intended. Such names are sometimes stored and redisplayed for people to read, either as the ASCII IDN or as the original Unicode, and should then be presented with their case unchanged.
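Since idn_to_ascii() lower-cases by design, and the report shows that idn_to_utf8() on the ASCII form gives back `täst.de` rather than `Täst.de`, one way to meet this need is to store the name exactly as it was entered and derive the ASCII form only when it is needed for DNS. The `DomainName` class below is a hypothetical sketch of that pattern, not part of any PHP API, and it assumes the intl extension is loaded.

```php
<?php
/**
 * Hypothetical value object: keeps the name as the user typed it for
 * display, and derives the lower-cased ASCII form only for DNS use.
 */
final class DomainName
{
    private $display;

    public function __construct(string $display)
    {
        $this->display = $display;
    }

    // Original spelling, case unchanged, for showing back to people.
    public function display(): string
    {
        return $this->display;
    }

    // ASCII-compatible form for DNS lookups; requires the intl extension.
    public function forDns(): string
    {
        $ascii = idn_to_ascii($this->display, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
        if ($ascii === false) {
            throw new InvalidArgumentException("Invalid domain name: {$this->display}");
        }
        return $ascii;
    }
}

$name = new DomainName('Täst.de');
echo $name->display(), "\n"; // Täst.de
echo $name->forDns(), "\n";  // xn--tst-qla.de
```

The design choice is simply to treat the lower-cased ASCII form as a derived value rather than the stored one, so nothing is lost when the name is shown again.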
 [2021-04-06 20:17 UTC] JunkYardMail1 at Frontier dot com
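<?php
$domain_name = 'My.Domain.Example.com'; // example input

// This works to prevent forcing lowercase of ASCII domain names,
// but it does not work to prevent forcing lowercase of Unicode domain names.
$idn = idn_to_ascii($domain_name);
if ($idn === false) {
	// Conversion failed: $domain_name is not a valid domain name
} elseif ($idn === strtolower($domain_name)) {
	// Only the case changed: use the original $domain_name
} else {
	// Use the converted $idn
}
?>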
 [2021-11-11 11:59 UTC] nikic@php.net
-Package: idn
+Package: I18N and L10N related
 [2023-01-23 09:54 UTC] jules dot bernable at gmail dot com
RFC3986 section 3.2.2[1] clearly states that the host component of a URI is case-insensitive.

The idn_to_ascii function uses ICU's uidna_nameToASCII, whose purpose is to encode domain names in the format defined in RFC5890[2], to allow robust case-insensitive comparison of IDNs.

Therefore, I think this bug is invalid.

[1] www.rfc-editor.org/rfc/rfc3986#section-3.2.2
[2] www.rfc-editor.org/rfc/rfc5890
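Following this reasoning, the lower-cased output is what makes idn_to_ascii() usable as a comparison key: two host names can be treated as equal when their ASCII forms match. The `same_host()` helper below is a hypothetical sketch that assumes the intl extension is loaded; it passes the UTS #46 variant explicitly rather than relying on the default.

```php
<?php
/**
 * Hypothetical helper: compare two host names case-insensitively by
 * comparing their IDNA ASCII forms. Requires the intl extension.
 */
function same_host(string $a, string $b): bool
{
    $asciiA = idn_to_ascii($a, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    $asciiB = idn_to_ascii($b, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);

    // A name that fails conversion is treated as not matching anything.
    return $asciiA !== false && $asciiA === $asciiB;
}

var_dump(same_host('My.Domain.Example.com', 'my.domain.example.com')); // bool(true)
var_dump(same_host('Täst.de', 'xn--tst-qla.de'));                       // bool(true)
var_dump(same_host('example.com', 'example.org'));                     // bool(false)
```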
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon Nov 11 17:01:29 2024 UTC