Bug #52671 PHP gettext not uniformying CRLF newlines
Submitted: 2010-08-22 18:24 UTC
Modified: 2021-01-22 22:06 UTC
Votes:11
Avg. Score:4.9 ± 0.3
Reproduced:11 of 11 (100.0%)
Same Version:8 (72.7%)
Same OS:10 (90.9%)
From: adrien dot morel at informance dot info
Assigned: cmb
Status: Closed
Package: Gettext related
PHP Version: 5.2.14
OS: Win32
Private report: No
CVE-ID: None
 [2010-08-22 18:24 UTC] adrien dot morel at informance dot info
Description:
------------
I already started a discussion about this on bug-gnu-gettext@gnu.org, but it turned out that it may be an issue in PHP's gettext implementation.

It's a problem I encounter while using the PHP gettext functions on PHP files with CRLF newlines (Windows format). See the test script below and its comment for an explanation.

The problem is that GNU xgettext, following the recommendations of the Unicode Consortium ( http://www.unicode.org/reports/tr13/tr13-9.html ), replaces every CRLF with a plain LF when extracting strings from the PHP file. So if a PHP file contains CRLF newlines, xgettext turns them into LF when writing the catalog. The PHP gettext functions, however, still look up the string with its CRLF newline, so the matching msgid is not found.

In short, extracting strings from a Windows-format PHP file with xgettext and then running PHP gettext on that file does not work: the translation is not found because the string comparison fails.

Test script:
---------------
<?php
// If this file is saved with Unix-style newlines (LF),
// it will work. With Windows-style newlines (CRLF) it won't, because
// the line break in the string is encoded as CRLF, so the string is
// not found in the catalog, which always encodes newlines as LF.
$s = gettext(
"Hello!
My name is Foo Bar."
);


Expected result:
----------------
Regardless of the newline encoding of the file, the above string should be found in the catalog's msgids, which always use the LF newline.

Actual result:
--------------
For the moment, on Windows-style files, strings with a linebreak inside are not translated even though their translation is available in the catalog.
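
One possible way to sidestep the mismatch described above is to normalize newlines in the msgid before the lookup, so that the string passed to gettext() matches the LF-only msgid in the catalog. The following is a minimal sketch, not the fix adopted for this ticket and not necessarily the workaround the reporter used. The helper names normalize_newlines() and gettext_lf() are hypothetical, and the code assumes PHP 7 or later (for scalar type declarations) and a catalog whose msgids contain only LF.

```php
<?php
// Hypothetical helper (not part of PHP or GNU gettext):
// convert CRLF and lone CR line breaks to LF.
function normalize_newlines(string $msgid): string
{
    // str_replace() applies the searches in order: CRLF first,
    // then any CR that is still left.
    return str_replace(["\r\n", "\r"], "\n", $msgid);
}

// Hypothetical wrapper around gettext() that looks up the
// LF-normalized msgid instead of the raw source string.
function gettext_lf(string $msgid): string
{
    $normalized = normalize_newlines($msgid);

    // If ext/gettext is not loaded, return the normalized msgid
    // untranslated instead of failing.
    return function_exists('gettext') ? gettext($normalized) : $normalized;
}
```

With such a wrapper, the test script would call gettext_lf() instead of gettext(), and the lookup would no longer depend on how the source file encodes its line breaks.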

 [2021-01-21 16:10 UTC] cmb@php.net
I tried with xgettext (GNU gettext-tools) 0.19.8.1 and got:

    warning: internationalized messages should not contain the
    '\r' escape sequence

Besides this warning, the \r is stored in the .pot, propagated to the
.po and later expected in the .mo files.

> The problem is that xgettext, from GNU, […], changes every CRLF
> for simple LF when extracting strings from the PHP file.

This is apparently no longer the case, and as such this ticket would
be obsolete, wouldn't it?
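
To see whether a source file contains messages that would carry a carriage return into the catalog, the string literals can be inspected with PHP's tokenizer. The following is a rough sketch under stated assumptions: find_cr_string_literals() is a hypothetical helper, it assumes the tokenizer extension is available and PHP 7 or later, and it only looks at plain string literals (T_CONSTANT_ENCAPSED_STRING) containing a raw CR byte. It does not reproduce xgettext's own checks.

```php
<?php
// Hypothetical helper: return the starting line numbers of plain
// string literals in $source that contain a raw CR byte.
function find_cr_string_literals(string $source): array
{
    $lines = [];
    foreach (token_get_all($source) as $token) {
        // Tokens are either single characters (strings) or
        // arrays of [token id, text, line number].
        if (is_array($token)
            && $token[0] === T_CONSTANT_ENCAPSED_STRING
            && strpos($token[1], "\r") !== false) {
            $lines[] = $token[2];
        }
    }
    return $lines;
}
```

Calling find_cr_string_literals(file_get_contents($path)) on a file saved with CRLF newlines would list the multi-line strings affected, while the same file saved with LF newlines would yield an empty array.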
 [2021-01-21 16:10 UTC] cmb@php.net
-Status: Open
+Status: Feedback
-Assigned To:
+Assigned To: cmb
 [2021-01-21 23:19 UTC] adrien dot morel at informance dot info
-Status: Feedback
+Status: Assigned
 [2021-01-21 23:19 UTC] adrien dot morel at informance dot info
Indeed, if the library now propagates the CRLF, there is no longer a bug. Thank you for the update.
 [2021-01-21 23:28 UTC] cmb@php.net
-Status: Assigned
+Status: Closed
 [2021-01-21 23:28 UTC] cmb@php.net
Thank you for the swift reply!  And sorry that it took more than
ten years for some action on this ticket.
 [2021-01-22 22:06 UTC] adrien dot morel at informance dot info
No harm, I remember finding a workaround 9.5 years ago :)
 
Last updated: Tue Dec 03 11:01:29 2024 UTC