Request #19154 Include "encoding" argument in utf8_encode function
Submitted: 2002-08-28 11:38 UTC Modified: 2002-08-29 10:07 UTC
From: dimitris at ccf dot auth dot gr Assigned:
Status: Closed Package: Feature/Change Request
PHP Version: 4.2.2 OS:
Private report: No CVE-ID: None
 [2002-08-28 11:38 UTC] dimitris at ccf dot auth dot gr
We wanted a way to convert strings from encodings other than ISO-8859-1 to UTF-8. Looking through the source code in PHP_SOURCE/ext/xml/xml.c, we found that the utf8_encode function hardwires the ISO-8859-1 encoding, even though the xml_utf8_encode function already has provisions for choosing another encoding (if implemented).

See PHP_FUNCTION(utf8_encode):
encoded = xml_utf8_encode(Z_STRVAL_PP(arg), Z_STRLEN_PP(arg), &len, "ISO-8859-1");
And see: 
static XML_Char *xml_utf8_encode(const char *s, int len, int *newlen, const XML_Char *encoding)

If PHP is interested in providing UTF-8 conversion support for other languages (encodings), then this extra argument is necessary so that the user can indicate the originating encoding of the text about to be converted. As for utf8_decode, that function does not need an indication of target encoding since the originating text in Unicode already specifies language.
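
To illustrate the request, here is a minimal standalone sketch of an encoder that takes the source encoding as an argument and looks up a per-encoding decoder. It is not the code in xml.c: the names sketch_utf8_encode, byte_decoder, decode_latin1 and find_decoder are hypothetical, and only ISO-8859-1 is registered. Supporting another single-byte encoding such as ISO-8859-7 would, under this design, mean adding one more decoder to the table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder type: maps one source byte to a Unicode code point. */
typedef unsigned int (*byte_decoder)(unsigned char c);

/* ISO-8859-1 bytes are numerically equal to their Unicode code points. */
static unsigned int decode_latin1(unsigned char c)
{
    return c;
}

/* Hypothetical table of supported source encodings. */
static const struct {
    const char *name;
    byte_decoder decode;
} decoders[] = {
    { "ISO-8859-1", decode_latin1 },
};

static byte_decoder find_decoder(const char *encoding)
{
    size_t i;
    for (i = 0; i < sizeof(decoders) / sizeof(decoders[0]); i++) {
        if (strcmp(decoders[i].name, encoding) == 0) {
            return decoders[i].decode;
        }
    }
    return NULL;
}

/*
 * Convert len bytes of s from the named single-byte encoding to UTF-8.
 * Returns a malloc'd, NUL-terminated buffer and stores its length in
 * *newlen, or returns NULL if the encoding is unknown or malloc fails.
 */
char *sketch_utf8_encode(const char *s, size_t len, size_t *newlen,
                         const char *encoding)
{
    byte_decoder decode;
    char *out;
    size_t i, n = 0;

    assert(s != NULL && newlen != NULL && encoding != NULL);

    decode = find_decoder(encoding);
    if (decode == NULL) {
        return NULL;
    }
    /* Each input byte yields at most 4 UTF-8 bytes, plus the final NUL. */
    out = malloc(len * 4 + 1);
    if (out == NULL) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        unsigned int cp = decode((unsigned char)s[i]);
        if (cp < 0x80) {
            out[n++] = (char)cp;
        } else if (cp < 0x800) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[n] = '\0';
    *newlen = n;
    return out;
}
```

With this sketch, the 4-byte ISO-8859-1 input "caf\xE9" becomes the 5-byte UTF-8 string "caf\xC3\xA9", while a call naming "ISO-8859-7" returns NULL because no decoder for it is registered here.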

I have patched xml.c for ISO-8859-7, and I would be more than willing to cooperate with anybody who takes this up and to share code and knowledge. I have also posted my "fixes" on the manual page for utf8_encode.

Thanks.
Dimitris Daskopoulos
NOC, Aristotle University of Thessaloniki

History

 [2002-08-28 12:47 UTC] wez@php.net
Try using mb_convert_encoding in the mbstring extension.
 [2002-08-29 03:44 UTC] dimitris at ccf dot auth dot gr
Indeed, the mbstring extension seems to target multi-byte languages and conversion. But the default PHP functions for handling UTF-8 are utf8_encode/utf8_decode. Forcing those functions to support only ISO-8859-1 is obviously temporary, until mbstring is integrated into the main PHP package and all related functions use it.

However, this temporary choice restricts many already developed PHP packages that use the utf8 functions (e.g. HORDE/IMP) to just the ISO-8859-1 encoding.
IMHO, this is wrong.
 [2002-08-29 10:07 UTC] wez@php.net
Not at all; the utf8_encode functions work as documented;
anyone using those functions has accepted that
functionality.

People that need more flexibility will be using the
mbstring functions instead.

mbstring is enabled by default as of PHP 4.3, so it does
not make sense to duplicate code and functionality in the
way you have described between the various modules as it
makes things harder to maintain, and potentially more
error prone.
 