PHP :: Doc Bug #74528 :: mb_check_encoding Documentation Confusing

Doc Bug #74528	mb_check_encoding Documentation Confusing
Submitted: 2017-05-02 10:08 UTC
Modified: 2017-05-02 15:39 UTC
From: james dot antrim at nm dot thm dot de
Assigned: cmb
Status: Not a bug
Package: mbstring related
PHP Version: 5.6.30
OS: Debian 3.16.39-1+deb8u2 (2017-03
Private report: No
CVE-ID: None


[2017-05-02 10:08 UTC] james dot antrim at nm dot thm dot de

Description:
------------
---
From manual page: http://www.php.net/function.mb-detect-encoding
---

I saved a file that was originally ISO-8859-1 in several different encodings, among them UTF-8 and Japanese Shift JIS. When I call this function with 'UTF-8' as the 'encoding list' parameter, I always get 'UTF-8' as the return value.

I assume this is working as intended, because it solves both detection and conversion in one function. However, I find the documentation of the return value confusing: although the function is called 'detect', it does not tell me the original string's encoding, but the encoding of the converted string if conversion was possible. My suggestion would be:
"The character encoding or FALSE if the encoding cannot be converted to any of the given strings."


Test script:
---------------
Convert any ISO-8859-1 file to any other encoding, then run mb_detect_encoding($file, 'UTF-8', true) on it.

This same documentation problem is evident for mb_check_encoding($file, 'UTF-8') which will always return true under the same conditions.


[2017-05-02 14:33 UTC] cmb@php.net

-Status: Open
+Status: Not a bug
-Package: Documentation problem
+Package: mbstring related
-Assigned To:
+Assigned To: cmb

[2017-05-02 14:33 UTC] cmb@php.net

mb_detect_encoding() is not supposed to do any conversion, so the
documentation is correct in this regard. However, if you want to
check for 'UTF-8' (and maybe other encodings as well), you have
to use strict detection, see <https://3v4l.org/DFIAs> vs.
<https://3v4l.org/M23D4> and also
<http://php.net/manual/en/function.mb-detect-encoding.php#102510>.
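
To make the strict-detection point concrete, here is a minimal sketch. The sample strings are assumptions chosen for illustration and do not come from the report or from the linked 3v4l.org snippets. It passes true as the third argument of mb_detect_encoding() and compares the result with mb_check_encoding() for the same bytes.

```php
<?php
// Assumed sample data: "naïve café" in ISO-8859-1 and in UTF-8.
$latin1 = "na\xEFve caf\xE9";         // 0xEF and 0xE9 are single-byte characters
$utf8   = "na\xC3\xAFve caf\xC3\xA9"; // the same text as UTF-8 byte sequences

// Strict detection: a candidate is rejected if the bytes are not valid in it.
$latin1Detected = mb_detect_encoding($latin1, 'UTF-8', true);
$utf8Detected   = mb_detect_encoding($utf8, 'UTF-8', true);

// mb_check_encoding() validates the bytes against one named encoding.
$latin1IsUtf8 = mb_check_encoding($latin1, 'UTF-8');
$utf8IsUtf8   = mb_check_encoding($utf8, 'UTF-8');

var_dump($latin1Detected); // bool(false)
var_dump($utf8Detected);   // string(5) "UTF-8"
var_dump($latin1IsUtf8);   // bool(false)
var_dump($utf8IsUtf8);     // bool(true)
```

Neither call converts anything. Both only report whether the given bytes are valid in the candidate encoding.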

[2017-05-02 15:39 UTC] requinix@php.net

> it does not tell me the original string's encoding, but the encoding of the
> converted string if conversion was possible.
There's no way to know for sure what the original encoding of a string or file was. They simply don't store that kind of information. All PHP can do is look through a set of encodings and find the first one that supports the byte sequence. It also doesn't help that most encodings out there are identical in the standard ASCII range (bytes 0x20-0x7E).
You, as the developer, having more knowledge about the input string than PHP, have to construct that set of encodings carefully while taking into account the nature of the byte sequences they create and how they can overlap with each other.
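
The effect of candidate order can be sketched as follows. The helper guess_encoding() is hypothetical and not part of mbstring; it only illustrates the "first encoding that accepts the bytes" idea with mb_check_encoding() and is not a description of how mb_detect_encoding() works internally. The sample strings are likewise assumptions.

```php
<?php
// Hypothetical helper: return the first candidate encoding in which
// $bytes is a valid byte sequence, or false if no candidate accepts it.
function guess_encoding($bytes, array $candidates)
{
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($bytes, $encoding)) {
            return $encoding;
        }
    }
    return false;
}

$utf8   = "caf\xC3\xA9"; // "café" as UTF-8
$latin1 = "caf\xE9";     // "café" as ISO-8859-1

// Restrictive encodings first: ISO-8859-1 accepts every byte sequence,
// so it only makes sense as the last fallback.
var_dump(guess_encoding($utf8, array('ASCII', 'UTF-8', 'ISO-8859-1')));   // "UTF-8"
var_dump(guess_encoding($latin1, array('ASCII', 'UTF-8', 'ISO-8859-1'))); // "ISO-8859-1"

// With the permissive encoding first, valid UTF-8 is reported as ISO-8859-1.
var_dump(guess_encoding($utf8, array('ISO-8859-1', 'UTF-8')));            // "ISO-8859-1"
```

The last call shows the overlap problem: the UTF-8 bytes are also a valid ISO-8859-1 sequence, so the answer depends entirely on which candidate is tried first.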



Copyright © 2001-2025 The PHP Group All rights reserved.	Last updated: Sun Jul 06 03:01:35 2025 UTC