Bug #30477 UTF-8 characters are changed by mysql_query
Submitted: 2004-10-19 11:36 UTC Modified: 2004-10-19 13:50 UTC
From: stian at grytoyr dot net Assigned:
Status: Not a bug Package: MySQL related
PHP Version: 5.0.2 OS: SuSE Linux 9.1 Professional
Private report: No CVE-ID: None
 [2004-10-19 11:36 UTC] stian at grytoyr dot net
Description:
------------
I'm using PHP 5.0.2, MySQL 4.1.6gamma and Apache 2.0.49-27.16 on SuSE Linux Professional 9.1. PHP has been compiled by me, MySQL is a binary distribution from MySQL, Apache is the SuSE default.

When I submit a query containing the UTF-8 representation (hex value 0xC48D) of the Unicode character U+010D (č, lowercase c with caron) to the MySQL server using mysql_query, sniffing the network traffic reveals that the character is sent as 0xC43F, which is not a valid UTF-8 sequence.

It's important to note that this happens BEFORE the data reaches the MySQL server, so I believe it must be a PHP problem.

Some other Unicode characters are also problematic, including U+0110 (Đ) and U+00C1 (Á); both of them are sent as 0xC33F.

The MySQL table has been created with CHARACTER SET utf8.


I don't think this bug can be reproduced without running mysql_query, so the code below requires a connection to a MySQL database.

This _may_ be a duplicate of Bug #28231, but I'm not entirely sure.

Reproduce code:
---------------
<?php
header("Content-Type: text/html; charset=UTF-8");
// connect to a mysql database which supports UTF-8 here
$testStr = chr(196) . chr(141); // UTF-8 bytes 0xC4 0x8D = Unicode character U+010D (c caron)
echo '$testStr before query: ' . $testStr . '<br />'; // will output the correct UTF-8 character (c caron)
echo 'hex of $testStr before query: ' . bin2hex($testStr) . '<br />'; // will output c48d

$updateResult = mysql_query("update klara_test set klara_text = '$testStr' where klara_id = 1", $conn);
$fetchResult = mysql_query("select klara_text from klara_test", $conn);

while ($myrow = mysql_fetch_array($fetchResult)) {
  echo '$testStr from database: ' . $myrow['klara_text'] . '<br />'; // will output gibberish
  echo 'hex of $testStr from database: ' . bin2hex($myrow['klara_text']) . '<br />'; // will output c43f
}


Expected result:
----------------
I would expect the UTF-8 characters to remain unchanged.

Actual result:
--------------
Certain Unicode characters are altered by mysql_query, as far as I can tell.

 [2004-10-19 13:00 UTC] stian at grytoyr dot net
Oh, and an example of the output I'm seeing with the code in the "reproduce" section can be viewed here:

http://klara.grytoyr.net/tmp/bug.php
 [2004-10-19 13:40 UTC] georg@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Make sure that your client character set is also set to 
UTF8! (SET NAMES UTF8) 
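
A minimal sketch of the suggestion above, assuming the statement is sent right after connecting and before any other query. The helper name connect_utf8 and the credentials in the usage comment are hypothetical, and the sketch uses the mysqli extension rather than the mysql_* functions from the report:

```php
<?php
// Hypothetical helper (not part of the original report): open a connection
// and switch the client character set to UTF-8 before running any queries.
function connect_utf8($host, $user, $password, $database)
{
    $conn = mysqli_connect($host, $user, $password, $database);
    if (!$conn) {
        die('Connect failed: ' . mysqli_connect_error());
    }

    // Tell the server that this client sends and expects UTF-8.
    // Without this, the connection keeps the server's default client
    // character set, which may not be UTF-8.
    if (!mysqli_query($conn, 'SET NAMES utf8')) {
        die('SET NAMES failed: ' . mysqli_error($conn));
    }

    return $conn;
}

// Usage (placeholder credentials, adjust to your setup):
// $conn = connect_utf8('localhost', 'user', 'secret', 'test');
```

With the API used in the reproduce code, the equivalent would be calling mysql_query("SET NAMES utf8", $conn) once after connecting, before the UPDATE is sent.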
 [2004-10-19 13:50 UTC] stian at grytoyr dot net
My bad. Thanks for pointing that out, it works now. Maybe it would be helpful to add a short note about that in the PHP docs for MySQL? It could be just me, of course, but I've spent days on this :)
 