Request #54086 Add "default character set" option for mysql connection
Submitted: 2011-02-24 11:37 UTC Modified: 2011-02-24 13:21 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (100.0%)
From: cavo at ynet dot sk Assigned:
Status: Duplicate Package: MySQLi related
PHP Version: Irrelevant OS: Linux
Private report: No CVE-ID: None

 [2011-02-24 11:37 UTC] cavo at ynet dot sk
Description:
------------
I'm voting for http://bugs.php.net/bug.php?id=44118
Here is my explanation:

This is actually an unimplemented feature of the MySQL protocol (the character set handshake) in both the mysql and mysqli PHP extensions, and it causes buggy behavior.

While connecting, the server and client exchange several packets. After a successful TCP handshake, the server sends a "Server Greeting" message (packet) that specifies the server's default character set (for example "Charset: utf8 COLLATE utf8_general_ci" - 0x21). The client then replies with a "Login Request" message that specifies the character set to be used for the connection. Here is the problem: both PHP's mysql and mysqli extensions always send "Charset: latin1 COLLATE latin1_swedish_ci" - 0x08, yet PHP does not treat query responses as latin1-encoded. Meanwhile, the MySQL server has to re-encode data from the character set of the table column to latin1.
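To make the symptom concrete, here is a small Python simulation of what the paragraph above describes: text stored in a utf8 column is converted to the latin1 connection charset, and a script that assumes UTF-8 then reads those bytes. It is a sketch under assumptions: Python's cp1252 codec stands in for MySQL's latin1 (which MySQL documents as essentially Windows cp1252), the '?' substitution mimics the server's handling of unconvertible characters, and the function names `to_latin1_connection` and `read_as_utf8` are hypothetical.

```python
# Hypothetical simulation: Python's cp1252 codec stands in for MySQL's latin1.


def to_latin1_connection(column_text: str) -> bytes:
    """Bytes a latin1 connection would receive for text stored in a utf8 column.

    Characters with no latin1 equivalent are replaced with '?'
    (assumed to mirror the server-side substitution).
    """
    return column_text.encode("cp1252", errors="replace")


def read_as_utf8(raw: bytes) -> str:
    """How a script that assumes UTF-8 would interpret those bytes."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for word in ["čaj", "café", "root"]:
        raw = to_latin1_connection(word)
        print(word, raw, read_as_utf8(raw))
```

Characters outside latin1 (such as "č") are lost as "?", while characters that do exist in latin1 (such as "é") arrive as single bytes that are not valid UTF-8.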

There are actually two different solutions to this problem (mysqli [mysql]):
Solution 1:
Call mysqli::set_charset [mysql_set_charset] right after mysqli::__construct [mysql_connect]. But this sends a redundant query to the server (for example: SET NAMES "utf8").
Solution 2:
Configure the MySQL server with the option "character-set-client-handshake = false", which makes MySQL behave like MySQL 4.0 and ignore the character set sent by the client.

I think there should be a MySQL/MySQLi configuration option such as:
mysqli.default_character_set = utf8
mysql.default_character_set = utf8
This would let PHP use MySQL's character set handshake without executing redundant queries. The mysql CLI client already offers the same thing as "--default-character-set=utf8".
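The proposed option would only change one byte of the Login Request. As a rough Python sketch, assuming the protocol 4.1 handshake response layout (capability flags, max packet size, one charset byte, 23 reserved zero bytes), a client honouring a default character set setting could build the fixed part of the payload like this. The helper `login_request_prefix` and the table `DEFAULT_COLLATION_ID` are hypothetical, and only the two collation IDs visible in this report's captures are included.

```python
import struct

# Hypothetical table: default collation IDs for the charsets seen in the captures.
DEFAULT_COLLATION_ID = {
    "latin1": 0x08,  # latin1_swedish_ci
    "utf8": 0x21,    # utf8_general_ci
}


def login_request_prefix(capabilities: int, max_packet: int, charset: str) -> bytes:
    """Build the fixed 32-byte start of a Login Request payload (hypothetical helper).

    Instead of a hard-coded 0x08, the charset byte comes from a configured name.
    """
    collation_id = DEFAULT_COLLATION_ID[charset]
    # Little-endian: capability flags (4), max packet size (4),
    # charset (1), then 23 reserved zero bytes.
    return struct.pack("<IIB23x", capabilities, max_packet, collation_id)


if __name__ == "__main__":
    prefix = login_request_prefix(0x0003A685, 0x01000000, "utf8")
    print(prefix[:9].hex())
```

With the capability flags and max packet size taken from the mysql CLI capture, `login_request_prefix(0x0003a685, 0x01000000, "utf8")` reproduces bytes 4 to 12 of that packet, followed by the 23 reserved zero bytes.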

Here is the Login Request message captured by Wireshark for the code: <?php mysql_connect("127.0.0.1", "root", "test"); ?>
(the character set byte is in "[]" brackets)
0000   3a 00 00 01 85 a2 02 00 00 00 00 40[08]00 00 00  :..........@....
0010   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0020   00 00 00 00 72 6f 6f 74 00 14 0c 79 bf 1a bd d0  ....root...y....
0030   36 0a 21 fc 6d e5 f0 42 27 89 9f 32 e5 95        6.!.m..B'..2..

Here is the Login Request message captured by Wireshark for the command (cli): $ mysql -u root -p -h 127.0.0.1 --default-character-set=utf8
(the character set byte is in "[]" brackets)
0000   3a 00 00 01 85 a6 03 00 00 00 00 01[21]00 00 00  :...........!...
0010   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0020   00 00 00 00 72 6f 6f 74 00 14 97 58 9b 0a 65 b8  ....root...X..e.
0030   0c 67 dc 8a 82 28 51 a5 1f 1b 08 28 c5 c2        .g...(Q....(..
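The two dumps can be checked mechanically. The following Python sketch converts a Wireshark-style dump back into bytes and decodes the fixed fields of the protocol 4.1 Login Request (HandshakeResponse41): a 4-byte packet header, capability flags, maximum packet size, the character set byte at offset 12, 23 reserved bytes, and the NUL-terminated user name. The function names are hypothetical; the dump text is copied from the PHP capture in this report.

```python
import struct

# The PHP capture from this report, copied verbatim (brackets included).
PHP_CAPTURE = """
0000   3a 00 00 01 85 a2 02 00 00 00 00 40[08]00 00 00  :..........@....
0010   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0020   00 00 00 00 72 6f 6f 74 00 14 0c 79 bf 1a bd d0  ....root...y....
0030   36 0a 21 fc 6d e5 f0 42 27 89 9f 32 e5 95        6.!.m..B'..2..
"""

HEX_DIGITS = set("0123456789abcdefABCDEF")


def hexdump_to_bytes(dump: str) -> bytes:
    """Turn Wireshark-style 'offset  hex bytes  ASCII' lines into raw bytes."""
    data = bytearray()
    for line in dump.strip().splitlines():
        # The report marks the charset byte with [..]; treat brackets as spaces.
        tokens = line.replace("[", " ").replace("]", " ").split()[1:]
        for tok in tokens[:16]:
            if len(tok) != 2 or not set(tok) <= HEX_DIGITS:
                break  # reached the ASCII column of a short last line
            data.append(int(tok, 16))
    return bytes(data)


def parse_login_request(packet: bytes) -> dict:
    """Decode the start of a HandshakeResponse41 ('Login Request') packet."""
    # 4-byte header: 3-byte little-endian payload length, 1-byte sequence id.
    payload_len = int.from_bytes(packet[0:3], "little")
    seq_id = packet[3]
    # Payload: capability flags (4), max packet size (4), charset (1),
    # then 23 reserved zero bytes, so the user name starts at offset 36.
    caps, max_packet, charset = struct.unpack_from("<IIB", packet, 4)
    user_end = packet.index(b"\x00", 36)
    return {
        "payload_len": payload_len,
        "seq_id": seq_id,
        "capabilities": caps,
        "max_packet": max_packet,
        "charset": charset,
        "user": packet[36:user_end].decode("ascii"),
    }


if __name__ == "__main__":
    info = parse_login_request(hexdump_to_bytes(PHP_CAPTURE))
    print(hex(info["charset"]), info["user"])  # 0x8 root
```

Feeding it the mysql CLI dump instead yields charset 0x21 (utf8_general_ci) rather than 0x08 (latin1_swedish_ci), with the rest of the fixed layout unchanged.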

Test script:
---------------
<?php mysql_connect("127.0.0.1", "root", "test"); ?>
cli:
mysql -u root -p -h 127.0.0.1 --default-character-set=utf8


 [2011-02-24 13:21 UTC] aharvey@php.net
-Status: Open +Status: Duplicate
 [2011-02-24 13:21 UTC] aharvey@php.net
Johannes's reasoning in request #44118 seems sound to me. Closing.
 [2011-02-24 14:21 UTC] carsten_sttgt at gmx dot de
@cavo
> There is actually 2 different solutions for this problem (mysqli [mysql]):

With mysqli, I would use a third one:
| <?php
| $link = mysqli_init();
| mysqli_options($link, MYSQLI_SET_CHARSET_NAME, 'latin1');
| mysqli_real_connect($link, null, 'foo', 'bar');
| ?>


@aharvey
> Johannes's reasoning in request #44118 seems sound to me. Closing.

I guess this info is outdated. If mysql(i) is built against mysqlnd, the charset is automatically set to the server's default character set. A build against libmysql uses latin1 as the default.
Thus people should always set the client charset in their scripts (especially if they don't know how PHP was built or how MySQL is set up), because this automatic behavior can't be disabled.

Regards,
Carsten
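Carsten's point can be summarized as a small decision function. This Python sketch is hypothetical: the labels "mysqlnd" and "libmysql" and the function name `handshake_charset` are illustrative, and it only encodes the behavior described in the comment above (as of 2011), not actual PHP source code.

```python
from typing import Optional


def handshake_charset(client_library: str, server_default: str,
                      explicit: Optional[str] = None) -> str:
    """Charset a PHP connection starts with, per the comment above (hypothetical).

    client_library: "mysqlnd" or "libmysql" (illustrative labels).
    explicit: a charset requested before connecting, for example via
              MYSQLI_SET_CHARSET_NAME; when given, it is used regardless of build.
    """
    if explicit is not None:
        return explicit
    if client_library == "mysqlnd":
        return server_default  # taken from the Server Greeting
    if client_library == "libmysql":
        return "latin1"  # libmysql's built-in default
    raise ValueError(f"unknown client library: {client_library!r}")


if __name__ == "__main__":
    for lib in ("mysqlnd", "libmysql"):
        print(lib, handshake_charset(lib, "utf8"), handshake_charset(lib, "utf8", "utf8"))
```

Without an explicit charset the result depends on how PHP was built; with one, both builds agree, which is why setting it in every script is the portable choice.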
 [2011-02-24 15:29 UTC] cavo at ynet dot sk
Ah, I see: your response to Request #12213 from 2007-10-09. I didn't know about this until now. It isn't in any example on the web; I recommend adding one. Thanks for that.
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Wed Jan 15 10:01:29 2025 UTC