Bug #74955 Connection with UTF-8 return wrong String size
Submitted: 2017-07-20 07:56 UTC Modified: 2018-07-06 18:48 UTC
Votes:4
Avg. Score:4.5 ± 0.5
Reproduced:3 of 3 (100.0%)
Same Version:3 (100.0%)
Same OS:2 (66.7%)
From: valentn at microtec dot fr Assigned: ab (profile)
Status: Closed Package: PDO Firebird
PHP Version: 7.1.7 OS: All
Private report: No CVE-ID: None
 [2017-07-20 07:56 UTC] valentn at microtec dot fr
Description:
------------
When I select a CHAR field from my database, the returned string is as long as the field's storage size in bytes.
For a UTF-8 CHAR(3), the returned string length is 12 (3 chars * 4 bytes = 12).



Test script:
---------------
SQL :
CREATE TABLE COMMERCIAUX (
  ACTIF CHAR(3) CHARACTER SET 'UTF8'
);
INSERT INTO COMMERCIAUX(ACTIF) VALUES ('Oui');

<?php

$connection = new PDO('firebird:dbname=localhost/3050:MIASERVER;charset=utf8;encoding=UTF8', 'SYSDBA', 'masterkey');
$stmt = $connection->query('SELECT actif FROM commerciaux');
$result = $stmt->fetchAll();

?>


Expected result:
----------------
     [
       "ACTIF" => "Oui",
       0 => "Oui",
     ],
     [
       "ACTIF" => "Oui",
       0 => "Oui",
     ],
     [
       "ACTIF" => "Non",
       0 => "Non",
     ],

Actual result:
--------------
Note the trailing spaces after the text:
     [
       "ACTIF" => "Oui         ",
       0 => "Oui         ",
     ],
     [
       "ACTIF" => "Oui         ",
       0 => "Oui         ",
     ],
     [
       "ACTIF" => "Non         ",
       0 => "Non         ",
     ],

 [2018-07-06 18:48 UTC] ab@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: ab
 [2018-07-06 18:48 UTC] ab@php.net
Thanks for the report. The documentation states that CHAR has a fixed length. See

https://firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-datatypes-chartypes.html

For UTF-8, the database reserves 4 bytes per character. The documentation further says:

"If the entered number of characters is less than the declared length, trailing spaces will be added to the field"

When an ASCII character is inserted, the last 3 of its 4 reserved bytes are therefore spaces. A debug session confirms this. In your snippet, CHAR(3) effectively reserves 12 bytes in the database. When you insert "Oui", that's only 3 bytes. The database nevertheless reports a length of 12.
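
To make the arithmetic concrete, here is a minimal sketch that does not touch a database. It only simulates the padded value as it appeared in this report; the variable names are made up for illustration.

```php
<?php
// Hypothetical illustration of the byte arithmetic; no Firebird connection involved.
$declaredChars   = 3; // CHAR(3)
$maxBytesPerChar = 4; // bytes reserved per UTF8 character
$reservedBytes   = $declaredChars * $maxBytesPerChar;

$value = 'Oui';
// Simulate the space-padded string that was returned in this report.
$padded = str_pad($value, $reservedBytes, ' ');

echo $reservedBytes, "\n";                    // 12
echo strlen($value), "\n";                    // 3
echo strlen($padded), "\n";                   // 12
echo strlen($padded) - strlen($value), "\n";  // 9 trailing spaces
```

The 9 trailing spaces match the "Oui" values shown under "Actual result".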

Even if it is a bit unexpected, that is the actual database implementation. I remember seeing this with some other DBs, too. It would be inconvenient for the driver to cut the trailing spaces, as they might be intentional in some cases. By contrast, VARCHAR delivers the result you'd expect.
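
If the trailing spaces are not wanted and switching the column to VARCHAR is not an option, one possible workaround is to trim in application code. The sketch below assumes that no column relies on intentional trailing spaces; rtrimRow() is a hypothetical helper, not part of PDO.

```php
<?php
/**
 * Hypothetical helper: strips trailing spaces from every string column
 * of a fetched row. Non-string values (NULL, integers) are left untouched.
 */
function rtrimRow(array $row)
{
    // array_map() preserves keys when given a single array.
    return array_map(function ($v) {
        return is_string($v) ? rtrim($v, ' ') : $v;
    }, $row);
}

// A row shaped like the "Actual result" in this report (PDO::FETCH_BOTH).
$row = array(
    'ACTIF' => 'Oui' . str_repeat(' ', 9),
    0       => 'Oui' . str_repeat(' ', 9),
);

$clean = rtrimRow($row);
var_dump($clean['ACTIF'] === 'Oui'); // bool(true)
var_dump(strlen($clean[0]));         // int(3)
```

With a real statement, this could be applied as array_map('rtrimRow', $stmt->fetchAll()). Note that it also removes spaces that were stored on purpose, which is exactly why the driver does not do this itself.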

As long as the database delivers the reserved length in bytes rather than the number of characters actually used, IMO this is not a bug.

Thanks.
 