Bug #80869 SQL Server: Read NVARCHAR Unicode
Submitted: 2021-03-15 16:25 UTC Modified: 2021-03-18 13:30 UTC
From: sergiopaternoster73 at gmail dot com Assigned: cmb (profile)
Status: Closed Package: PDO ODBC
PHP Version: 8.0.3 OS: Ubuntu 18.04 Server
Private report: No CVE-ID: None
 [2021-03-15 16:25 UTC] sergiopaternoster73 at gmail dot com
Description:
------------
I have an SQL Server 2012 table that contains all NVARCHAR columns.
With PHP 7.3/7.4 I can read both ASCII and UNICODE characters from the table by running this query: SELECT CAST([NAME] AS NVARCHAR(MAX)) AS [Name] FROM [Table].
With php-8.0.3 I can still read ASCII results (plain English, for example) but not UNICODE anymore (Chinese characters, in my case).

I use PHP 8.0.3 with FreeTDS on Ubuntu Linux. With PHP 7.3/7.4 the same setup works well!




Test script:
---------------
```
$query = "SELECT
          CAST([NAME] AS NVARCHAR(MAX)) AS [Name]
          FROM [Table]";
$pdo      = new PDO("odbc:MyConnection","MyUser","MyPassword");
$rs       = $pdo->query($query);
$result   = $rs->fetchColumn();
file_put_contents("result.txt",$result);
$pdo = null;
```

/etc/odbc.ini settings

[MyConnection]
Description       = "Data Warehouse"
Port              = 1433
Driver            = FreeTDS
Description       = SQL Server
Trace             = No
Server            = my.server.com
Database          = AB
TDS_Version       = 8.0
Client_Charset    = UTF-8
ApplicationIntent = ReadOnly

Expected result:
----------------
Readable Unicode (Chinese, in my case) characters (like it is in PHP 7)

Actual result:
--------------
Unreadable characters

History

 [2021-03-15 16:49 UTC] cmb@php.net
-Status: Open
+Status: Feedback
-Assigned To:
+Assigned To: cmb
 [2021-03-15 16:49 UTC] cmb@php.net
Could you please provide ODBC traces for PHP 7.4 and 8.0; ideally
place them somewhere publicly accessible on the Web (e.g.
gist.github.com, pastebin.com) and only link to them.
 [2021-03-15 17:19 UTC] sergiopaternoster73 at gmail dot com
Please, pardon my ignorance. I don't get any segfault. How can I provide an ODBC trace?
 [2021-03-15 17:42 UTC] sergiopaternoster73 at gmail dot com
I tested it with a newer SQL Server 2017 and a "cleaner" NVARCHAR table. Here are the results with PHP 7 and 8. It looks like the encoding is ASCII in PHP 8, while it's UTF-8 in PHP 7.

Version: 7.4.13
mb_detect_encoding(): UTF-8
Result: 河北香河经济开发区运河大道东侧安晟街北侧

Version: 8.0.3
mb_detect_encoding(): ASCII
Result: ????????????????????
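
The output above suggests the Chinese text was already replaced by literal question marks before PHP received it, rather than arriving as UTF-8 bytes that were merely displayed wrongly. A minimal sketch of how the two cases could be told apart, assuming the fetched value is available as a PHP string; the helper name `describeFetchedValue` is hypothetical and not part of PDO:

```php
<?php
// Hypothetical diagnostic helper (not part of PDO or of the report).
// It inspects the raw bytes of a fetched column value.
function describeFetchedValue(string $value): array
{
    return [
        // Raw bytes as hex; a literal "?" is the single byte 3f.
        'hex' => bin2hex($value),
        // preg_match() with the /u flag fails on invalid UTF-8 input.
        'valid_utf8' => preg_match('//u', $value) === 1,
        // True when the value consists of nothing but "?" characters,
        // i.e. a lossy conversion most likely happened upstream.
        'only_question_marks' => $value !== '' && trim($value, '?') === '',
    ];
}

// "????" stands in for the PHP 8.0.3 result shown above.
$lossy = describeFetchedValue('????');

// U+6CB3 U+5317 are the first two characters of the PHP 7.4 result.
$intact = describeFetchedValue("\u{6CB3}\u{5317}");

var_dump($lossy, $intact);
```

If `only_question_marks` is true, no decoding on the PHP side can recover the text; the conversion has to be fixed in the driver or connection settings.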
 [2021-03-15 17:47 UTC] cmb@php.net
An ODBC trace is not related to segfaults (you probably mean a
stack backtrace).  Anyway, since I'm using Windows I'm not sure
how to get an ODBC trace on your platform, but the FreeTDS User
Guide has a chapter on logging[1]; probably the last section on
this page is relevant.

See <https://bugs.php.net/80868#1615816567> for an example of an
ODBC trace.

[1] <https://www.freetds.org/userguide/logging.html>
 [2021-03-15 18:08 UTC] sergiopaternoster73 at gmail dot com
Thank you for the guidance. Here are the traces:

php-7.4
https://gist.github.com/capsandiego/8f2176ebc29ab961612bb225f06f452e

php-8.0.3
https://gist.github.com/capsandiego/92d963c3fa40bfcf8861bc46a2d33f35
 [2021-03-15 20:53 UTC] cmb@php.net
-Status: Feedback
+Status: Open
-Assigned To: cmb
+Assigned To:
 [2021-03-15 20:53 UTC] cmb@php.net
Thank you for the traces!  These are actually quite helpful,
because the first lines show a difference in the setup or build
(7.4 uses ISO-8859-1, while 8.0 uses UTF-8).  I don't know
FreeTDS, so I have no idea why they are using different client
charsets.  Consider asking on a FreeTDS support channel, too.
 [2021-03-16 07:22 UTC] sergiopaternoster73 at gmail dot com
Thanks for checking. It looks like the opposite to me; am I wrong?
 
PHP 8 uses ISO-8859-1 
iconv.c:330:tds_iconv_open(0x3a6c880, ISO-8859-1)

while PHP 7.4 UTF-8
iconv.c:330:tds_iconv_open(0x2754e40, UTF-8)

I will try compiling the latest version of FreeTDS anyway, to see if this improves things.
 [2021-03-16 10:14 UTC] sergiopaternoster73 at gmail dot com
Quick question. 
I compile php with --with-pdo_odbc=unixODBC,/usr and --with-unixODBC=/usr

Pardon my ignorance again, shall we focus on FreeTDS or unixODBC to look for a solution?
 [2021-03-16 12:08 UTC] sergiopaternoster73 at gmail dot com
I have recompiled php 8 with the latest version of FreeTDS but the problem persists. Do you have any other suggestions to query SQL Server from PHP on Linux and preserve UTF-8 chars?
 [2021-03-16 12:26 UTC] sergiopaternoster73 at gmail dot com
These are the new traces with FreeTDS 1.2 (previously 0.9).
They now look more consistent than before. So why does the problem persist?

php 7 trace:
https://gist.github.com/capsandiego/18b2cbb5f6f3e518f508c9b16335b513

php 8 trace:
https://gist.github.com/capsandiego/d51367c34d506d37aa6c62b46d71e1b3
 [2021-03-16 16:11 UTC] sergiopaternoster73 at gmail dot com
I tested it on Windows (PHP 7.2.5 and PHP 8.0.3). 

SELECT NAME FROM TABLE; (NAME is defined as NVARCHAR(200))

They both work *ONLY* if I set PDO::ODBC_ATTR_ASSUME_UTF8 => true as an option. Otherwise, I always get ASCII when I read an NVARCHAR() column.
How come do I need to specify PDO::ODBC_ATTR_ASSUME_UTF8 ? The result is supposed to be a UTF-8.

Thanks for your help
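
For reference, the option mentioned above is passed as a driver option in the fourth argument of the PDO constructor. A minimal sketch, assuming the pdo_odbc extension is loaded; the function name `connectAssumingUtf8` is hypothetical, and the DSN and credentials in the usage note are the placeholders from the test script:

```php
<?php
// Hypothetical wrapper: opens a PDO ODBC connection with
// PDO::ODBC_ATTR_ASSUME_UTF8 enabled, as in the Windows test above.
function connectAssumingUtf8(string $dsn, string $user, string $password): PDO
{
    $options = [
        // Throw exceptions instead of failing silently (added for the sketch).
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        // The option under discussion; defined by the pdo_odbc extension.
        PDO::ODBC_ATTR_ASSUME_UTF8 => true,
    ];

    return new PDO($dsn, $user, $password, $options);
}
```

It would be called as `connectAssumingUtf8("odbc:MyConnection", "MyUser", "MyPassword")`. Whether the option has any effect depends on the platform and driver, so this is not guaranteed to help with FreeTDS on Linux.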
 [2021-03-17 17:17 UTC] sergiopaternoster73 at gmail dot com
Unfortunately, even with PHP 8.1.0-dev and the latest install for FreeTDS and unixODBC, it doesn't work. 

Is there anyone who can run a SQL Server query with NVARCHAR columns and see Unicode characters with PHP 8?
 [2021-03-18 08:38 UTC] sergiopaternoster73 at gmail dot com
Finally, after a while, I discovered that PHP 8 requires ClientCharset = UTF-8 to be specified in the odbc.ini file in the ODBC connection parameters. PHP 7 doesn't require it.

Thank you for your help. Please, close this issue.
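
The resulting data source entry would presumably look like the fragment below. This is a sketch built from the /etc/odbc.ini listing in the description, not a copy of the reporter's final file; the only change is the key name. Note that the original listing spelled the key `Client_Charset`, whereas the fix names `ClientCharset`, which may be why the earlier setting had no effect.

```ini
[MyConnection]
Driver            = FreeTDS
Description       = SQL Server
Server            = my.server.com
Port              = 1433
Database          = AB
TDS_Version       = 8.0
; Key name as given in the fix above (no underscore).
ClientCharset     = UTF-8
```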
 [2021-03-18 13:30 UTC] cmb@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: cmb
 [2021-03-18 13:30 UTC] cmb@php.net
> How come do I need to specify PDO::ODBC_ATTR_ASSUME_UTF8 ? The
> result is supposed to be a UTF-8.

You do not necessarily need to specify that, but it can help with
some drivers.  See the constant's documentation[1], where I just
updated the last sentence to read:

| If false (the default), character encoding conversion may be
| done by the driver.

Anyhow, great that you figured out a working solution! 

[1] <https://www.php.net/manual/en/ref.pdo-odbc.php#pdo-odbc.constants>