Bug #80869 SQL Server: Read NVARCHAR Unicode
Submitted: 2021-03-15 16:25 UTC
Modified: 2021-03-18 13:30 UTC
From: sergiopaternoster73 at gmail dot com
Assigned: cmb
Status: Closed
Package: PDO ODBC
PHP Version: 8.0.3
OS: Ubuntu 18.04 Server
Private report: No
CVE-ID: None
 [2021-03-15 16:25 UTC] sergiopaternoster73 at gmail dot com
Description:
------------
I have an SQL Server 2012 table that contains all NVARCHAR columns.
With PHP 7.3/7.4 I can read both ASCII and UNICODE characters from the table by running this query: SELECT CAST([NAME] AS NVARCHAR(MAX)) AS [Name] FROM [Table].
With php-8.0.3 I can still read ASCII results (plain English, for example) but not UNICODE anymore (Chinese characters, in my case).

I use php-8.0.3 with FreeTDS on Linux Ubuntu. With PHP 7.3/7.4 it works well!




Test script:
---------------
```
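<?php
// Read a single NVARCHAR column through the ODBC DSN "MyConnection".
$query = "SELECT
          CAST([NAME] AS NVARCHAR(MAX)) AS [Name]
          FROM [Table]";
$pdo    = new PDO("odbc:MyConnection", "MyUser", "MyPassword");
$rs     = $pdo->query($query);
// Fetch the first column of the first row.
$result = $rs->fetchColumn();
// Dump the raw bytes so the encoding can be inspected.
file_put_contents("result.txt", $result);
$pdo = null;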
```

/etc/odbc.ini settings

[MyConnection]
Description       = "Data Warehouse"
Port              = 1433
Driver            = FreeTDS
Description       = SQL Server
Trace             = No
Server            = my.server.com
Database          = AB
TDS_Version       = 8.0
Client_Charset    = UTF-8
ApplicationIntent = ReadOnly

Expected result:
----------------
Readable Unicode characters (Chinese, in my case), as in PHP 7

Actual result:
--------------
Unreadable characters

 [2021-03-15 16:49 UTC] cmb@php.net
-Status: Open
+Status: Feedback
-Assigned To:
+Assigned To: cmb
 [2021-03-15 16:49 UTC] cmb@php.net
Could you please provide ODBC traces for PHP 7.4 and 8.0; ideally
place them somewhere publicly accessible on the Web (e.g.
gist.github.com, pastebin.com) and only link to them.
 [2021-03-15 17:19 UTC] sergiopaternoster73 at gmail dot com
Please, pardon my ignorance. I don't get any segfault. How can I provide an ODBC trace?
 [2021-03-15 17:42 UTC] sergiopaternoster73 at gmail dot com
I tested it with a newer SQL Server 2017 and a "cleaner" NVARCHAR table. Here are the results with PHP 7 and 8. It looks like the encoding is ASCII in PHP 8, while it is UTF-8 in PHP 7.

Version: 7.4.13
mb_detect_encoding(): UTF-8
Result: 河北香河经济开发区运河大道东侧安晟街北侧

Version: 8.0.3
mb_detect_encoding(): ASCII
Result: ????????????????????
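The ASCII detection above is what one would expect if the text was converted to a single-byte charset somewhere before it reached PHP. A minimal sketch of that effect, without a database: it assumes (this is not confirmed by the report at this point) that the conversion target is ISO-8859-1, and it uses mbstring only to imitate the lossy conversion.

```php
<?php
// Illustration only: reproduces the symptom without a database.
// Use '?' (U+003F) for characters the target charset cannot represent.
mb_substitute_character(0x3F);

// UTF-8 source text: the first four characters of the PHP 7.4 result.
$original = "河北香河";

// Assumption: somewhere in the stack the text is converted to ISO-8859-1.
$lossy = mb_convert_encoding($original, "ISO-8859-1", "UTF-8");

$candidates = ["ASCII", "UTF-8"];
echo mb_detect_encoding($original, $candidates, true), PHP_EOL; // UTF-8
echo mb_detect_encoding($lossy, $candidates, true), PHP_EOL;    // ASCII
echo $lossy, PHP_EOL;                                           // ????
```

Once the characters have been replaced by question marks, the original text cannot be recovered on the PHP side, so the conversion has to be fixed in the driver configuration.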
 [2021-03-15 17:47 UTC] cmb@php.net
An ODBC trace is not related to segfaults (you probably mean a
stack backtrace).  Anyway, since I'm using Windows I'm not sure
how to get an ODBC trace on your platform, but the FreeTDS User
Guide has a chapter on logging[1]; probably the last section on
this page is relevant.

See <https://bugs.php.net/80868#1615816567> for an example of an
ODBC trace.

[1] <https://www.freetds.org/userguide/logging.html>
 [2021-03-15 18:08 UTC] sergiopaternoster73 at gmail dot com
Thank you for the guidance. Here are the traces:

php-7.4
https://gist.github.com/capsandiego/8f2176ebc29ab961612bb225f06f452e

php-8.0.3
https://gist.github.com/capsandiego/92d963c3fa40bfcf8861bc46a2d33f35
 [2021-03-15 20:53 UTC] cmb@php.net
-Status: Feedback
+Status: Open
-Assigned To: cmb
+Assigned To:
 [2021-03-15 20:53 UTC] cmb@php.net
Thank you for the traces!  These are actually quite helpful,
because the first lines show a difference in the setup or build
(7.4 uses ISO-8859-1, while 8.0 uses UTF-8).  I don't know
FreeTDS, so I have no idea why they are using different client
charsets.  Consider asking on a FreeTDS support channel, too.
 [2021-03-16 07:22 UTC] sergiopaternoster73 at gmail dot com
Thanks for checking. It looks the opposite to me, am I wrong?
 
PHP 8 uses ISO-8859-1 
iconv.c:330:tds_iconv_open(0x3a6c880, ISO-8859-1)

while PHP 7.4 UTF-8
iconv.c:330:tds_iconv_open(0x2754e40, UTF-8)

I will anyway try to compile the latest version of FreeTDS to see if this will improve.
 [2021-03-16 10:14 UTC] sergiopaternoster73 at gmail dot com
Quick question. 
I compile php with --with-pdo_odbc=unixODBC,/usr and --with-unixODBC=/usr

Pardon my ignorance again, shall we focus on FreeTDS or unixODBC to look for a solution?
 [2021-03-16 12:08 UTC] sergiopaternoster73 at gmail dot com
I have recompiled php 8 with the latest version of FreeTDS but the problem persists. Do you have any other suggestions to query SQL Server from PHP on Linux and preserve UTF-8 chars?
 [2021-03-16 12:26 UTC] sergiopaternoster73 at gmail dot com
These are the new traces with FreeTDS 1.2 (from 0.9 before).
Now they look more consistent than before. So why does the problem persist?

php 7 trace:
https://gist.github.com/capsandiego/18b2cbb5f6f3e518f508c9b16335b513

php 8 trace:
https://gist.github.com/capsandiego/d51367c34d506d37aa6c62b46d71e1b3
 [2021-03-16 16:11 UTC] sergiopaternoster73 at gmail dot com
I tested it on Windows (PHP 7.2.5 and PHP 8.0.3). 

SELECT NAME FROM TABLE; (NAME is defined as NVARCHAR(200))

They both work *ONLY* if I set PDO::ODBC_ATTR_ASSUME_UTF8 => true as an option. Otherwise, I always get ASCII when I read an NVARCHAR() column.
How come do I need to specify PDO::ODBC_ATTR_ASSUME_UTF8 ? The result is supposed to be a UTF-8.

Thanks for your help
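For reference, a minimal sketch of passing that attribute as a connection option. The DSN and credentials are the placeholders from the test script, the helper name openConnection() is hypothetical, and the constant is only available when the pdo_odbc extension is loaded, so the sketch guards for it. As far as I know, the PHP manual describes the attribute as Windows-only, so it would not be expected to change the behaviour on the Linux/FreeTDS setup.

```php
<?php
// Driver options for the PDO constructor.
$options = [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on errors
];

// The constant exists only if pdo_odbc is loaded, so check first.
if (defined('PDO::ODBC_ATTR_ASSUME_UTF8')) {
    $options[PDO::ODBC_ATTR_ASSUME_UTF8] = true;
}

// Hypothetical helper: opens the connection with the options above.
function openConnection(string $dsn, string $user, string $password, array $options): PDO
{
    return new PDO($dsn, $user, $password, $options);
}

// Usage (placeholders from the test script):
// $pdo = openConnection("odbc:MyConnection", "MyUser", "MyPassword", $options);
```

The same attribute can also be set after connecting with $pdo->setAttribute().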
 [2021-03-17 17:17 UTC] sergiopaternoster73 at gmail dot com
Unfortunately, even with PHP 8.1.0-dev and the latest install for FreeTDS and unixODBC, it doesn't work. 

Is there anyone who can run a SQL Server query with NVARCHAR columns and see Unicode characters with PHP 8?
 [2021-03-18 08:38 UTC] sergiopaternoster73 at gmail dot com
Finally, after a while, I discovered that PHP 8 requires ClientCharset = UTF-8 to be specified among the ODBC connection parameters in the odbc.ini file. PHP 7 doesn't require it.

Thank you for your help. Please, close this issue.
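A sketch of what the corrected /etc/odbc.ini entry might look like. The final file was not posted, so this is an assumption: it reuses the values from the original report and changes only the spelling of the charset key from Client_Charset to ClientCharset.

```
[MyConnection]
Driver            = FreeTDS
Description       = SQL Server
Server            = my.server.com
Port              = 1433
Database          = AB
TDS_Version       = 8.0
ClientCharset     = UTF-8
```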
 [2021-03-18 13:30 UTC] cmb@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: cmb
 [2021-03-18 13:30 UTC] cmb@php.net
> How come do I need to specify PDO::ODBC_ATTR_ASSUME_UTF8 ? The
> result is supposed to be a UTF-8.

You do not necessarily need to specify that, but it can help with
some drivers.  See the constant's documentation[1], where I just
updated the last sentence to read:

| If false (the default), character encoding conversion may be
| done by the driver.

Anyhow, great that you figured out a working solution! 

[1] <https://www.php.net/manual/en/ref.pdo-odbc.php#pdo-odbc.constants>