php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #80874 odbc_fetch_array() will not fetch a row containing a UTF-8 currency code symbol
Submitted: 2021-03-16 18:11 UTC Modified: 2021-09-06 11:58 UTC
From: tony at tonymarston dot net Assigned:
Status: Open Package: ODBC related
PHP Version: 7.4.16 OS: Windows 10
Private report: No CVE-ID: None
View Add Comment Developer Edit
Anyone can comment on a bug. Have a simpler test case? Does it work for you on a different platform? Let us know!
Just going to say 'Me too!'? Don't clutter the database with that please — but make sure to vote on the bug!
Your email address:
MUST BE VALID
Solve the problem:
41 - 19 = ?
Subscribe to this entry?

 
 [2021-03-16 18:11 UTC] tony at tonymarston dot net
Description:
------------
I have a database table in a SAP HANA database which contains currency codes and their symbols, but when I try to read them in my PHP program the rows containing symbols are not returned in the result set.

Test script:
---------------
CREATE TABLE currency (
  currency_code nvarchar(16) NOT NULL,
  currency_name nvarchar(255) NOT NULL,
  currency_symbol nvarchar(4) DEFAULT NULL,
  PRIMARY KEY (currency_code)
);

INSERT INTO currency (currency_code, currency_name, currency_symbol) VALUES ('AUD', 'Australian Dollars', NULL);
INSERT INTO currency (currency_code, currency_name, currency_symbol) VALUES ('CAD', 'Canadian Dollars', NULL);
INSERT INTO currency (currency_code, currency_name, currency_symbol) VALUES ('EUR', 'Euros', '€');
INSERT INTO currency (currency_code, currency_name, currency_symbol) VALUES ('GBP', 'UK Pounds', '£');
INSERT INTO currency (currency_code, currency_name, currency_symbol) VALUES ('JPY', 'Japanese Yen', '¥');
INSERT INTO currency (currency_code, currency_name, currency_symbol) VALUES ('SGD', 'Singapore Dollar', '$');
INSERT INTO currency (currency_code, currency_name, currency_symbol) VALUES ('THB', 'Thai Baht', '฿');
INSERT INTO currency (currency_code, currency_name, currency_symbol) VALUES ('USD', 'US Dollars', '$');
INSERT INTO currency (currency_code, currency_name, currency_symbol) VALUES ('YUAN', 'Chinese Yuan Renminbi', '元');

<?php
$driver = 'HDBODBC';
$server = 'hxehost:39015';
$user   = '****';
$pswd   = '****';
$conn = odbc_connect("Driver=$driver;ServerNode=$server",$user,$pswd) OR die('Connection to DB via ODBC failed');
$query = "SELECT * FROM test.currency ORDER BY currency_code";
if (!$result = odbc_exec($conn, $query)) {
    die("ERRORNO=".odbc_error().", ERRORMSG=".odbc_errormsg());
};
$numrows = odbc_num_rows($result);
while ($row = odbc_fetch_array($result)) {
    $array[] = array_change_key_case($row, CASE_LOWER);
    echo "<p>id={$row['CURRENCY_CODE']}, name={$row['CURRENCY_NAME']}, symbol={$row['CURRENCY_SYMBOL']}</p>\n";
} // while
odbc_free_result($result);
?>

Expected result:
----------------
The call to odbc_num_rows() returns '9' which is correct, but the call to odbc_fetch_array() only returns the first 2 rows and returns FALSE for the 3rd row.


Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2021-03-17 16:02 UTC] cmb@php.net
-Status: Open +Status: Feedback -Assigned To: +Assigned To: cmb
 [2021-03-17 16:02 UTC] cmb@php.net
> […] but the call to odbc_fetch_array() only returns the first 2
> rows and returns FALSE for the 3rd row.

Doesn't it also raise a warning?

Anyhow, I cannot reproduce this (I haven't a SAP HANA DB
available, though), so please provide an ODBC trace so we can see
what's actually happening.  If possible, put it somewhere on the
Web (e.g. gist.github.com, pastebin.com) and only post a link to
the trace here.
 [2021-03-17 18:27 UTC] tony at tonymarston dot net
-Status: Feedback +Status: Assigned
 [2021-03-17 18:27 UTC] tony at tonymarston dot net
> Doesn't it also raise a warning?
How do I detect that? There is no ODBC function to retrieve warnings.

The trace log can be downloaded from http://www.tonymarston.net/odbc.log

I have also tried running the same test on my SQL Server database, but while it does return all 9 rows the currency_symbol column for 'THB' and 'YUAN' are being returned as '?' instead of the expected UTF-8 symbol.
 [2021-03-18 13:03 UTC] cmb@php.net
-Status: Assigned +Status: Feedback
 [2021-03-18 13:03 UTC] cmb@php.net
>> Doesn't it also raise a warning?
> How do I detect that? There is no ODBC function to retrieve
> warnings.

I was referring to the general PHP error reporting mechanisms.

Anyway, the ODBC trace shows whats going on:

| General error;-10427 Conversion of parameter/column (3) from
| data type NVARCHAR to ASCII failed (-10427)

Conversion to ASCII can't succeed, but PHP does not enforce that,
but rather requests binding to SQL_C_CHAR.  Is there a respective
setting to convert to UTF-8 in the driver options?

With SQL Server (ODBC Driver 17 for SQL Server) all currency
symbols are properly retrieved and displayed for me.  Watch out
for font issues (call bin2hex() to see the byte values).
 [2021-03-18 15:28 UTC] tony at tonymarston dot net
-Status: Feedback +Status: Assigned
 [2021-03-18 15:28 UTC] tony at tonymarston dot net
No notice or warning is generated on the call to odbc_fetch_array().

I have CHAR_AS_UTF8=TRUE in the additional connection properties in the ODBC Data Source Administrator.

When I access my SQL Server database via ODBC the currency symbol returned for THB appears as '?' in the PHP array and also when I display it in the HTML output. However, when I copy it to my text editor it is shown as '?' in ASCII and 'EFBBBF3F' in hexadecimal. Why does it show up as '?'
 [2021-03-18 16:45 UTC] cmb@php.net
-Status: Assigned +Status: Feedback
 [2021-03-18 16:45 UTC] cmb@php.net
> […] and 'EFBBBF3F' in hexadecimal. Why does it show up as '?'

Because that bytes are an UTF-8 BOM followed by a question mark.
Displaying this as ? is correct, but I wonder where that BOM comes
from.

> I have CHAR_AS_UTF8=TRUE in the additional connection properties
> in the ODBC Data Source Administrator.

Did you try with CHAR_AS_UTF8=FALSE?

Also, I suggest that you check whether that would work with
PDO_ODBC.  I assume that you'd get the same results as with ODBC by
default, but setting the PDO::ODBC_ATTR_ASSUME_UTF8 attribute may
yield the desired results.
 [2021-03-18 18:27 UTC] tony at tonymarston dot net
-Status: Feedback +Status: Assigned
 [2021-03-18 18:27 UTC] tony at tonymarston dot net
> 'EFBBBF3F' in hexadecimal is a UTF-8 BOM followed by a question mark.

Byte Order Marks should *NEVER* be added to database columns, only files.

> Did you try with CHAR_AS_UTF8=FALSE?

Yes, but the results were the same.

> would this work with PDO_ODBC?

Accessing the SAP HANA database had the same result - it stopped fetching rows after the 2nd row.

Access the SQL Server database had a different result - the symbol for 'THB' was returned as '?' on its own without the UTF-8 BOM.
 [2021-03-19 13:19 UTC] cmb@php.net
-Status: Assigned +Status: Open -Assigned To: cmb +Assigned To:
 [2021-03-19 13:19 UTC] cmb@php.net
> Byte Order Marks should *NEVER* be added to database columns, […]

I suggest you tell that whoever is responsible for inserting the
BOM.  It certainly is not PHP.  It *might* be the ODBC driver.

Anyhow, if PDO_ODBC with PDO::ODBC_ATTR_ASSUME_UTF8 yields the
same broken results, I don't think this is a PHP issue.  It might
be an issue with the driver, or some configuration issue.  I
wouldn't know where to look further.  Maybe some SAP HANA ODBC
driver support channel can be more helpful.
 [2021-03-20 16:41 UTC] tony at tonymarston dot net
I have tried using the PDO ODBC driver both with and without the PDO::ODBC_ATTR_ASSUME_UTF8 option with mixed results. I have also run tests by accessing the same data in a SQL Server database via the PDO ODBC driver as a comparison.

I have captured the output from various tests which you can download from http://www.tonymarston.net/test-odbc-utf8.zip

TEST-1 uses the ODBC driver using data which was inserted using JDBC (the Eclipse plugin)

TEST-2 uses the ODBC driver using data which was inserted using PHP and the ODBC driver

TEST-3 uses the PDO-ODBC driver without the UTF8 attribute using data which was inserted using JDBC

TEST-4 uses the PDO-ODBC driver without the UTF8 attribute using data which was inserted using PHP

TEST-5 uses the PDO-ODBC driver with the UTF8 attribute using data which was inserted using JDBC

TEST-6 uses the PDO-ODBC driver with the UTF8 attribute using data which was inserted using PHP

These tests clearly show that the HANA ODBC driver mangles UTF8 values both when written and when read.

The test using the PDO-ODBC driver with the UTF8 attribute turned ON produced the worst results as it totally mangled all the non-UTF8 characters.
 [2021-04-21 12:35 UTC] cmb@php.net
> These tests clearly show that the HANA ODBC driver mangles UTF8
> values both when written and when read.

If it is the driver, there is not we can do about that, can't we?
 [2021-04-21 14:38 UTC] tony at tonymarston dot net
I do not know which driver is fault, so I have also reported this bug to the SAP HANA community at https://answers.sap.com/questions/13322291/odbc-fetch-array-will-not-fetch-a-row-containing-a.html, but as they have not tried accessing their database via PHP running on Windows they are unable to reproduce this fault and there remain clueless.

I do not have the resources to try the same code using PHP on Linux, so I don't know if that ODBC driver is any better. All I can say at the moment is that trying to access a SAP HANA database from PHP is not a viable proposition.
 [2021-06-05 08:52 UTC] tony at tonymarston dot net
A developer at SAP has been investigating this problem and has discovered that the PHP odbc module is not formatting the UCS-2 text correctly on Windows for the HANA ODBC driver. You can read his full response at https://answers.sap.com/questions/13231775/hana-utf8-characters-being-garbled-when-using-the.html?childToView=13407210
 [2021-09-06 09:35 UTC] jeremy dot galouye at gmail dot com
Hello, I had the same issue have you found a workaround solution ?
Thanks
 [2021-09-06 10:24 UTC] tony at tonymarston dot net
No, I have not found a workaround for this problem. This means that I cannot use SAP HANA with my multi-currency ERP application.
 [2021-09-06 11:58 UTC] cmb@php.net
> A developer at SAP has been investigating this problem and has
> discovered that the PHP odbc module is not formatting the UCS-2
> text correctly on Windows for the HANA ODBC driver.

Ah, I see.  The basic problem is that PHP has no notion of the
charset encoding of strings.  Thus, it treats all strings as byte
sequences.  Therefore, we can't do any automatic conversion to
UCS-2 and use SQLExecDirectW().  Well, actually we could, at least
on Windows, where PHP 7.1.0 introduced UTF-8 support for file
paths, and the respective framework (sapi_windows_cp_get|set).
However, automatically doing the conversion would almost certainly
break existing code for many other ODBC drivers (which do not
automatically convert to UCS-2), so we would need to make that
optional, and an INI setting might not be a good solution.  Also,
what to do for non Windows systems is unclear.

Anyhow, this would be more like a new feature, and as such can't
target any of the stable branches (PHP 7.4, PHP 8.0).
 
PHP Copyright © 2001-2021 The PHP Group
All rights reserved.
Last updated: Mon Dec 06 23:03:35 2021 UTC