Bug #71444 Fatal termination of data length occurs in transforming from EBCDIC to UTF-8
Submitted: 2016-01-25 01:43 UTC Modified: 2020-10-06 07:35 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: ta_nakagawa at ysco dot net Assigned:
Status: Open Package: ODBC related
PHP Version: 5.6.17 OS: Windows 7 / Windows 2012 R2
Private report: No CVE-ID: None
 [2016-01-25 01:43 UTC] ta_nakagawa at ysco dot net
Description:
------------
When a query returns UTF-8 multibyte characters (such as Japanese characters)
from an AS/400 EBCDIC data field via ODBC,
the last few characters are truncated in the ODBC module of PHP.

The conversion from EBCDIC to UTF-8 increases the size of the data.
In the worst case, the UTF-8 data is 2.5 times the size of the EBCDIC data.
The ODBC driver (DB2CLI or IBM i Access for Windows) converts the characters
to UTF-8 correctly,
but PHP does not allocate a buffer large enough to hold the extra bytes produced by the conversion.
As a result, the last few characters are dropped.

For example,
on AS/400 a 20-byte data field can store 9 EBCDIC double-byte characters (20 bytes),
such as Japanese characters.
This becomes 27 bytes in UTF-8.
When PHP receives the data from the driver,
the buffer bound to the column is too small to hold it.
The data exceeds the buffer by 7 bytes, so the last 3 characters are lost.
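
The arithmetic in this example can be checked with a minimal C sketch. It assumes the driver delivers 3-byte UTF-8 sequences for each of the 9 characters and that only 20 bytes of data fit in the bound buffer; the names TKNAKJ_UTF8, utf8_seq_len and utf8_chars_that_fit are hypothetical and do not appear in php_odbc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: the 9 hiragana from the TKNAKJ column, encoded as
 * UTF-8 (3 bytes each, 27 bytes in total). */
static const char TKNAKJ_UTF8[] =
    "\xe3\x81\x9f\xe3\x81\xa1\xe3\x81\xa4"   /* ta chi tsu */
    "\xe3\x81\xa6\xe3\x81\xa8\xe3\x81\xaa"   /* te to na */
    "\xe3\x81\xab\xe3\x81\xac\xe3\x81\xad";  /* ni nu ne */

/* Byte length of the UTF-8 sequence that starts with lead byte c. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Number of complete UTF-8 characters of s that fit in cap bytes. */
static size_t utf8_chars_that_fit(const char *s, size_t cap)
{
    assert(s != NULL);
    size_t len = strlen(s), pos = 0, chars = 0;
    while (pos < len) {
        size_t n = utf8_seq_len((unsigned char)s[pos]);
        if (pos + n > cap) break;   /* next character would not fit */
        pos += n;
        chars++;
    }
    return chars;
}
```

Under these assumptions, strlen(TKNAKJ_UTF8) is 27, which is 7 bytes more than the 20-byte field, and utf8_chars_that_fit(TKNAKJ_UTF8, 20) gives 6 complete characters, so 3 of the 9 characters are lost.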

This problem can be avoided by always allocating extra space for character columns
in the function "odbc_bindcols" in the source file "\ext\odbc\php_odbc.c".
This requires removing the conditional statement around the extra allocation for characters.
We hope that the problem will be resolved in an appropriate manner.

Thank you for your consideration.


diff -u ext/odbc/php_odbc.c.orig  ext/odbc/php_odbc.c

--- ext/odbc/php_odbc.c.orig    2016-01-06 07:14:48.000000000 +0900
+++ ext/odbc/php_odbc.c 2016-01-21 22:16:59.767514900 +0900
@@ -1020,10 +1020,9 @@
                                        displaysize += 3;
                                }

-                               if (charextraalloc) {
-                                       /* Since we don't know the exact # of bytes, allocate extra */
-                                       displaysize *= 4;
-                               }
+                               /* Since we don't know the exact # of bytes, allocate extra */
+                               displaysize *= 4;
+
                                result->values[i].value = (char *)emalloc(displaysize + 1);
                                rc = SQLBindCol(result->stmt, (SQLUSMALLINT)(i+1), SQL_C_CHAR, result->values[i].value,
                                                        displaysize + 1, &result->values[i].vallen);



Test script:
---------------
$conn = odbc_connect("MyDSN", "MyUser", "MyPWD");
$query = 'SELECT TKBANG,TKNAKJ FROM TOKMSP';
$stmt = odbc_prepare($conn, $query);
$ret = odbc_execute($stmt); // true on success, false on failure
while ($ret && ($record = odbc_fetch_array($stmt))) print_r($record); // fetch from the statement resource
odbc_close($conn);


Expected result:
----------------
Array
(
    [TKBANG] => 00121
    [TKNAKJ] => たちつてとなにぬね
)

The value of [TKNAKJ] is Japanese text.
The data has already been inserted into the column.


Actual result:
--------------
Array
(
    [TKBANG] => 00121
    [TKNAKJ] => たちつてとな 
)

The last 3 characters are truncated.

Patches

Extra_allocation_for_transforming_character_code (last revision 2016-01-25 01:45 UTC by ta_nakagawa at ysco dot net)


History

 [2016-02-23 14:04 UTC] ab@php.net
-Status: Open +Status: Feedback
 [2016-02-23 14:04 UTC] ab@php.net
Thanks for the report. Could you please extend your code with the table creation and the relevant data?

Thanks.
 [2016-02-29 01:26 UTC] ta_nakagawa at ysco dot net
-Status: Feedback +Status: Open
 [2016-02-29 01:26 UTC] ta_nakagawa at ysco dot net
Thank you for your reply.

EBCDIC multibyte character data needs to be inserted into the column.
We have prepared a binary image containing the data.

http://www.ysco.net/phpbug/tokmsp_saved

You can create the sample data on an AS/400 system as follows.

1. DDL to create the table on the AS/400 system.

CREATE TABLE TOKMSP(TKBANG CHAR (5), TKNAKJ CHAR (20) CCSID 1399)

2. Load the sample data onto the AS/400 system with the following steps.

The binary image file "tokmsp_saved" contains the sample data.

1) Execute AS/400 command "CRTSAVF QGPL/SAVF".
2) Upload the binary image to the AS/400 system with FTP in binary mode, as below.
   "binary"
   "put tokmsp_saved QGPL/SAVF"
3) Execute AS/400 command "RSTOBJ OBJ(TOKMSP) SAVLIB(QGPL) DEV(*SAVF) SAVF(QGPL/SAVF)".

Thanks.
 [2016-03-01 14:34 UTC] ab@php.net
-Status: Open +Status: Verified
 [2016-03-01 14:34 UTC] ab@php.net
Thanks for the extended information.

Unfortunately I have no access to any AS400 machines to test and debug. I had hoped to be able to reproduce this with some other driver like SqlServer, but that does not seem possible with the current snippet, because such a driver knows nothing about the EBCDIC charsets :(

After some analysis I actually see the issue now from the table creation statement. The column where the EBCDIC data is stored is of type CHAR, and that doesn't seem to diverge from the specs https://www-01.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/com.ibm.db2z11.doc.odbc/src/tpc/db2z_csql.dita . What I currently don't see, and maybe you could point this out: where does the conversion to UTF-8 happen? Does it require some additional options in order to be done automatically, like when reading the data? Currently, if you look at php_odbc_fetch_hash(), the data is always fetched as SQL_C_CHAR, so there should be no conversion to UTF-8 or anything else.

IMO your patch suggestion is not quite appropriate. As you can read from the code, charextraalloc is there exactly for handling the Unicode columns. If the column is not Unicode, then it's a C string and doesn't need any conversion. If we remove this condition, the allocation size will be quadrupled for every string, even non-multibyte ones. That doesn't sound right.

I haven't dug deep into the IBM specifications, but it could well be something DB2 specific. I see a couple of places in the code currently conditioned on HAVE_IBMDB2. IMO the correct fix should recognize the need for extra space and set charextraalloc = 1; accordingly. Maybe by reading the column charset or other driver-specific data?
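
To put rough numbers on this trade-off, here is a minimal C sketch. It is an illustration only, not the actual php_odbc.c code: the helper bind_buffer_size is hypothetical and simply mirrors the "displaysize *= 4" and "displaysize + 1" steps visible in the patch context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes to allocate for a character column bound as
 * SQL_C_CHAR. With charextraalloc set, the display size is quadrupled
 * before one byte is added for the terminating NUL. */
static size_t bind_buffer_size(size_t displaysize, int charextraalloc)
{
    assert(displaysize > 0);
    if (charextraalloc) {
        /* Exact byte count is unknown, so allocate extra. */
        displaysize *= 4;
    }
    return displaysize + 1;
}
```

For the TOKMSP table, bind_buffer_size(20, 0) is 21 bytes, which cannot hold the 27 bytes of UTF-8 plus a NUL, while bind_buffer_size(20, 1) is 81 bytes. Applying the flag unconditionally would also raise the single-byte TKBANG column from 6 to 21 bytes per row.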

Thanks.
 [2016-03-01 14:34 UTC] ab@php.net
-Status: Verified +Status: Feedback
 [2016-03-01 14:34 UTC] ab@php.net
Oops, wrong status
 [2016-03-12 12:47 UTC] ta_nakagawa at ysco dot net
-Status: Feedback +Status: Open
 [2016-03-12 12:47 UTC] ta_nakagawa at ysco dot net
Thanks for your comment.

It is difficult to determine whether the charset of a column of type CHAR is EBCDIC or not. (But EBCDIC characters are usually inserted into a column of type CHAR.) We cannot get the charset information from the driver via ODBC to AS/400. I also think that a fix using the HAVE_IBMDB2 condition is appropriate.

When a user faces a situation similar to ours, an official PHP binary does not solve the problem. A PHP build compiled with the configure option "--with-ibm-db2" is needed. Could PHP users be given guidance on setting up PHP to connect to DB2, as described above?
 [2016-04-18 15:36 UTC] ta_nakagawa at ysco dot net
What's the status now?
We think an appropriate solution for this problem has been selected.
When a maintainer is assigned, could you deal with the problem?
 [2020-10-05 12:47 UTC] cmb@php.net
-Status: Open +Status: Feedback -Assigned To: +Assigned To: cmb
 [2020-10-05 12:47 UTC] cmb@php.net
If you still experience this issue with any of the actively
supported PHP versions[1], please provide an ODBC trace of running
the given test script.

[1] <https://www.php.net/supported-versions.php>
 [2020-10-06 03:52 UTC] ta_nakagawa at ysco dot net
-Status: Feedback +Status: Assigned
 [2020-10-06 03:52 UTC] ta_nakagawa at ysco dot net
Unfortunately, I don't have an environment to test using AS400.
So I can't test the problem with the currently supported versions.
However, if the code associated with this issue hasn't changed, the same issue likely remains.

Thanks.
 [2020-10-06 07:35 UTC] cmb@php.net
-Status: Assigned +Status: Open -Assigned To: cmb +Assigned To:
 [2020-10-06 07:35 UTC] cmb@php.net
Thanks for the swift feedback!

I see the problem, but I'm not happy with the suggested solution
of generally allocating large strings; I'd rather find a more
specialized solution.
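
One possible direction, offered here purely as an assumption and not as something proposed in this report, is to detect truncation after the fetch and retry with a larger buffer. For a column bound as SQL_C_CHAR, ODBC reports the available data length through the length/indicator value, and data is truncated when that length is greater than or equal to the buffer length. The C sketch below models only that check; IND_NULL_DATA and IND_NO_TOTAL are local stand-ins for SQL_NULL_DATA and SQL_NO_TOTAL, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the ODBC indicator values SQL_NULL_DATA and
 * SQL_NO_TOTAL, so that the sketch compiles without ODBC headers. */
enum { IND_NULL_DATA = -1, IND_NO_TOTAL = -4 };

/* Hypothetical check: was a NUL-terminated character column truncated?
 * buffer_len is the size passed when binding, including the NUL byte. */
static int char_column_truncated(long indicator, size_t buffer_len)
{
    assert(buffer_len > 0);
    if (indicator == IND_NO_TOTAL) return 1;  /* length unknown, assume yes */
    if (indicator < 0) return 0;              /* NULL value or other marker */
    return (size_t)indicator >= buffer_len;
}

/* Hypothetical sizing for a retry: the reported length plus the NUL byte,
 * or double the old buffer when the driver reports no total. */
static size_t retry_buffer_len(long indicator, size_t buffer_len)
{
    if (indicator < 0) return buffer_len * 2;
    return (size_t)indicator + 1;
}
```

With the values from this report, a 21-byte buffer and a reported length of 27 would count as truncated, and the retry buffer would be 28 bytes. Whether the AS/400 drivers report the converted length in this way has not been verified here.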
 
Last updated: Tue Oct 20 06:01:26 2020 UTC