php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Request #25972 ODBC truncates multi-byte text (w/ MSSQL)
Submitted: 2003-10-23 23:26 UTC Modified: 2017-01-08 06:18 UTC
Votes:11
Avg. Score:4.3 ± 1.0
Reproduced:8 of 8 (100.0%)
Same Version:6 (75.0%)
Same OS:5 (62.5%)
From: phpbug at chipple dot net Assigned: krakjoe (profile)
Status: Closed Package: ODBC related
PHP Version: 4.3, 5 OS: Win2K 5.00.2195 SP4
Private report: No CVE-ID: None
 [2003-10-23 23:26 UTC] phpbug at chipple dot net
Description:
------------
This bug has been observed with PHP 4.3.3 and 4.3.4RC2.
Database: MSSQL 2000 (8.00.760) SP3

I have a MSSQL database with a table containing a column tTitle of type nvarchar(80) (which stands for 80 multi-byte characters).

When a string of 60 Japanese double-byte characters (120 bytes) stored in column tTitle is retrieved using PHP's ODBC extension, the value is truncated to 80 bytes.

The PHP MSSQL extension retrieves the data correctly.
Microsoft's ODBC driver (used from ASP) retrieves the data correctly.


MSSQL table structure:

CREATE TABLE [dbo].[T_Course] (
  [aCourseID] [int] IDENTITY (1, 1) NOT NULL ,
  [tTitle] [nvarchar] (80) COLLATE Japanese_CI_AS NOT NULL
) ON [PRIMARY]
GO


Test data (CSV):
aCourseID,tTitle
1,[string of 60 Japanese double-byte characters]


SQL query:

SELECT * FROM T_Course WHERE aCourseID=1

Reproduce code:
---------------
// ODBC EXTENSION, data truncated

$oConn = odbc_connect(C_Gen_sDbDSN,C_Gen_sDbUser,C_Gen_sDbPassword);
$oRs = odbc_exec($oConn,"SELECT * FROM T_Course WHERE aCourseID=1");
$aRow = odbc_fetch_array($oRs);

// BAD: Title truncated to 80 _bytes_
echo $aRow["tTitle"];

odbc_close($oConn);


// MSSQL EXTENSION, data retrieved correctly

$oConn = odbc_connect(C_Gen_sDbDSN,C_Gen_sDbUser,C_Gen_sDbPassword);
$oRs = odbc_exec($oConn,"SELECT * FROM T_Course WHERE aCourseID=1");
$aRow = odbc_fetch_array($oRs);

// GOOD: Complete title retrieved (60 chars=120 bytes)
echo $aRow["tTitle"];

odbc_close($oConn);

Expected result:
----------------
The ODBC extension should truncate the retrieved data to 80 characters (instead of 80 bytes).

Actual result:
--------------
The ODBC extension truncates the retrieved data to 80 bytes.

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2003-10-24 00:10 UTC] phpbug at chipple dot net
Sorry here's the MSSQL extension code that should have been in the 2nd part of "Reproduce code".

// MSSQL EXTENSION, data retrieved correctly

$oConn = mssql_connect("localhost",C_Gen_sDbUser,C_Gen_sDbPassword);
mssql_select_db("icds",$oConn);
$oRs = mssql_query("SELECT * FROM T_Course WHERE aCourseID=1");
$aRow = mssql_fetch_array($oRs);

// GOOD: Complete title retrieved (60 chars=120 bytes)
echo $aRow["tTitle"];

mssql_close($oConn);
 [2003-10-31 09:26 UTC] kalowsky@php.net
Is this really a bug?  The debate rages like so:

One the one hand ODBC does not support multi-byte 
characters at all.  On the other, the idea of multi-
byte characters didn't exist (or if it did, not very 
prevailent) at the time of the original authoring of 
the ODBC system.

Next question is, is there an easy fix.  Nope.  I've 
had a fix in the works, I've just lost all time to work 
upon it.  Marking as verified, but not really sure it's 
a bug.  Maybe a feature request really... 

Following the ODBCv2 specs, NVARCHAR is a type that 
doesn't exist, nor does TEXT, NTEXT, and many of the 
other wonderful types found today.  
 [2003-11-03 17:13 UTC] moriyoshi@php.net
I might be wrong, but I think this is a valid bug.

According to the documentation on msdn [1] [2], the fifth parameter of SQLBindCol() expects the size of the buffer in byte, while the code mentioned below appears to give it the number of characters allocated for the column (display size) instead. This kind of confusion will most likely cause unexpected behaviour like described in this report.

php_odbc.c 669:
------------------------------------------------------
default:
  rc = SQLColAttributes(result->stmt, (UWORD)(i+1), SQL_COLUMN_DISPLAY_SIZE, NULL, 0, NULL, &displaysize);
  displaysize = displaysize <= result->longreadlen ? displaysize : result->longreadlen;
  result->values[i].value = (char *)emalloc(displaysize + 1);
  rc = SQLBindCol(result->stmt, (UWORD)(i+1), SQL_C_CHAR, result->values[i].value, displaysize + 1, &result->values[i].vallen);
  break;
------------------------------------------------------

If the driver supports ODBCv3, we should use SQL_DESC_OCTET_LENGTH, which can be used to determine the actual number of bytes. If not, we may be able to safely calculate the maximum column size by simply multiplying the display size by 2, as NVARCHAR columns are known to not contain multi-byte strings, but double-byte strings.

[1] http://msdn.microsoft.com/library/en-us/odbc/htm/odch04pr_13.asp?frame=true
[2] http://msdn.microsoft.com/library/en-us/odbc/htm/odappdpr_24.asp?frame=true


 [2003-11-04 07:56 UTC] kalowsky@php.net
moriyoshi,

It's not as simple as you show it to be.  First you must 
realize that PHP's ODBC layer is written as an ODBC v2 
compliant system, to just randomly add in support for 
NVARCHAR (and friends) will break support for other 
database systems.  

The point of my post wasn't to say this isn't a bug 
(hence why I marked it as verified), but rather to say 
it's a known bug and the issue is the extension is in 
need of updating.  
 [2003-11-04 09:31 UTC] moriyoshi@php.net
Well, then how did you conclude NVARCHAR support will break some kinds of compatibilities? I think I have already pointed out that we'd still be able to handle it on ODBCv2. 
(Sorry if this sounds offending. I don't mean so :)

Basically we don't have to check whether the column type is NVARCHAR or not, but just allocate enough space for that type of characters. That way we also got to take a slight loss of memory into account though.

 [2003-11-04 18:33 UTC] kalowsky@php.net
I do not like the idea of introducing ODBCv3 based code/
options to a system predominately defined by ODBCv2 
specifications.  So I am against the inclusion of 
Moriyoshi's initial suggested fix for this.

That being said, I did a bit more research on this today 
and think the following change should allow the double 
wide characters to work much better.  I haven't tested 
it out yet myself, but if someone else has the time, it 
would be beneficial to all.  Sorry about the bug system 
mangling.

Essentially the SQL_COLUMNS_DISPLAY_SIZE lists the 
number of characters needed to display everything.  This 
works fine but in the case of a double wide character 
array it doesn't (as explained my Moriyoshi).  The 
SQL_COLUMN_LENGTH should return the number of bytes 
necessary for retrival of the column.  

WARNING: this change may fundamentally alter the 
functionality of the longreadlen variable comparisons as 
well. Use at your own risk right now.



Index: php_odbc.c
========================================================
===========
RCS file: /repository/php-src/ext/odbc/php_odbc.c,v
retrieving revision 1.176
diff -r1.176 php_odbc.c
671,672c671,672
<                               rc = 
SQLColAttributes(result->stmt, (UWORD)(i+1), 
SQL_COLUMN_DISPLAY_SIZE,
<                                                                       
NULL, 0, NULL, &displaysize);
---
>                               rc = 
SQLColAttributes(result->stmt, (UWORD)(i+1),
>                         SQL_COLUMN_LENGTH, NULL, 0, 
NULL, &displaysize);
 [2003-11-05 01:18 UTC] phpbug at chipple dot net
Thank you very much for the attention to my bug report.

I gave the fix a try in my environment but then all field values received are empty (details below).

Perhaps the SQL_COLUMN_LENGTH attribute always contains the value 0?

// Test code

$oOdbcConn = odbc_connect(C_Gen_sDbDSN,C_Gen_sDbUser,C_Gen_sDbPassword);
$oOdbcRs = odbc_exec($oOdbcConn,$sSql);
$aOdbcRow = odbc_fetch_array($oOdbcRs);
for ($i = 1; $i <= odbc_num_fields($oOdbcRs); $i++)
  echo odbc_field_name($oOdbcRs,$i).": ".
       odbc_field_len($oOdbcRs,$i).": ".
       strlen($aOdbcRow[odbc_field_name($oOdbcRs,$i)]).": ".
       gettype($aOdbcRow[odbc_field_name($oOdbcRs,$i)])."<br>";

// Result with php4-STABLE-200311050430 before change
// (SQL_COLUMN_DISPLAY_SIZE)

aCourseID: 10: 1: string
tTitle: 80: 86: string

// Result with php4-STABLE-200311050430 after change
// (SQL_COLUMN_LENGTH)

aCourseID: 10: 0: NULL
tTitle: 80: 0: NULL
 [2003-11-05 02:45 UTC] phpbug at chipple dot net
In case this helps...

One thing I just noticed from my test above (in the results before the change) is that strlen() on the tTitle field value gives 86 [bytes], the string's correct length. I verified that the 6 last bytes are all null (ASCII 0), only the first 80 bytes are correctly being returned.
 [2010-12-01 16:16 UTC] jani@php.net
-Package: Feature/Change Request +Package: ODBC related
 [2011-11-02 14:30 UTC] j dot faithw at yahoo dot com
I have the same problem with php 5.3.8 using PostgreSQL with char columns. If the database is created with e.g the EUC_CN character encoding(createdb -E EUC_CN).

This encoding uses between 1 and 3 bytes per character. So a char(10) could need up to 30 bytes. 

The problem is in the odbc_bindcols function in ext/odbc/php_odbc.c
  SQLColAttributes is called with SQL_COLUMN_DISPLAY_SIZE but this indicates the maximum number of characters required not the number of bytes.
This means the buffer allocated for the value may not be big enough
  result->values[i].value=(char)emalloc(displaysize+1);

Later on in e.g. odbc_fetch_into
  Z_STRLEN_P(tmp) = result->values[i].vallen;
  Z_STRVAL_P(tmp) = estrndup(result->values[i].value,Z_STRLEN_P(tmp));

This can result in a vallen bigger that displaysize. But the ODBC driver will only fill in at most displaysize+1 bytes(including null terminator). This means character data is missed and junk bytes are returned instead.

The same problem may exist in ext/pdo_odbc/odbc_stmt.c. Where 
  rc = SQLColAttribute(S->stmt, colno+1, SQL_DESC_DISPLAY_SIZE,
            NULL, 0, NULL, &displaysize);
is called. But I have not tested this.

The following fixes odbc_bindcols for the char(x) datatype. I believe 4 bytes is the maximum required for any charater encoding.
php_odbc.c:line 988
      if (result->values[i].coltype == SQL_CHAR) {
        //If using a multibyte character encoding
        //number of bytes could be 4*SQL_COLUMN_DISPLAY_SIZE.
        //Without this workaround various functions
        //e.g. odbc_fetch_into will return data with a null after
        //diplaysize bytes and extra junk data at the end as
        //vallen can be bigger than displaysize. Tested using
        //PostgreSQL with EUC_CN encoding.
        displaysize*=4;
      }

The fix may be needed for other data types as well as SQL_CHAR.
 [2017-01-08 06:18 UTC] krakjoe@php.net
-Status: Analyzed +Status: Closed -Assigned To: +Assigned To: krakjoe
 [2017-01-08 06:18 UTC] krakjoe@php.net
This appears to be another issue that is part of a wider problem: ODBC is not well maintained, and or up-to-date, and apparently it has lagged behind for more than a decade.

Worth mentioning that there are better drivers available in PHP today, and it seems likely that people are choosing to use those.

If anyone thinks I'm wrong to close this, please open a PR to improve ODBC in general.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Apr 16 17:01:30 2024 UTC