Bug #32140 NVARCHAR2 columns are truncated
Submitted: 2005-02-28 23:08 UTC Modified: 2005-09-08 11:47 UTC
From: bmr at comtime dot com Assigned: tony2001 (profile)
Status: Closed Package: OCI8 related
PHP Version: 5.0.4, 4.3.11 OS: Linux
Private report: No CVE-ID: None
 [2005-02-28 23:08 UTC] bmr at comtime dot com
Description:
------------
NVARCHAR2 columns can store UTF-8 or UTF-16 characters. In UTF-8, a single character can take 3 (or more??) bytes. PHP truncates strings whose byte representation is longer than 2 times the character length. This is actually an Oracle bug, because the OCI call returns the wrong byte length.
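
To illustrate the truncation described above, here is a minimal sketch in C. It is not part of the patch or of oci8.c; the helper names `utf8_char_count` and `bytes_truncated` are hypothetical. It assumes valid, NUL-terminated UTF-8 input and a fetch buffer sized as a fixed number of bytes per character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a NUL-terminated UTF-8 string
 * by skipping continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: number of bytes that do not fit when the buffer
 * holds bytes_per_char data bytes for each character of s. */
size_t bytes_truncated(const char *s, size_t bytes_per_char)
{
    size_t need = strlen(s);                           /* actual UTF-8 bytes */
    size_t have = utf8_char_count(s) * bytes_per_char; /* buffer capacity */

    return need > have ? need - have : 0;
}
```

For a string of three 3-byte characters (9 bytes), a buffer sized at 2 bytes per character loses 3 bytes, while one sized at 3 bytes per character loses none.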

Here's my patch:
--- php-4.3.10/ext/oci8/oci8.c  Wed Nov  3 08:35:56 2004
+++ php-4.3.10.cti/ext/oci8/oci8.c      Mon Feb 28 15:37:54 2005
@@ -1443,6 +1443,7 @@
        ub4 iters;
        ub4 colcount;
        ub2 dynamic;
+       ub1 charset_form;
        int dtype;
        dvoid *buf;
        oci_descriptor *descr;
@@ -1573,6 +1574,21 @@
                                return 0; /* XXX we loose memory!!! */
                        }

+                       if(outcol->data_type == SQLT_CHR) {
+                                CALL_OCI_RETURN(error, OCIAttrGet(
+                                                                (dvoid *)param,
+                                                                OCI_DTYPE_PARAM,
+                                                                (dvoid *)&charset_form,
+                                                                (dvoid *)0,
+                                                                OCI_ATTR_CHARSET_FORM,
+                                                                statement->pError));
+                                statement->error = oci_error(statement->pError, "OCIAttrGet OCI_DTYPE_PARAM/OCI_ATTR_CHARSET_FORM", error);
+                                if (statement->error) {
+                                         oci_handle_error(statement->conn, statement->error);
+                                         return 0; /* XXX we loose memory!!! */
+                                }
+                       }
+
                        CALL_OCI_RETURN(error, OCIAttrGet(
                                                (dvoid *)param,
                                                OCI_DTYPE_PARAM,
@@ -1700,6 +1716,9 @@
                                                ) {
                                                outcol->storage_size4 = 512; /* XXX this should fit "most" NLS date-formats and Numbers */
                                        } else {
+                                               if(charset_form == SQLCS_NCHAR)
+                                                        outcol->storage_size4 *=2; /* double for unicode */
+
                                                outcol->storage_size4++; /* add one for string terminator */
                                        }
                                        if (outcol->data_type == SQLT_BIN) {

Reproduce code:
---------------
ocifetchinto()


 [2005-02-28 23:23 UTC] tony2001@php.net
See also #31042, #27156.
Why do you multiply the storage size by 2? Why not 4 (AFAIK the max. UTF char length is 4 bytes)? Or 6 (other Unicode encodings may contain such chars too)?
This is a known solution and it's *ugly*.

 [2005-03-01 03:34 UTC] bmr at comtime dot com
Oracle seems to report double the length of the field already, i.e. as if UTF-16 encoding were used. AFAIK Oracle only supports UTF-8 and UTF-16. The Oracle documentation says that the byte length is 3 times the character length for UTF-8 and 2 times for UTF-16. So I guess the right fix is to multiply by 1.5! Does anyone know how to get Oracle to fix their stuff? I looked at the OCI docs for quite a while and couldn't find a way to get the right length, so it does seem like their bug.
 	
Migration to Unicode Datatypes for Multilingual Databases and Applications, An Oracle White Paper, August 2003 (published 12/17/2003)
http://www.oracle.com/technology/tech/globalization/pdf/TWP_NCHAR_Migration_10gR1.pdf
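
The sizing arithmetic discussed in this comment can be sketched as follows. This is only an illustration, not code from oci8.c: the function name `nchar_buffer_size` is hypothetical, and it assumes, as observed above, that the reported column size is 2 bytes per character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: buffer size for an NCHAR-form column.
 * reported_bytes     - size reported for the column, ASSUMED to be
 *                      2 bytes per character (UTF-16 style)
 * max_bytes_per_char - worst case for the encoding actually fetched
 *                      (3 per the Oracle documentation cited for UTF-8) */
size_t nchar_buffer_size(size_t reported_bytes, size_t max_bytes_per_char)
{
    size_t chars;

    assert(max_bytes_per_char > 0);
    chars = (reported_bytes + 1) / 2;       /* character count, rounded up */
    return chars * max_bytes_per_char + 1;  /* add one for string terminator */
}
```

For a reported size of 20 (10 characters), 3 bytes per character gives 31, which is the "multiply by 1.5" case. Doubling the reported size as the patch does gives 41, the same as allowing 4 bytes per character under this assumption.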
 [2005-09-08 11:47 UTC] tony2001@php.net
The bug has been fixed in OCI8 v.1.1, which is available in CVS HEAD and PECL (use `pear install oci8-beta` to install it).
 