Bug #32140 NVARCHAR2 columns are truncated
Submitted: 2005-02-28 23:08 UTC Modified: 2005-09-08 11:47 UTC
From: bmr at comtime dot com Assigned: tony2001 (profile)
Status: Closed Package: OCI8 related
PHP Version: 5.0.4, 4.3.11 OS: Linux
Private report: No CVE-ID: None

 [2005-02-28 23:08 UTC] bmr at comtime dot com
Description:
------------
NVARCHAR2 columns can be used to store UTF-8 or UTF-16 characters. When using UTF-8 it can take 3 (or more??) bytes to represent one character. PHP truncates strings whose byte representation is longer than 2 times the character length. This is actually an Oracle bug because the OCI call returns the wrong byte length.
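
To make the truncation concrete, here is a minimal C sketch that is not part of the patch. It compares the UTF-8 byte length of a string with a buffer sized at a fixed number of bytes per character. The helper names `utf8_char_count` and `bytes_lost` are hypothetical and only illustrate the arithmetic; they are not OCI or PHP functions.

```c
#include <stddef.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string by
 * skipping continuation bytes (those of the form 10xxxxxx). */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: how many bytes of s do not fit in a buffer
 * sized at bytes_per_char bytes per character (0 if everything fits). */
size_t bytes_lost(const char *s, size_t bytes_per_char)
{
    size_t need = strlen(s);                            /* actual UTF-8 bytes */
    size_t have = utf8_char_count(s) * bytes_per_char;  /* buffer capacity */
    return need > have ? need - have : 0;
}
```

For a three-character string in which every character takes 3 bytes in UTF-8, `bytes_lost(s, 2)` reports 3 missing bytes, while `bytes_lost(s, 3)` reports none.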

Here's my patch:
--- php-4.3.10/ext/oci8/oci8.c  Wed Nov  3 08:35:56 2004
+++ php-4.3.10.cti/ext/oci8/oci8.c      Mon Feb 28 15:37:54 2005
@@ -1443,6 +1443,7 @@
        ub4 iters;
        ub4 colcount;
        ub2 dynamic;
+       ub1 charset_form;
        int dtype;
        dvoid *buf;
        oci_descriptor *descr;
@@ -1573,6 +1574,21 @@
                                return 0; /* XXX we loose memory!!! */
                        }

+                       if(outcol->data_type == SQLT_CHR) {
+                                CALL_OCI_RETURN(error, OCIAttrGet(
+                                                                (dvoid *)param,
+                                                                OCI_DTYPE_PARAM,
+                                                                (dvoid *)&charset_form,
+                                                                (dvoid *)0,
+                                                                OCI_ATTR_CHARSET_FORM,
+                                                                statement->pError));
+                                statement->error = oci_error(statement->pError, "OCIAttrGet OCI_DTYPE_PARAM/OCI_ATTR_CHARSET_FORM", error);
+                                if (statement->error) {
+                                         oci_handle_error(statement->conn, statement->error);
+                                         return 0; /* XXX we loose memory!!! */
+                                }
+                       }
+
                        CALL_OCI_RETURN(error, OCIAttrGet(
                                                (dvoid *)param,
                                                OCI_DTYPE_PARAM,
@@ -1700,6 +1716,9 @@
                                                ) {
                                                outcol->storage_size4 = 512; /* XXX this should fit "most" NLS date-formats and Numbers */
                                        } else {
+                                               if(charset_form == SQLCS_NCHAR)
+                                                        outcol->storage_size4 *=2; /* double for unicode */
+
                                                outcol->storage_size4++; /* add one for string terminator */
                                        }
                                        if (outcol->data_type == SQLT_BIN) {

Reproduce code:
---------------
ocifetchinto()


 [2005-02-28 23:23 UTC] tony2001@php.net
See also #31042, #27156.
Why do you multiply the storage size by 2? Why not 4 (afaik the max. UTF char length is 4 bytes)? Or 6 (other Unicodes may contain such chars too)?
This is a known solution and it's *ugly*.

 [2005-03-01 03:34 UTC] bmr at comtime dot com
Oracle seems to report double the length of the field already, i.e. as if UTF-16 encoding was used.  AFAIK Oracle only supports UTF-8 and UTF-16.  The Oracle documentation says that the byte length is 3 times the character length for UTF-8 and 2 times for UTF-16.  So I guess the right fix is to multiply by 1.5!  Does anyone know how to get Oracle to fix their stuff?  I looked at the OCI docs for quite a while and couldn't find a way to get the right length, so it does seem like their bug.
 	
Migration to Unicode Datatypes for Multilingual Databases and Applications, An Oracle White Paper, August 2003 (published 12/17/2003)
http://www.oracle.com/technology/tech/globalization/pdf/TWP_NCHAR_Migration_10gR1.pdf
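
The sizing arithmetic in the comment above can be sketched in C as follows. This assumes, as the thread does, that for an NCHAR-form column of n characters the reported size is 2*n bytes while UTF-8 data may need up to 3*n bytes. The function names are hypothetical and are not part of oci8.c; the second one only mirrors what the submitted patch does to `storage_size4`.

```c
#include <stddef.h>

/* Assumption from the thread: reported_size == 2 * n for n characters,
 * and UTF-8 may need up to 3 * n bytes. The buffer is then 3/2 of the
 * reported size (rounded up), plus one byte for the string terminator. */
size_t nchar_buf_exact(size_t reported_size)
{
    return (reported_size * 3 + 1) / 2 + 1;
}

/* What the submitted patch does: double the reported size, then add
 * one byte for the terminator. Never smaller than nchar_buf_exact(). */
size_t nchar_buf_patch(size_t reported_size)
{
    return reported_size * 2 + 1;
}
```

Under these assumptions an NVARCHAR2(10) column reported as 20 bytes would need 31 bytes, and the patch would allocate 41. Doubling therefore over-allocates rather than truncates, which may explain why it works in practice despite the 1.5 factor being the tighter bound.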
 [2005-09-08 11:47 UTC] tony2001@php.net
The bug has been fixed in OCI8 v.1.1, which is available in CVS HEAD and PECL (use `pear install oci8-beta` to install it).
 