Bug #31536 Confusion with XML nodes named as PHP keywords
Submitted: 2005-01-13 12:05 UTC Modified: 2005-01-14 09:08 UTC
From: exaton at free dot fr
Status: Wont fix
Package: SimpleXML related
PHP Version: 5CVS-2005-01-13 (dev)
OS: WinXP SP2
Private report: No
CVE-ID: None
 [2005-01-13 12:05 UTC] exaton at free dot fr
Note: I wish I could file this as a "remark" rather than a "bug", because I don't really expect it can be solved anyway.

The exact version of PHP used is the Jan 12 2005 18:14:35 5.0.4-dev Win32 snapshot.

When accessing object members, I am in the habit of putting spaces on either side of the object/arrow operator "->" for clarity.

It is known that, with this practice, the right-hand operand (the object member) had better not be named after a PHP keyword. For example:

class A {
  public $list;
  public function __construct() {
    $this -> list = 'value'; // (*)
  }
}

Line (*) generates a parse error, unexpected T_LIST, expecting T_STRING or T_VARIABLE or '{' or '$'.
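A minimal sketch of the two spellings that avoid the error in class A, assuming a PHP 5 or later interpreter: drop the whitespace after "->", or use the brace syntax, which the error message above already lists as acceptable ('{'). The getList() method is a hypothetical accessor added only for illustration.

```php
<?php
// Sketch: two ways to reach a property named after the keyword "list"
// without the "unexpected T_LIST" parse error.
class A {
    public $list;

    public function __construct() {
        // Brace syntax: spaces around "->" are fine, because the parser
        // accepts '{' after the object operator.
        $this -> {'list'} = 'value';
    }

    // Hypothetical accessor: no whitespace after "->", so "list" is read
    // as a plain member name.
    public function getList() {
        return $this->list;
    }
}

$a = new A();
echo $a->getList(), "\n"; // prints "value"
```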

That is easily resolved: it would be extremely bad practice to name a member that way anyway. But what about XML nodes accessed as SimpleXML objects?

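<?xml version="1.0"?>
<root> <!-- hypothetical root element -->
  <list>item1, item2, item3</list>
</root>

$xml = simplexml_load_string($xmlString); // $xmlString holds the XML above (or use simplexml_load_file())
foreach ($xml -> list as $elemList) {   // (**)
  // manage the list, e.g. make it into an array...
}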

Line (**) hits exactly the same problem, of course, because SimpleXML exposes child nodes as object members.

The issue here is that I am not the one writing the XML file, defining its XML Schema, etc., so I can no longer solve the problem through better naming. Imposing PHP conventions on PHP code is consistent; imposing them on an otherwise independent XML file seems much less so.

Easiest workaround: drop the spaces on either side (or at least on the right-hand side) of the object operator, and live with the resulting slight ugliness.
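For the SimpleXML case, the following sketch (assuming the XML is loaded from a string with simplexml_load_string(), and with <root> as a placeholder root element) shows three spellings that reach the <list> child: no space after "->", the brace syntax, and a variable holding the element name. It then splits the text content into an array, as hinted at in the (**) loop.

```php
<?php
// Sketch: reading a child element named after a PHP keyword ("list").
// <root> is a hypothetical root element name.
$xmlString = '<?xml version="1.0"?>
<root>
  <list>item1, item2, item3</list>
</root>';

$xml = simplexml_load_string($xmlString);

// Workaround 1 (the one above): no space after "->".
$viaNoSpace = (string) $xml->list;

// Workaround 2: brace syntax, which tolerates spaces around "->".
$viaBraces = (string) $xml -> {'list'};

// Workaround 3: element name in a variable (a T_VARIABLE, which the
// parser also accepts after "->").
$name = 'list';
$viaVariable = (string) $xml -> $name;

// "Manage the list": split each <list> element's text into trimmed items.
$items = array();
foreach ($xml -> {'list'} as $elemList) {
    $items = array_merge($items, array_map('trim', explode(',', (string) $elemList)));
}

echo implode('|', $items), "\n"; // prints "item1|item2|item3"
```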

But could something be done at a deeper level? I imagine this crops up for all reserved words, which the parser treats as special tokens before considering that they might be object members.
The thing is, I cannot think of any context in which the right-hand operand of an object access would _want_ to be the PHP keyword, with the behavior that implies.
That is, can "$obj -> list" ever mean anything related to what "list" means in PHP?

The same seems true for the other reserved words, as far as five minutes' pondering can make out.
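To make the contrast concrete, here is a small sketch of "list" in its two roles: as the list() destructuring construct, and as a plain member name after "->" (on a stdClass object, chosen only for illustration), where the construct's meaning never applies.

```php
<?php
// As a language construct, list() destructures an array into variables.
list($first, $second) = array('item1', 'item2');

// After "->" the same word is only a member name. Written without spaces
// around "->" so that it parses even on the PHP 5.0 snapshot above.
$obj = new stdClass();
$obj->list = array($first, $second);

echo $obj->list[1], "\n"; // prints "item2"
```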

As I said in the introduction, I don't expect it is really feasible to fix this "bug", i.e. to make the parser always treat the right-hand operand of "->" as an object member and never as a special token, but I would be interested in a quick thought on this from someone on the PHP team.

Thank you.


 [2005-01-14 02:44 UTC]
I find it odd that PHP allows this at all. I don't think you should be able to name any variable, in a class or outside, after a keyword (even with $). C doesn't let you do this at all, regardless of how you use ->. It'll throw obscure errors like:

test.c(2) : error C2632: 'int' followed by 'double' is illegal     
test.c(2) : error C2208: 'int' : no members defined using this type
> int double;

This is simply better in the long run for the compiler: for each occurrence of .double or ->double, the compiler doesn't need to check its symbol table to know whether or not you're really stupid enough to name a struct member double; it knows to plainly call you stupid. This saves time during a compile (a lot, I imagine).

SimpleXML, I'm pretty sure, is implemented using __get() (?), but I imagine this kind of flexibility in PHP is what allows classes like SimpleXML to exist, as well as things like FFI (ffi_struct).
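SimpleXML itself is a C extension rather than a userland class built on __get(), but the same flexibility is available in userland. A minimal sketch, assuming PHP 5's __get()/__isset() magic methods; the class name KeywordBag is hypothetical:

```php
<?php
// Sketch: a userland class exposing arbitrary names (including keywords
// such as "list" or "class") as read-only properties via __get().
class KeywordBag {
    private $data;

    public function __construct(array $data) {
        $this->data = $data;
    }

    // Called for reads of inaccessible or undefined properties.
    public function __get($name) {
        return isset($this->data[$name]) ? $this->data[$name] : null;
    }

    // Makes isset($bag->name) consult the same backing array.
    public function __isset($name) {
        return isset($this->data[$name]);
    }
}

$bag = new KeywordBag(array('list' => 'a, b', 'class' => 'x'));

echo $bag->list, "\n";        // prints "a, b"
echo $bag -> {'class'}, "\n"; // prints "x"
var_dump($bag->missing);      // prints "NULL"
```

Because __get() receives the member name as a plain string, keywords such as "list" or "class" are just ordinary keys once the parser has accepted the expression.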

I'd call this my two cents, but my checking account is overdrawn. I owe you two cents.
 [2005-01-14 02:46 UTC]
Sorry, one thing I forgot to include: this should be moved to php.internals, as it's not a bug... yet. I was actually wrong to reply here.
 [2005-01-14 09:08 UTC]
This will not be fixed; just don't use spaces around "->". The parser is already complex enough, and this would require quite a few new rules.
