Bug #41784 casting simplexmlelement to string returns empty
Submitted: 2007-06-23 00:35 UTC
Modified: 2007-06-23 08:55 UTC
From: mistknight at gmail dot com
Assigned:
Status: Not a bug
Package: SimpleXML related
PHP Version: 5.2.3
OS: kubuntu feisty 7.04
Private report: No
CVE-ID: None

 [2007-06-23 00:35 UTC] mistknight at gmail dot com
Description:
------------
First of all, I admit this was tested on PHP 5.2.1, and my PHP is installed from the distribution repository rather than compiled. Installing PHP 5.2.3 would be a pain.

Simply retrieve this Atom 1.0 file and load it into a SimpleXML object:

http://www.tbray.org/ongoing/ongoing.atom

Now try to get the <entry>'s nested <content> or <summary> tags as a string. If you cast your object like so: (string)$obj->content or (string)$obj->summary you will get an empty string. 

Is there ANY workaround for this that will let me retrieve the string? I checked the docs, but there's nothing about this issue. They all say to simply cast to string, and I've done that on other SimpleXMLElement objects without problems, even elements that had nested tags. Thanks!

Expected result:
----------------
The string located inside the <content></content> or <summary></summary> tags.

Actual result:
--------------
Empty string.

 [2007-06-23 08:55 UTC] rasmus@php.net
That's because SimpleXML doesn't understand Atom's way of mixing content.  The first summary node, for example, looks like this:

<summary type='xhtml'><div xmlns='http://www.w3.org/1999/xhtml'>Last Monday, we spent some time at the <a href='http://www.u-tokyo.ac.jp/index_e.html'>University of Tokyo</a>, where we talked about Ruby and so on; quite a change of pace from the rest of the visit.</div></summary>

That means that there isn't actually any string value in the summary node.  There is however a div node there, and that div node has string contents.  Normally you would either entity-encode the HTML or wrap it in a CDATA block, which is probably what you had in the cases you describe as working before "with nested tags".  SimpleXML has no idea that <div> is an HTML element and not an XML element.  How would it?
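
A minimal sketch of the behaviour described above. It uses a shortened inline fragment modelled on the summary node instead of fetching the live feed; the abbreviated text and the variable names are assumptions made for illustration.

```php
<?php
// Inline stand-in for one Atom entry (shortened text); the real feed is not fetched.
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Time at the <a href="http://www.u-tokyo.ac.jp/index_e.html">University of Tokyo</a>, talking about Ruby.</div></summary>
  </entry>
</feed>
XML;

$x = simplexml_load_string($xml);

// summary has no text node of its own, only a div child element.
$summaryText = (string) $x->entry->summary;

// div has text nodes of its own, but the text inside <a> is not part of the cast.
$divText = (string) $x->entry->summary->div;

var_dump($summaryText);
var_dump($divText);
```

The cast only sees text nodes that are direct children of the element, which is why the summary comes back empty and the div comes back with a gap where the link was.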

So you can get the string like this:

(string)$x->entry->summary->div

But note that inside that div element, you have an <a href> element.  The easy way to see the structure is to just do this:

print_r($x->entry->summary);

Which gives you this:

SimpleXMLElement Object
(
    [@attributes] => Array
        (
            [type] => xhtml
        )

    [div] => Last Monday, we spent some time at the , where we talked about Ruby and so on; quite a change of pace from the rest of the visit.
)

And print_r($x->entry->summary->div); which gives you:

SimpleXMLElement Object
(
    [a] => University of Tokyo
)
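
If what you want is the plain text of the div including the link text, one option not mentioned in this reply is to hand the node to the DOM extension and read textContent. A sketch, assuming the DOM extension is enabled; the shortened fragment and the variable names are illustrative only.

```php
<?php
// Shortened stand-in for the summary node shown above.
$summary = simplexml_load_string(
    '<summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">'
    . 'Time at the <a href="http://www.u-tokyo.ac.jp/index_e.html">University of Tokyo</a>, talking about Ruby.'
    . '</div></summary>'
);

// dom_import_simplexml() returns the DOM node behind the SimpleXML element;
// textContent concatenates the text of the node and all of its descendants.
$plain = dom_import_simplexml($summary->div)->textContent;

echo $plain, "\n";
```

This drops the markup entirely, so it suits cases where the summary is shown as plain text and the link target does not matter.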

You need to apply some Atom-specific logic in order to use SimpleXML to parse an Atom feed.  One trick would be to do:

echo $x->entry->summary->asXML();

which gives you:

<summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Last Monday, we spent some time at the <a href="http://www.u-tokyo.ac.jp/index_e.html">University of Tokyo</a>, where we talked about Ruby and so on; quite a change of pace from the rest of the visit.</div></summary>

You may not want the summary element itself.  You could avoid it by doing something like:

foreach($x->entry->summary->children() as $child) echo $child->asXML();
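
The same loop can be wrapped in a small helper that returns the markup instead of echoing it. This is only a sketch: childrenXml is a hypothetical name, not a SimpleXML function, and the fragment is a shortened stand-in for the real summary.

```php
<?php
// Hypothetical helper: serialise each child element of $node and join the results.
// Text nodes sitting directly inside $node are not returned by children(),
// so this fits a wrapper like <summary> whose only content is the XHTML div.
function childrenXml(SimpleXMLElement $node)
{
    $out = '';
    foreach ($node->children() as $child) {
        $out .= $child->asXML();
    }
    return $out;
}

$summary = simplexml_load_string(
    '<summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">'
    . 'Time at the <a href="http://www.u-tokyo.ac.jp/index_e.html">University of Tokyo</a>, talking about Ruby.'
    . '</div></summary>'
);

$inner = childrenXml($summary);
echo $inner, "\n";
```

Returning a string rather than echoing makes it easier to pass the XHTML on to a template or a sanitiser before output.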

But basically there is absolutely nothing wrong with SimpleXML here.
 