Bug #37513 Have news submission system perform HTML sanity checks, prevent RSS breakage
Submitted: 2006-05-18 21:07 UTC Modified: 2010-12-20 12:12 UTC
From: edwardzyang at thewritingpot dot com Assigned: bjori (profile)
Status: Closed Package: Website problem
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None
 [2006-05-18 21:07 UTC] edwardzyang at thewritingpot dot com
Description:
------------
The news items on PHP.net are often sloppily written, with improper encodings and unencoded ampersands. Perhaps you guys should run HTML Tidy as a sanity check on the HTML before allowing a news item to be submitted. This is even more important for RSS, because a single stray ampersand can break the whole feed.
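
To illustrate the RSS concern, here is a minimal sketch, assuming PHP's bundled XML extension is available. The helper name `is_well_formed_xml` and the two sample fragments are hypothetical and are not taken from PHP.net's code; the point is only that an XML parser rejects a fragment containing a bare ampersand.

```php
<?php
// Hypothetical helper: returns true if $xml parses as well-formed XML.
function is_well_formed_xml($xml)
{
    $parser = xml_parser_create('UTF-8');
    // xml_parse() returns 1 on success and 0 on a well-formedness error.
    return xml_parse($parser, $xml, true) === 1;
}

// Made-up feed fragments: the first escapes the ampersand, the second does not.
$good = '<item><title>Tom &amp; Jerry</title></item>';
$bad  = '<item><title>Tom & Jerry</title></item>';

var_dump(is_well_formed_xml($good));
var_dump(is_well_formed_xml($bad));
```

A check along these lines could run at submission time, so a malformed item is rejected before it ever reaches the feed.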


 [2006-05-18 21:11 UTC] goba@php.net
We are open for patches.
 [2006-05-18 21:14 UTC] edwardzyang at thewritingpot dot com
I'm investigating the source. However, it seems the news items are inlined in index.php. Factoring them out would require major architectural changes.
 [2006-10-05 23:18 UTC] edwardzyang at thewritingpot dot com
What file is the RSS parser located in? index.php mentions it but doesn't give any other indication.
 [2006-11-06 23:11 UTC] edwardzyang at thewritingpot dot com
The real parser is http://cvs.php.net/viewvc.cgi/php-master-web/scripts/rss_parser?view=markup

I need to know a little more about the server setup before I write a patch. I could use Tidy (a very quick and easy fix), but it may not be installed on PHP.net's servers, in which case it isn't a viable solution. If so, we'll probably have to bundle a library like HTML Purifier http://hp.jpsband.org/ to do the dirty work. If that seems like overkill, a quick:

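// Decode any existing entities so nothing gets escaped twice...
$text = html_entity_decode($text, ENT_COMPAT, 'UTF-8');
// ...then escape &, <, > and " exactly once.
$text = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');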

...would work on PHP 5, but since PHP 5 isn't deployed there, you'd be better off replacing the html_entity_decode() call with:

// '&amp;' is decoded last so that '&amp;lt;' becomes '&lt;' rather than '<'.
$text = str_replace(array('&quot;', '&lt;', '&gt;', '&amp;'), array('"', '<', '>', '&'), $text);

...and hope no one uses an entity other than those four. If you want that covered too, use HTML Purifier.
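
For reference, the decode-then-encode idea can be wrapped into a single function. This is only a sketch under stated assumptions: the name `normalize_entities` is hypothetical, the sample strings are made up, and it decodes '&amp;' last (an ordering chosen here so that a literal '&amp;lt;' is not decoded twice). It is not the code in rss_parser.

```php
<?php
// Hypothetical helper: re-encode text so each special character is
// escaped exactly once, without relying on html_entity_decode().
function normalize_entities($text)
{
    // Decode the four special entities. '&amp;' goes last so that
    // '&amp;lt;' becomes '&lt;' instead of being decoded twice to '<'.
    $text = str_replace(
        array('&quot;', '&lt;', '&gt;', '&amp;'),
        array('"', '<', '>', '&'),
        $text
    );
    // Re-encode &, <, > and " as entities.
    return htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
}

// A stray ampersand and an already-encoded one end up in the same form.
echo normalize_entities('Tom & Jerry &amp; friends'), "\n";
```

Two limits of this sketch: it escapes markup as well as ampersands, so it suits text that is meant to be embedded as escaped content, and any other named entity (for example `&nbsp;`) comes out double-escaped, which is the case HTML Purifier would cover.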
 [2007-06-07 18:12 UTC] bjori@php.net
See http://php.net/feed.atom
 [2010-12-20 12:12 UTC] jani@php.net
-Package: Tidy
+Package: Website problem