Bug #37513 Have news submission system perform HTML sanity checks, prevent RSS breakage
Submitted: 2006-05-18 21:07 UTC Modified: 2010-12-20 12:12 UTC
From: edwardzyang at thewritingpot dot com Assigned: bjori (profile)
Status: Closed Package: Website problem
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None
 [2006-05-18 21:07 UTC] edwardzyang at thewritingpot dot com
The news items are often sloppily written, with improper encodings and unencoded ampersands. Perhaps you guys should run the HTML through HTMLTidy as a sanity check before allowing it to be submitted. This is even more important for RSS, because one stray ampersand can break the whole feed.
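
To illustrate the stray-ampersand problem, here is a minimal sketch that checks whether a feed fragment is well-formed XML. It assumes a modern PHP with the DOM extension enabled; the helper name `is_well_formed_xml()` and the sample feed strings are hypothetical and not taken from the site's source.

```php
<?php
// Hypothetical helper: returns true if $xml parses as well-formed XML.
function is_well_formed_xml(string $xml): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Made-up feed fragments: one with a raw ampersand, one with it encoded.
$broken = '<rss><channel><title>News & Events</title></channel></rss>';
$fixed  = '<rss><channel><title>News &amp; Events</title></channel></rss>';

var_dump(is_well_formed_xml($broken)); // bool(false)
var_dump(is_well_formed_xml($fixed));  // bool(true)
```

A check like this could run at submission time and reject items that would make the feed unparseable, without requiring Tidy.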


 [2006-05-18 21:11 UTC]
We are open for patches.
 [2006-05-18 21:14 UTC] edwardzyang at thewritingpot dot com
I'm investigating the source. However, it seems that the news items are inlined inside index.php. Factoring them out would require major architectural changes.
 [2006-10-05 23:18 UTC] edwardzyang at thewritingpot dot com
What file is the RSS parser located in? index.php mentions it but doesn't give any other indication.
 [2006-11-06 23:11 UTC] edwardzyang at thewritingpot dot com
The real parser is

I need to know a little more about the server setup before I write a patch. I could use Tidy (a very quick and easy fix), but it may not be installed on the servers, and thus may not be a viable solution. If that is the case, we'll probably have to bundle a library like HTML Purifier to do the dirty work. If that seems like overkill, a quick:

$text = html_entity_decode($text, ENT_COMPAT, 'UTF-8');
$text = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');

...would work on PHP 5, but since that's not deployed, you'd be better off replacing the html_entity_decode() with:

// Decode &amp; last so that input like "&amp;lt;" is not decoded twice.
$text = str_replace(array('&quot;', '&lt;', '&gt;', '&amp;'), array('"', '<', '>', '&'), $text);

...and hope no one uses a non-special entity. If you want that covered too, use HTML Purifier.
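
The decode-then-encode idea above can be sketched as a single helper. This is only an illustration under assumptions: the function name `normalize_entities()` is hypothetical, the type declarations need a modern PHP, and the text is treated as plain text, so any real markup such as `<b>` would be escaped as well.

```php
<?php
// Hypothetical helper, not taken from the site's source.
function normalize_entities(string $text): string
{
    // Decode first so that already-encoded input is not double-encoded...
    $text = html_entity_decode($text, ENT_COMPAT, 'UTF-8');
    // ...then re-encode the characters that are special in HTML/XML.
    return htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
}

echo normalize_entities('Tom & Jerry'), "\n";     // Tom &amp; Jerry
echo normalize_entities('Tom &amp; Jerry'), "\n"; // Tom &amp; Jerry
echo normalize_entities('caf&eacute;'), "\n";     // café
```

Because named entities such as `&eacute;` are turned into literal UTF-8 characters, the output contains only the entities that XML itself defines, which is what an RSS feed needs.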
 [2007-06-07 18:12 UTC]
 [2010-12-20 12:12 UTC]
-Package: Tidy +Package: Website problem