php.net
Request #42593 PHP's url_rewriter url encoding and html escaping behavior seems wrong
Submitted: 2007-09-07 18:14 UTC Modified: 2011-04-08 21:34 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (100.0%)
From: mickoz at parodius dot com Assigned:
Status: Open Package: URL related
PHP Version: 5.2.4 OS: All (PHP Behavior)
Private report: No CVE-ID: None
 [2007-09-07 18:14 UTC] mickoz at parodius dot com
Description:
------------
When using PHP's url_rewriter (with output_add_rewrite_var()), I have noticed that it:
- urlencode()s the value when rewriting a URL
- does not urlencode() the name when rewriting a URL
- when set to rewrite forms:
  - creates a hidden input field right after the form tag
  - urlencode()s the value when writing it into the hidden input tag
  - does not urlencode() the name when writing it into the hidden input tag
- On top of that, while writing this I found that it never applies htmlspecialchars() to anything it writes. This is probably why we need to change arg_separator.output like this: "ini_set('arg_separator.output', '&amp;');"

Someone using name/value pairs that don't contain any HTML or URL special characters won't notice anything. However, special characters in either the name or the value can cause problems.

For now it looks like we should avoid special characters in names and values, and maybe there is a reason for this behavior beyond what I can think of right now. But I believe it should be fixed properly.

Also, unfortunately, a "smart" htmlspecialchars() encoding might be problematic, since a lot of people probably use the "arg_separator.output = &amp;" hack for XHTML compliance... it wouldn't be a smart idea to automatically turn that into &amp;amp; (by html-escaping it again). (If my ideas get implemented, there will need to be a way to control this behavior while staying backward compatible...)
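The backward-compatibility concern can be seen with htmlspecialchars() itself: escaping a URL that already uses "&amp;" as its separator yields "&amp;amp;", unless the double_encode argument (available since PHP 5.2.3) is set to false. This is only an illustration; the URL below is a made-up example, not something the url_rewriter emits verbatim.

```php
<?php
// Hypothetical URL built with arg_separator.output set to "&amp;".
$url = '/test2?a=1&amp;b=2';

// Default behaviour: every "&" is escaped, including the one inside "&amp;".
$escaped = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

// double_encode = false leaves existing entities such as "&amp;" untouched.
$escapedOnce = htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false);

echo $escaped, "\n";     // /test2?a=1&amp;amp;b=2
echo $escapedOnce, "\n"; // /test2?a=1&amp;b=2
```

So a "smart" escape would need something like double_encode = false to avoid breaking existing configurations.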

----------------------------------------
Example #1:
----------------------------------------

<?php
ini_set('url_rewriter.tags', 'a=href,area=href,frame=src,input=src,form=action,fieldset=');
output_add_rewrite_var("test/name", "test/value");
?>
<a href="/test2">linkname</a>
<form action="/test" method="get">
<input type="text" value="" />
<input type="submit" value="ok" />
</form>

----------------------------------------
will output this:
----------------------------------------

<a href="/test2?test/name=test%2Fvalue">linkname</a>
<form action="/test?test/name=test%2Fvalue" method="get"><input type="hidden" name="test/name" value="test%2Fvalue" />
<input type="text" value="" />
<input type="submit" value="ok" />
</form>

----------------------------------------
while I would expect:
----------------------------------------

<a href="/test2?test%2Fname=test%2Fvalue">linkname</a>
<form action="/test?test%2Fname=test%2Fvalue" method="get"><input type="hidden" name="test/name" value="test/value" />
<input type="text" value="" />
<input type="submit" value="ok" />
</form>
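For comparison, PHP's own http_build_query() already urlencode()s both the names and the values, which matches the URL part of the expected output above. In this sketch the "&" separator is passed explicitly as the third argument so the result does not depend on arg_separator.output.

```php
<?php
// http_build_query() runs urlencode() over each key and each value.
$query = http_build_query(array('test/name' => 'test/value'), '', '&');

echo '/test2?' . $query, "\n"; // /test2?test%2Fname=test%2Fvalue
```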

----------------------------------------
Example #2 (with HTML special characters):
----------------------------------------

<?php
ini_set('url_rewriter.tags', 'a=href,area=href,frame=src,input=src,form=action,fieldset=');
output_add_rewrite_var("test/name\"&", "test/value\"&");
?>
<a href="/test2">linkname</a>
<form action="/test" method="get">
<input type="text" value="" />
<input type="submit" value="ok" />
</form>

----------------------------------------
will output this:
----------------------------------------

<a href="/test2?test/name"&=test%2Fvalue%22%26">linkname</a>
<form action="/test?test/name"&=test%2Fvalue%22%26" method="get"><input type="hidden" name="test/name"&" value="test%2Fvalue%22%26" />
<input type="text" value="" />
<input type="submit" value="ok" />
</form>

----------------------------------------
while I would expect:
----------------------------------------

<a href="/test2?test%2Fname%22%26=test%2Fvalue%22%26">linkname</a>
<form action="/test?test%2Fname%22%26=test%2Fvalue%22%26" method="get"><input type="hidden" name="test/name&quot;&amp;" value="test/value&quot;&amp;" />
<input type="text" value="" />
<input type="submit" value="ok" />
</form>

========================================
Summary of the behavior I would expect (written as notes plus pseudo/PHP code)
========================================

- When rewriting a URL, append to the original URL (which should already have been escaped properly by the user):
  - "?" if not already present (already done)
  - htmlspecialchars(urlencode($name1).'='.urlencode($value1).'&'.urlencode($name2).'='.urlencode($value2).'&'...)
    where '&' is the argument separator. Done this way, we won't need to put HTML in the arg separator value (which did not make sense to me at first, even though there may be a reason for it, especially now that people rely on it...).
    - urlencode the name and the value
    - html-escape what we output (since it always lands inside an HTML attribute value)

- When adding input type=hidden fields:
  - <input type="hidden" name="<?php echo htmlspecialchars($name1); ?>" value="<?php echo htmlspecialchars($value1); ?>" />
  - no URL escaping; it should not be done here!
    - html-escape the name and value, as should be done
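The two rules above can be sketched in plain PHP. The helper names rewrite_query_suffix() and rewrite_hidden_fields() are hypothetical and are not part of PHP; this is only a sketch of the proposed output, not how the url_rewriter is implemented internally.

```php
<?php
/**
 * Hypothetical helper: the string appended to a URL attribute.
 * Names and values are urlencode()d, joined with a plain separator,
 * and the whole string is then html-escaped because it lands inside
 * an HTML attribute value.
 */
function rewrite_query_suffix(array $vars, $separator = '&')
{
    $pairs = array();
    foreach ($vars as $name => $value) {
        $pairs[] = urlencode($name) . '=' . urlencode($value);
    }
    return htmlspecialchars(implode($separator, $pairs), ENT_QUOTES, 'UTF-8');
}

/**
 * Hypothetical helper: hidden form fields.
 * HTML escaping only; the browser url-encodes form data itself on submit.
 */
function rewrite_hidden_fields(array $vars)
{
    $html = '';
    foreach ($vars as $name => $value) {
        $html .= '<input type="hidden" name="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8')
               . '" value="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '" />';
    }
    return $html;
}

$vars = array('test/name"&' => 'test/value"&', 'x' => 'y');

// <a href="/test2?test%2Fname%22%26=test%2Fvalue%22%26&amp;x=y">linkname</a>
echo '<a href="/test2?' . rewrite_query_suffix($vars) . '">linkname</a>', "\n";
echo rewrite_hidden_fields($vars), "\n";
```

Because the separator is escaped together with the pairs, a plain "&" is enough here; passing a pre-escaped "&amp;" separator would produce "&amp;amp;", which is the backward-compatibility issue raised in the description.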

// there may be details I forgot, but this is a good start, and I can always help think it through further (and validate the logic) if someone handles this case

----------------------------------------
Notes:
----------------------------------------

- The problem that led me to submit this bug is that I was using the url_rewriter to write a value like "N/A". It added hidden input fields to my forms, and one of my forms has method="get". When a form is submitted with method=get, the browser escapes the form's input values... and since the value was already escaped, it got escaped twice (which makes it impossible to use that kind of value in that context without adding a hack around it...)
  example:
  output_add_rewrite_var("var", "N/A");
  becomes:
  <input type="hidden" name="var" value="N%2FA" />
  which then becomes, in the URL after submitting the form:
  N%252FA
  (because % gets escaped to %25)

- One solution is to never use special characters in names and values with PHP's url_rewriter, but that still doesn't feel logical (unless the current behavior serves other uses I am not aware of)

- Another solution might be to write my own output handler

- I believe this kind of thing is at the origin of other related problems (or of "hacks" to work around them in PHP), and fixing it here would remove needs like putting HTML in the arg separator (again, there is probably something I don't know that makes this impossible, but maybe not)
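The double escaping in the first note can be reproduced with PHP functions alone. In this sketch http_build_query() stands in for the browser's form encoding on submit (an approximation: browsers percent-encode "/" and "%" in application/x-www-form-urlencoded data the same way), and parse_str() stands in for how PHP decodes the query string into $_GET.

```php
<?php
// Value written into the hidden field by the url_rewriter (already urlencoded).
$hiddenValue = urlencode('N/A'); // "N%2FA"

// Approximation of a method="get" submit: the browser encodes the value again.
$submitted = http_build_query(array('var' => $hiddenValue), '', '&');

// What the script receives after the query string is decoded once.
parse_str($submitted, $get);

echo $submitted, "\n";  // var=N%252FA
echo $get['var'], "\n"; // N%2FA, not N/A

// If the hidden field held the raw value, the round trip would be lossless.
$rawSubmitted = http_build_query(array('var' => 'N/A'), '', '&');
parse_str($rawSubmitted, $getRaw);

echo $getRaw['var'], "\n"; // N/A
```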

Hope that helps and improves things (or that I will simply come to understand the reasons for all these behaviors)



History

 [2011-04-08 21:34 UTC] jani@php.net
-Package: Feature/Change Request +Package: URL related
 
PHP Copyright © 2001-2019 The PHP Group
All rights reserved.
Last updated: Sun Jun 16 06:01:30 2019 UTC