Bug #4556 get_meta_tags - ( Fix supplied in this bug report )
Submitted: 2000-05-23 01:00 UTC
Modified: 2001-02-10 21:40 UTC
From: c dot just at phoenixdigital dot com
Assigned:
Status: Closed
Package: Strings related
PHP Version: 4.0 Latest CVS (23/05/2000)
OS: ALL ( Bug in source code )
Private report: No
CVE-ID: None
 [2000-05-23 01:00 UTC] c dot just at phoenixdigital dot com
The PHP4 distribution appears to contain the same code, so it is affected by this problem as well.

I previously submitted this bug to the PHP3 bug list; this fix applies there as well.
The PHP3 bug number is ???????. I just went to look for it and it appears to have been deleted.
Please let me know if you feel this should not be fixed.

The problem arises when a metatag definition takes up more than 1 line.

The current function works for this definition:
<META NAME="DC.Language" SCHEME="freetext" CONTENT="english">

The current function fails for this definition:
<META NAME="DC.Coverage.PlaceName" SCHEME="TGN" 
CONTENT="Oceania ? Australia ? Queensland ? Brisbane" > 
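
One way to check whether a particular PHP build is affected is to feed both definitions to get_meta_tags() from a scratch file. The script below is only a sketch: the temporary file and its surrounding <html>/<head> wrapper are assumptions added for the test and are not part of the original report. On a build that handles multi-line tags, both keys should show up in the output; on an affected build, the second one would be missing.

```php
<?php
// Hypothetical reproduction script: the temporary file stands in for a real page.
$html = "<html><head>\n"
      . "<META NAME=\"DC.Language\" SCHEME=\"freetext\" CONTENT=\"english\">\n"
      . "<META NAME=\"DC.Coverage.PlaceName\" SCHEME=\"TGN\" \n"
      . "CONTENT=\"Oceania ? Australia ? Queensland ? Brisbane\" > \n"
      . "</head><body></body></html>\n";

// Write the sample markup to a scratch file in the system temp directory.
$path = tempnam(sys_get_temp_dir(), "meta");
file_put_contents($path, $html);

// get_meta_tags() lowercases the name and replaces special characters with "_".
$tags = get_meta_tags($path);
unlink($path);

print_r($tags);
```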

I am not yet adept in coding in C so I have written the fix function in PHP :)
Please feel free to ask me any questions regarding how it works.
-------------------------------------------------------------------------------
/************************************************
* GET_META_TAGS2
*************************************************
*
* IT IS A FIX FOR THE CURRENT VERSION OF THIS FUNCTION WITHIN PHP.
*
* THIS FUNCTION TAKES INTO ACCOUNT METATAGS WHICH SPAN MORE THAN 1 LINE.
*
* IT WILL EVEN ALLOW \r\n's WITHIN A name OR content DEFINITION.
*
*
*
*
*************************************************/
 
function get_meta_tags2($file){

  # OPEN FILE; RETURN false IF IT CANNOT BE OPENED
  $fp = fopen($file,"r");
  if (!$fp)
    return false;

  # INITIALISE ALL FLAGS
  #
  # THE PURPOSE OF EACH OF THESE FLAGS IS TO TELL THE SCRIPT IF THE CURRENT $buf VALUE IS
  # A CONTINUATION OF A PREVIOUS metatag, name OR content VALUE.
  # IF THE FLAG IS ACTIVE IT MEANS THAT THIS NEW LINE CONTAINS MORE INFORMATION ABOUT THE
  # metatag, name OR content.
  # THE FLAGS ARE ONLY KEPT RAISED IF AN ITEM SPANS MORE THAN 1 LINE.
  #
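  $inmeta = 0;
  $inname = 0;
  $incontent = 0;

  # INITIALISE THE RESULT ARRAY AND WORKING VARS SO THEY ARE DEFINED EVEN IF NO METATAG IS FOUND
  $metatags = array();
  $metaname = "";
  $metacontent = "";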
  
  # LOOP THRU FILE UNTIL EOF OR THE LINE CONTAINING </head>
  while( (($buf = fgets($fp,8191)) !== false) && (stristr($buf,"</head>") === false) ){

    # CHECK FOR START OF METATAG
    if( ($tmp = stristr($buf,"<meta")) || $inmeta){

      # IF START OF METATAG DETECTED MAKE $buf = $tmp OTHERWISE $buf CONTAINS CONTINUING TEXT FOR A METATAG.
      if (!$inmeta)
        $buf = $tmp;

      # RAISE INSIDE METATAG FLAG
      $inmeta = 1;

      # TRY TO RETRIEVE THE BEGINNING OF NAME 
      $tmp = stristr($buf,"name=\"");
      
      # CHECK FOR BEGINNING OF NAME OR ARE WE ALREADY INSIDE NAME SECTION
      if($tmp || $inname){

        # IS THIS LINE A CONTINUATION OF THE NAME FROM THE PREVIOUS LINE
        if($inname){
          $tmp = $buf; # RESET $tmp TO $buf AS stristr() FUNCTION RETURNED A BLANK

        # THIS IS THE BEGINNING OF THE NAME RETRIEVAL
        }else{
          $inname = 1;      # RAISE $inname FLAG
          $metaname = "";   # CLEAR METANAME VAR
          $tmp = substr($tmp,6); # MOVE POINTER TO AFTER THE name="
        }

        # FIND THE CLOSE QUOTE FOR THE NAME FIELD (POSITION 0 IS A VALID MATCH, SO COMPARE AGAINST false)
        if(($end = strpos($tmp,"\"")) !== false){
          $metaname .= substr($tmp,0,$end); # RETRIEVE NAME PART FROM $tmp AND APPEND TO PREVIOUS RETRIEVED VALUES.
          $metaname = strtolower(preg_replace('/[.\\\\+*?\[\]^$()]/',"_",trim($metaname))); # REPLACE SPECIAL CHARACTERS WITH _
          $inname = 0;  # LOWER $inname FLAG 
        }else{
          $metaname .= $tmp; # NOT THE END OF THE NAME APPEND TO PREVIOUS VALUES AND PROCESS NEXT LINE.
        }

      }
      
      $tmp = stristr($buf,"content=\"");
      if($tmp || $incontent){

        # IS THIS LINE A CONTINUATION OF THE CONTENT FROM THE PREVIOUS LINE
        if($incontent){
          $tmp = $buf; # RESET $tmp TO $buf AS stristr() FUNCTION RETURNED A BLANK

        # THIS IS THE BEGINNING OF THE CONTENT RETRIEVAL
        }else{
          $incontent = 1;       # RAISE $incontent FLAG
          $metacontent = "";    # CLEAR METACONTENT VAR 
          $tmp = substr($tmp,9);  # MOVE POINTER TO AFTER THE content="
        }

        # FIND THE CLOSE QUOTE FOR THE CONTENT FIELD (POSITION 0 MEANS EMPTY CONTENT, SO COMPARE AGAINST false)
        if(($end = strpos($tmp,"\"")) !== false){
          $metacontent .= substr($tmp,0,$end);  # RETRIEVE CONTENT PART FROM $tmp AND APPEND TO PREVIOUS RETRIEVED VALUES.
          $incontent = 0; # LOWER CONTENT FLAG
        }else{
          $metacontent .= $tmp; # NOT THE END OF THE CONTENT APPEND TO PREVIOUS VALUES AND PROCESS NEXT LINE.
        }
                  
      }

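      # CHECK FOR THE END OF THE METATAG DEFINITION
      if( ($tmp = stristr($buf,">")) && !$inname && !$incontent){
        # STORE ONLY TAGS THAT HAVE A NAME, THEN RESET SO A LATER TAG WITHOUT A NAME CANNOT REUSE STALE VALUES
        if (isset($metaname) && $metaname != "")
          $metatags[$metaname] = isset($metacontent) ? $metacontent : "";
        $metaname = "";
        $metacontent = "";
        $inmeta = 0; # LOWER METATAG FLAG
      }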
    }
    
  }
  
  fclose($fp);
  
  # RETURN METATAGS ARRAY TO USER.
  return($metatags);
  
}

---------------------------------------------------------------
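
For comparison, the line-continuation flags can be avoided altogether by matching against a whole buffer instead of reading line by line. The sketch below swaps in a regex-based technique and is not the fix that went into PHP; the function name meta_tags_from_string() is hypothetical. It assumes double-quoted attribute values and no ">" inside a value, and it replaces every character outside A-Z, a-z, 0-9 and "_" in the name, which is broader than the character list used above.

```php
<?php
// Hypothetical helper, not part of PHP: extracts named <meta> tags from a
// string, so line breaks inside a tag need no special handling.
function meta_tags_from_string($html)
{
    $tags = array();

    // Only scan up to </head>, mirroring the loop above that stops there.
    $headEnd = stripos($html, "</head>");
    if ($headEnd !== false) {
        $html = substr($html, 0, $headEnd);
    }

    // [^>]* also matches newlines, so a tag may span several lines.
    if (!preg_match_all('/<meta\b([^>]*)>/i', $html, $matches)) {
        return $tags;
    }

    foreach ($matches[1] as $attrs) {
        $name = null;
        $content = "";
        if (preg_match('/\bname\s*=\s*"([^"]*)"/i', $attrs, $m)) {
            $name = $m[1];
        }
        if (preg_match('/\bcontent\s*=\s*"([^"]*)"/i', $attrs, $m)) {
            $content = $m[1];
        }
        // Skip tags without a usable name (for example META HTTP-EQUIV).
        if ($name !== null && trim($name) !== "") {
            // Assumption: any non-alphanumeric character becomes "_".
            $key = strtolower(preg_replace('/[^A-Za-z0-9_]/', '_', trim($name)));
            $tags[$key] = $content;
        }
    }

    return $tags;
}

// Usage with the multi-line definition from this report (shortened content).
$sample = "<head><META NAME=\"DC.Coverage.PlaceName\" SCHEME=\"TGN\" \n"
        . "CONTENT=\"Oceania ? Australia\" > </head>";
$result = meta_tags_from_string($sample);
print_r($result);
```

The trade-off is memory: the whole document head has to be in one string, whereas the function above only ever holds one line plus the partial name and content.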

 [2000-07-30 00:09 UTC] zak@php.net
Confirmed bug using win2k, apache 1.3.12, php 4.0.1rc2
This should be a trivial fix :)
 [2000-08-17 03:03 UTC] sniper@php.net
Just tried get_meta_tags() with the latest CVS, and it didn't work at all.

--Jani
 [2000-08-17 04:54 UTC] stas@php.net
What didn't work at all? Could you please give a reproducing example?
 [2000-08-17 11:06 UTC] sniper@php.net
Oops, forgot to add that example:

<?php
$metas = get_meta_tags("http://kolumbus.fi",1);
print_r($metas);
?>

And this prints out:

array(0) { } 

And there are metas in that page..

--Jani
 [2000-08-17 11:12 UTC] stas@php.net
That page doesn't have metatags. It has META HTTP-EQUIVs, which are an entirely different matter, and, in fact, nobody needs them. But get_meta_tags should be rewritten anyway (just like url_scanner was), because it misses too many cases.
 [2000-08-17 16:22 UTC] sniper@php.net
Damn. Here's one which definitely has metatags:

<?php
$metas = get_meta_tags("http://www.zend.com",1);
print_r($metas);
?>

And it doesn't work.

--Jani
 [2001-02-10 21:40 UTC] elixer@php.net
Fixed in CVS, please give it a whirl.

Sean
 