Request #16681 Request new words() function returning no of words in string
Submitted: 2002-04-18 10:15 UTC Modified: 2003-02-06 14:40 UTC
From: phpbugs at priorwebsites dot com Assigned:
Status: Closed Package: Feature/Change Request
PHP Version: 4.1.2 OS: Windows XP Home Edition
Private report: No CVE-ID: None
 [2002-04-18 10:15 UTC] phpbugs at priorwebsites dot com
I notice that PHP does not have a words() function; I propose that one is added.

For anybody doing string processing this is an extremely useful function.  It is a commonly found function in other languages (e.g. REXX).  It is not just for general text processing that such a function is useful, but whenever dealing with strings.

For example, I had a string where I had to take different actions if there was only 1 word in a string than if there were many.  

I agree that you can do this sort of thing with regular expressions, but hey, you can do a Levenshtein function too with regular expressions ;).

I can guarantee if there was one, it would be widely used.

words() should simply return the number of words in a string, e.g.

words("hello") -> 1

words("this is my example") -> 4

words("     ah how about this, eh   smartass?") -> 6

words("thank you PHP team for taking the time to read this and giving due consideration for this suggestion rather than just throwing it in the waste bin because you've got more urgent things to do") -> 35

Hugh Prior

 [2002-04-18 12:08 UTC] daniel@php.net
I think in PHP it would look like this:

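function word_count($string) {
  // PREG_SPLIT_NO_EMPTY drops the empty pieces that leading or trailing whitespace would otherwise produce
  return count(preg_split("/\s+/", $string, -1, PREG_SPLIT_NO_EMPTY));
}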
 [2002-04-19 02:09 UTC] phpbugs at priorwebsites dot com
I would strongly favour the name words() rather than word_count().  Though there might be arguments for consistency with other functions to call it word_count(), I think that words() is the more natural name and more easily remembered.

if (words($mystring) > 1) do_something();

You talk about "words" in a string and not "word count" in a string.

Hugh
 [2002-04-19 03:56 UTC] derick@php.net
In my opinion this function doesn't add that much value to PHP.
 [2002-04-19 04:22 UTC] phpbugs at priorwebsites dot com
I wish to expand this request to suggest full support for word processing in strings.  This is something which is extremely useful (it appears for example in REXX) when processing strings.

word (str, num) 
Returns the num'th word from string str. If num is greater than the number of words in str, the null string is returned. 
 
wordindex (str, num) 
Returns the character position of the first letter of the num'th word in string str, or 0 if num is greater than the number of words in str. 
 
wordlength (str, num) 
Returns the length of the num'th word in string str, or 0 if num is greater than the number of words in str. 
 
words (str)
Returns the number of words in the string str.
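
The four descriptions above could be sketched in userland PHP roughly as follows. This is only an illustration, not what was eventually added to PHP: split_words() is a hypothetical helper, a "word" is assumed to be any run of non-whitespace characters, positions are counted in bytes, and the type declarations need a much newer PHP than the 4.1.2 this request was filed against.

```php
<?php
// Hypothetical helper: returns [word, byte offset] pairs for each word.
// Assumption: a word is any run of non-whitespace characters.
function split_words(string $str): array {
    return preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);
}

// Number of words in $str.
function words(string $str): int {
    return count(split_words($str));
}

// The $num'th word (1-based), or the empty string if there is no such word.
function word(string $str, int $num): string {
    $all = split_words($str);
    return ($num >= 1 && $num <= count($all)) ? $all[$num - 1][0] : '';
}

// 1-based position of the first character of the $num'th word, or 0.
function wordindex(string $str, int $num): int {
    $all = split_words($str);
    return ($num >= 1 && $num <= count($all)) ? $all[$num - 1][1] + 1 : 0;
}

// Length of the $num'th word, or 0 if there is no such word.
function wordlength(string $str, int $num): int {
    return strlen(word($str, $num));
}
```

With these definitions, words("this is my example") gives 4 and word("England Spain France Italy", 2) gives "Spain". Each call re-splits the string, which keeps the sketch short at the cost of repeated work in a loop.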
 [2002-04-19 04:28 UTC] phpbugs at priorwebsites dot com
Derick,

This function is EXTREMELY useful.  The words() function is to strings what the sizeof() function is to arrays!  I cannot imagine living without the sizeof() function when dealing with arrays; if I were dealing with strings a lot I cannot imagine living without a words() function.

Here's some example code which shows how elegant it can make string use:  

/**************************/
$countries = "England Spain France Italy";

print "There are " . words($countries) . " countries competing";

print "The " . words($countries) . " are:";

for ($i=1; $i<=words($countries); $i++) {
  print word($countries, $i);
}
/***************************/

Of course, there are many other uses, but this I hope gives a flavour for how handy and elegant such functionality is.
 [2002-04-19 04:32 UTC] phpbugs at priorwebsites dot com
When I was coding the sample code with the $countries (above), I realised that PHP also misses some other crucial functions for word processing.  These are basic word functions which I am used to having in a powerful language.

Therefore, request changed to request words(), word(), wordindex(), wordlength() functions.

Hugh Prior
 [2002-04-19 04:59 UTC] daniel@php.net
This does not warrant new functions. What you are trying to do should be done with arrays:

/**************************/
$countries = array("England", "Spain", "France", "Italy");

print "There are " . count($countries) . " countries competing";

print "The " . count($countries) . " are:";

for ($i=0; $i<count($countries); $i++) {
  print $countries[$i];
}
/***************************/

if you have an input string like:

  $string = "England Spain France Italy";

you can make an array of it:

  $countries = preg_split("/\s+/", $string);

Still, the proposed functions look like a good addition to the language as people do not want to study regular expressions.
 [2002-04-20 08:45 UTC] phpbugs at priorwebsites dot com
I notice this example in the ereg section:

ereg ("([[:alnum:]]+) ([[:alnum:]]+) ([[:alnum:]]+)", $string,$regs); 
/* Places three space separated words
   into $regs[1], $regs[2] and $regs[3]. */

which could become something like the following with the proposed word() function:
$regs[1] = word($string, 1);
$regs[2] = word($string, 2);
$regs[3] = word($string, 3);

Even for hardened regular expression fans, I'm sure you would agree which would be nicer to encounter in somebody else's uncommented code.

And I know which I'd rather place my money on as working correctly.

Hugh
 [2002-05-06 10:16 UTC] phpbugs at priorwebsites dot com
I have a few points to add:

1) YOU CANNOT ALWAYS CHOOSE THE STARTING FORMAT

Daniel says that instead of:
  $countries = "England Spain France Italy";
we should code:
  $countries = array("England", "Spain", "France", "Italy");
but it is not always so easy to start with an array.  Let me give a real world example.

I am currently writing a search engine, where the user may type whatever they want; I want to match not just the whole search term, but each individual word.  So, if the user types "Baptist churches", I can match "Baptist" and "churches".

// $search_string = "Baptist churches"

// Loop for the number of words in the string
for ($i=1; $i<=words($search_string); $i++) {
  // Get the matches for the i-th word
  get_matches(word($search_string, $i));
}

I agree that we CAN get the words using regular expressions.  That brings me to my next point.

2) IT IS NOT EASY FOR PEOPLE TO CORRECTLY USE REGULAR EXPRESSIONS TO GET WORDS.

If you are reading this, you are probably a highly skilled PHP programmer, able to create first time a correctly functioning regular expression.  And one which functions correctly for ALL inputs.

This example in the ereg section from the PHP documentation itself does NOT always give the correct result:

ereg ("([[:alnum:]]+) ([[:alnum:]]+) ([[:alnum:]]+)", $string,$regs); 
/* Places three space separated words
   into $regs[1], $regs[2] and $regs[3]. */

For example, in my real search engine example, a user searching "C++ or C" would be disappointed to find that I have searched for "C" or "C"!  

And of course regular expressions do not use the locale.
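
For comparison, splitting on whitespace alone keeps a term like "C++" intact. The following is a minimal sketch for the search-engine case, assuming that any run of non-whitespace characters counts as one search term; search_terms() is a hypothetical name, and the loop body only prints each term where a real engine would look up matches.

```php
<?php
// Sketch only: treats every run of non-whitespace characters as one term.
function search_terms(string $query): array {
    // PREG_SPLIT_NO_EMPTY discards empty pieces from leading/trailing spaces.
    return preg_split('/\s+/', $query, -1, PREG_SPLIT_NO_EMPTY);
}

foreach (search_terms("C++ or C") as $term) {
    // A real search engine would fetch the matches for $term here.
    print $term . "\n";
}
```

This yields the three terms "C++", "or" and "C", so the punctuation that [[:alnum:]] would reject survives.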

3) THE OBJECT-ORIENTED VIEW OF STRINGS DEMANDS THE PROPOSED WORD FUNCTIONS

It goes against OO to convert a string into an array, just because string lacks a function (i.e. words) that array has (i.e. count).  If we view the string as an object, it is not satisfactory to say that we can convert the object to another object type and use the functions which exist for that object type.  It is rather like saying that we do not need an addition operator for integers because we are able to convert integers to doubles/floats in order to add.

4) CONSISTENCY

There already exists the 'ucwords()' function, so it is accepted that a word is a valid concept within a string.

Hugh
 [2003-02-06 14:40 UTC] iliaa@php.net
This bug has been fixed in CVS.

In case this was a PHP problem, snapshots of the sources are packaged
every three hours; this change will be in the next snapshot. You can
grab the snapshot at http://snaps.php.net/.
 
In case this was a documentation problem, the fix will show up soon at
http://www.php.net/manual/.

In case this was a PHP.net website problem, the change will show
up on the PHP.net site and on the mirror sites in short time.
 
Thank you for the report, and for helping us make PHP better.

str_word_count() is available as of PHP 4.3.0.
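
As a rough illustration of how the built-in covers the original request, here is the countries example using str_word_count() and its optional format argument. The variable names are only for this example. Note that the built-in uses its own definition of a word (letters plus apostrophes and hyphens, as far as the manual describes it), so a term such as "C++" is not expected to come back with its punctuation.

```php
<?php
$text = "England Spain France Italy";

// Default format: just the number of words.
$count = str_word_count($text);

// Format 1: a list of the words themselves.
$list = str_word_count($text, 1);

// Format 2: the words keyed by their 0-based position in the string.
$positions = str_word_count($text, 2);

print "There are $count countries competing\n";
foreach ($list as $country) {
    print $country . "\n";
}
```

Format 1 plays the role of the proposed word() function, and the keys of format 2 give the same information as wordindex(), only 0-based.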
 
Last updated: Sun Dec 22 11:01:30 2024 UTC