Bug #64740 Gender ignores country for some names.
Submitted: 2013-04-30 08:37 UTC Modified: 2013-06-02 07:26 UTC
From: RQuadling at GMail dot com Assigned: ab (profile)
Status: Closed Package: PECL (PECL)
PHP Version: Irrelevant OS: Centos
Private report: No CVE-ID: None

 [2013-04-30 08:37 UTC] RQuadling at GMail dot com
Description:
------------
The Gender extension ignores the requested country when the name is both male and 
female (trying not to mention the word uni- es ee ex, as I think it triggers the SPAM 
alert when posting bugs) in any other country where the name is known, even though 
it is male in the requested country.

Also had to post this as a PECL bug as there is no PECL/Gender entry to choose.

Test script:
---------------
<?php
$o_Gender = new Gender\Gender;
$o_Gender->trace();
var_dump($o_Gender->get('Ben', Gender\Gender::BRITAIN));
// var_dump($o_Gender->get('Dan', Gender\Gender::BRITAIN));
// var_dump($o_Gender->get('Richard', Gender\Gender::BRITAIN));


Expected result:
----------------
Searching for name 'Ben'  (country = Great Britain)
Range = line 1 - 48891,  guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445,  guess = 12223 ('Esben')
Range = line 1 - 12222,  guess = 6111 ('Brendon')
Range = line 1 - 6110,  guess = 3055 ('Aranita')
Range = line 3056 - 6110,  guess = 4583 ('Barak')
Range = line 4584 - 6110,  guess = 5347 ('Beybala')
Range = line 4584 - 5346,  guess = 4965 ('Benet')
Range = line 4584 - 4964,  guess = 4774 ('Bavani')
Range = line 4775 - 4964,  guess = 4869 ('Behudin')
Range = line 4870 - 4964,  guess = 4917 ('Belk<i>s')
Range = line 4918 - 4964,  guess = 4941 ('Bendina')
Range = line 4918 - 4940,  guess = 4929 ('Belu<sch>e')
Range = line 4930 - 4940,  guess = 4935 ('Benan')
Range = line 4930 - 4934,  guess = 4932 ('Belva')
Range = line 4933 - 4934,  guess = 4933 ('Ben')
Result: name 'Ben' found
evaluating name 'Ben':  'is male'  (country = Great Britain[3] or Ireland[1] or 
U.S.A.[3] or Belgium[4] or the Netherlands[7])
evaluating name 'Ben':  'is uni*** name'  (country = China[3])
result for 'Ben':  'is male name'
int(77)

Searching for name 'Dan'  (country = Great Britain)
Range = line 1 - 48891,  guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445,  guess = 12223 ('Esben')
Range = line 1 - 12222,  guess = 6111 ('Brendon')
Range = line 6112 - 12222,  guess = 9167 ('Delfa')
Range = line 6112 - 9166,  guess = 7639 ('Chrysostomia')
Range = line 7640 - 9166,  guess = 8403 ('Curzio')
Range = line 8404 - 9166,  guess = 8785 ('Danu?e')
Range = line 8404 - 8784,  guess = 8594 ('Dalbir')
Range = line 8595 - 8784,  guess = 8689 ('Dan Daniel')
Range = line 8595 - 8688,  guess = 8641 ('Dalva')
Range = line 8642 - 8688,  guess = 8665 ('Damion')
Range = line 8666 - 8688,  guess = 8677 ('Dan')
Range = line 8666 - 8677,  guess = 8671 ('Damnjan')
Range = line 8672 - 8677,  guess = 8674 ('Damyan')
Range = line 8675 - 8677,  guess = 8676 ('Dan')
Range = line 8675 - 8676,  guess = 8675 ('Damyanti')
Range = line 8676 - 8676,  guess = 8676 ('Dan')
Result: name 'Dan' found
evaluating name 'Dan':  'is male'  (country = Great Britain[2] or Ireland[3] or 
U.S.A.[4] or Belgium[1] or Luxembourg[4] or the Netherlands[1] or Swiss[1] or 
Denmark[6] or Norway[2] or Sweden[6] or Finland[3] or Romania[8] or Moldova[6] 
or Israel[7])
evaluating name 'Dan':  'is mostly male'  (country = Vietnam[6])
evaluating name 'Dan':  'is uni*** name'  (country = China[7])
result for 'Dan':  'is male name'
int(77)

Actual result:
--------------
Searching for name 'Ben'  (country = Great Britain)
Range = line 1 - 48891,  guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445,  guess = 12223 ('Esben')
Range = line 1 - 12222,  guess = 6111 ('Brendon')
Range = line 1 - 6110,  guess = 3055 ('Aranita')
Range = line 3056 - 6110,  guess = 4583 ('Barak')
Range = line 4584 - 6110,  guess = 5347 ('Beybala')
Range = line 4584 - 5346,  guess = 4965 ('Benet')
Range = line 4584 - 4964,  guess = 4774 ('Bavani')
Range = line 4775 - 4964,  guess = 4869 ('Behudin')
Range = line 4870 - 4964,  guess = 4917 ('Belk<i>s')
Range = line 4918 - 4964,  guess = 4941 ('Bendina')
Range = line 4918 - 4940,  guess = 4929 ('Belu<sch>e')
Range = line 4930 - 4940,  guess = 4935 ('Benan')
Range = line 4930 - 4934,  guess = 4932 ('Belva')
Range = line 4933 - 4934,  guess = 4933 ('Ben')
Result: name 'Ben' found
evaluating name 'Ben':  'is male'  (country = Great Britain[3] or Ireland[1] or 
U.S.A.[3] or Belgium[4] or the Netherlands[7])
evaluating name 'Ben':  'is uni*** name'  (country = China[3])
result for 'Ben':  'is uni*** name'
int(63)

Searching for name 'Dan'  (country = Great Britain)
Range = line 1 - 48891,  guess = 24446 ('Kyung+Ju')
Range = line 1 - 24445,  guess = 12223 ('Esben')
Range = line 1 - 12222,  guess = 6111 ('Brendon')
Range = line 6112 - 12222,  guess = 9167 ('Delfa')
Range = line 6112 - 9166,  guess = 7639 ('Chrysostomia')
Range = line 7640 - 9166,  guess = 8403 ('Curzio')
Range = line 8404 - 9166,  guess = 8785 ('Danu?e')
Range = line 8404 - 8784,  guess = 8594 ('Dalbir')
Range = line 8595 - 8784,  guess = 8689 ('Dan Daniel')
Range = line 8595 - 8688,  guess = 8641 ('Dalva')
Range = line 8642 - 8688,  guess = 8665 ('Damion')
Range = line 8666 - 8688,  guess = 8677 ('Dan')
Range = line 8666 - 8677,  guess = 8671 ('Damnjan')
Range = line 8672 - 8677,  guess = 8674 ('Damyan')
Range = line 8675 - 8677,  guess = 8676 ('Dan')
Range = line 8675 - 8676,  guess = 8675 ('Damyanti')
Range = line 8676 - 8676,  guess = 8676 ('Dan')
Result: name 'Dan' found
evaluating name 'Dan':  'is male'  (country = Great Britain[2] or Ireland[3] or 
U.S.A.[4] or Belgium[1] or Luxembourg[4] or the Netherlands[1] or Swiss[1] or 
Denmark[6] or Norway[2] or Sweden[6] or Finland[3] or Romania[8] or Moldova[6] 
or Israel[7])
evaluating name 'Dan':  'is mostly male'  (country = Vietnam[6])
evaluating name 'Dan':  'is uni*** name'  (country = China[7])
result for 'Dan':  'is uni*** name'
int(63)

 [2013-04-30 09:22 UTC] ab@php.net
-Assigned To: +Assigned To: ab
 [2013-05-18 17:03 UTC] ab@php.net
-Status: Assigned +Status: Wont fix
 [2013-05-18 17:03 UTC] ab@php.net
Richard,

after some research I have come to the conclusion that this is not a bug.

Strictly speaking, both Dan and Ben aren't names but nicknames, respectively for Daniel and 
Benjamin. Looking into the data there are two corresponding lines

=  Ben Benjamin               1 1
=  Dan Daniel                 111           1

That means in both cases it is male in Britain. However, because the exact input was the 
nickname, not the real name, the library looks up and evaluates the input literally and 
compares the frequencies in all the other countries.

I think a change in this area could break the data integrity for normal operations. While 
Ben can evaluate to Benjamin, it could also evaluate to Benedict. I am thinking more about 
adding a new method like Gender::getRealName($nickname) or adding some options to the 
existing get method. Changing that behaviour globally might break other, more complicated cases.

Any ideas?

Thanks )
 [2013-05-18 17:22 UTC] ab@php.net
To be clearer, the way I see it, it should be solved like this:

$g = new Gender\Gender;
if($g->isNick($name)) {
    $name = $g->getNameForNick($name); // to implement
}
$gender = $g->get($name);

That could work as long as there is an unambiguous correlation between the name 
and the nick. What would you say?
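
The "unambiguous correlation" condition can be sketched on its own, without the extension. This is only an illustration: the `$nicknames` table and the `resolveNick()` helper are hypothetical and do not reflect the library's data format or API. The sample entries come from this thread (Dan for Daniel, Ben for Benjamin or Benedict).

```php
<?php
// Hypothetical nickname table; the real data file and its format are not shown here.
$nicknames = [
    'Dan' => ['Daniel'],
    'Ben' => ['Benjamin', 'Benedict'],
];

// Return the full name only when the nickname maps to exactly one candidate.
// Unknown or ambiguous nicknames yield null so the caller can decide what to do.
function resolveNick(array $nicknames, string $name): ?string
{
    $fullNames = $nicknames[$name] ?? [];
    return count($fullNames) === 1 ? $fullNames[0] : null;
}

var_dump(resolveNick($nicknames, 'Dan'));     // string(6) "Daniel"
var_dump(resolveNick($nicknames, 'Ben'));     // NULL (Benjamin or Benedict)
var_dump(resolveNick($nicknames, 'Richard')); // NULL (not a known nickname)
```

With such a table, 'Ben' stays unresolved, which is the case where a nickname lookup alone would not help.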
 [2013-05-18 20:54 UTC] RQuadling at GMail dot com
The issue for me is that even if the name is female in one country, it is a single 
country that I'm asking for. Dan, in England, _IS_ a male name. And the data does 
show this.

As things stand, supplying the country is redundant.
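
A minimal sketch of the behaviour being asked for, assuming the per-country figures printed in the trace for 'Ben'. The `$entries` array, the constants and `genderInCountry()` are hypothetical names for illustration only; this is not the library's implementation, just one way to consult the requested country alone.

```php
<?php
// Values as printed by var_dump() in the report.
const MALE = 77;    // expected result for 'Ben' in Great Britain
const UNISEX = 63;  // actual result before the fix

// Hypothetical structure holding the two evaluations shown in the trace for 'Ben'.
$entries = [
    'Ben' => [
        ['gender' => MALE, 'countries' => [
            'Great Britain' => 3, 'Ireland' => 1, 'U.S.A.' => 3,
            'Belgium' => 4, 'the Netherlands' => 7,
        ]],
        ['gender' => UNISEX, 'countries' => ['China' => 3]],
    ],
];

// Consider only the requested country: pick the classification with the
// highest frequency there, or null if the name is not recorded for it.
function genderInCountry(array $entries, string $name, string $country): ?int
{
    $best = null;
    $bestFreq = 0;
    foreach ($entries[$name] ?? [] as $entry) {
        $freq = $entry['countries'][$country] ?? 0;
        if ($freq > $bestFreq) {
            $bestFreq = $freq;
            $best = $entry['gender'];
        }
    }
    return $best;
}

var_dump(genderInCountry($entries, 'Ben', 'Great Britain')); // int(77)
var_dump(genderInCountry($entries, 'Ben', 'China'));         // int(63)
var_dump(genderInCountry($entries, 'Ben', 'Japan'));         // NULL
```

Under this reading, the entry for China never influences a lookup for Great Britain.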
 [2013-05-18 21:45 UTC] ab@php.net
-Status: Wont fix +Status: Analyzed
 [2013-05-18 21:45 UTC] ab@php.net
But that would mean it were the full name, whereas what I mean is that it's a nickname, 
though I'm not a native speaker :) Looking here

http://en.wikipedia.org/wiki/Ben
http://en.wikipedia.org/wiki/Daniel_(name)

any person I click on turns out to be a Benjamin or a Daniel. Is it really the case that 
the short variant can be given as a full name, e.g. on a birth certificate?

There are two terms in the ext: nick name and full name. The difference is that 
if you type literally 'Dan', the library thinks that you're looking for the full 
name Dan, not for Daniel, as Dan is recorded as a term of endearment for Daniel. 
The further handling is of course not correct; in my view it should say 'not found' 
and not try to evaluate all the countries.

The reason I ask is that the quality of the data is important. If you as a native 
speaker are sure the shorter variant can be a full name, I should correct the 
data. Otherwise I'd prefer to implement it the way I've mentioned: checking whether 
it is a nick name, getting the full name and working with that instead. Because 
there are cases where the same nick name can point to either a male or a female full 
name, changing the library behavior in this case would be wrong.

Thanks.
 [2013-05-19 17:18 UTC] RQuadling at GMail dot com
But even if the name is a nickname, it is still a MALE nickname in England. The 
country option isn't doing its job.
 [2013-05-19 20:39 UTC] ab@php.net
Automatic comment from SVN on behalf of ab
Revision: http://svn.php.net/viewvc/?view=revision&revision=330297
Log: Fixed bug #64740 Gender ignores country for some names.
 [2013-05-19 20:42 UTC] ab@php.net
Richard,

could you please check the current trunk? I made a change to the library which 
should allow the lookup in a single country only. I also tested it with names 
like 

Jan, Val, Chis, Brynn, Carol, Fran, Ronny, Gene, Rene

If you have some more ambiguous names in mind, please test them too.

Thanks.
 [2013-05-19 20:43 UTC] ab@php.net
Chris i mean, not Chis :)
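
For anyone retesting, the names listed above (with Chris corrected) can be checked in one pass. This sketch uses only the calls that appear in the original test script, `Gender\Gender::get()` and the `Gender\Gender::BRITAIN` constant, and does nothing useful unless the extension is installed. The printed integers are whatever the installed version returns; none are assumed here.

```php
<?php
// Ambiguous names mentioned in this thread.
$names = ['Jan', 'Val', 'Chris', 'Brynn', 'Carol', 'Fran', 'Ronny', 'Gene', 'Rene'];

if (class_exists('Gender\Gender')) {
    $gender = new Gender\Gender;
    foreach ($names as $name) {
        // Restrict the lookup to a single country, as in the bug report.
        printf("%s: %d\n", $name, $gender->get($name, Gender\Gender::BRITAIN));
    }
} else {
    echo "Gender extension not available, nothing to check\n";
}
```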
 [2013-06-02 07:00 UTC] ab@php.net
Automatic comment from SVN on behalf of ab
Revision: http://svn.php.net/viewvc/?view=revision&revision=330437
Log: added tests for bug #64740
 [2013-06-02 07:26 UTC] ab@php.net
-Status: Analyzed +Status: Closed
 [2013-06-02 07:26 UTC] ab@php.net
Thank you for your bug report. This issue has already been fixed
in the latest released version of PHP, which you can download at 
http://www.php.net/downloads.php

fixed in 1.0.0
 