php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Request #58676 XHProf does not compile under Leopard
Submitted: 2009-05-17 19:22 UTC Modified: 2009-06-04 18:10 UTC
From: s dot spasov at gmail dot com Assigned: kannan (profile)
Status: Closed Package: xhprof (PECL)
PHP Version: 5.2.9 OS: MAC OS X Leopard 10.5.7
Private report: No CVE-ID: None
 [2009-05-17 19:22 UTC] s dot spasov at gmail dot com
Description:
------------
Error compiling XHProf under Leopard 10.5.7 because of lack 
of Apple's Affinity API: http://developer.apple.com/releasenotes/Performance/RN-
AffinityAPI/

/Users/svilen/Build/php-5.2.9/ext/xhprof/xhprof.c:202: 
error: 
syntax error before 'cpu_set_t'
/Users/svilen/Build/php-5.2.9/ext/xhprof/xhprof.c:202: 
warning: no semicolon at end of struct or union
/Users/svilen/Build/php-5.2.9/ext/xhprof/xhprof.c:213: 
error: 
syntax error before '}' token
/Users/svilen/Build/php-5.2.9/ext/xhprof/xhprof.c:213: 
warning: data definition has no type or storage class
/Users/svilen/Build/php-5.2.9/ext/xhprof/xhprof.c:222: 
error: 
syntax error before 'hp_globals'
/Users/svilen/Build/php-5.2.9/ext/xhprof/xhprof.c:222: 
warning: data definition has no type or storage class

Reproduce code:
---------------
Suggest patch: http://www.svilen.com/xhprof_0.9.1_patch_build_macosx_leopard.diff


Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2009-05-18 17:05 UTC] kannan@php.net
Yes, we have not ported or tried this on Mac yet. Thanks for your patch. Will try to test out your patch on a Mac, and incorporate your patch into the codebase if it works as expected.

Thanks a lot once again for your submitting this patch.
 [2009-05-18 17:50 UTC] s dot spasov at gmail dot com
Thank you very much.
I am looking forward to the official Mac port.
 [2009-05-27 20:11 UTC] kannan@php.net
I plan to try give your patch a try in the next day or so on Mac.
 [2009-05-28 02:09 UTC] kannan@php.net
An update:

I was able to build successfully on Mac OS 10.5.4 with your patch. I built the patch on top of current CVS head (which will become 0.9.2 hopefully soon). 

One of the new timing tests recently added (tests/xhprof_005.phpt) doesn't pass on Mac 0S. I get a diff like:

002+ Failed sleep_10000_micro. Expected: 10000 microsecs. Actual: 21120 microsecs.
002- OK: sleep_10000_micro
005+ Failed sleep_20000_micro. Expected: 20000 microsecs. Actual: 58357 microsecs.
005- OK: sleep_20000_micro
008+ Failed sleep_50000_micro. Expected: 50000 microsecs. Actual: 14745 microsecs.
008- OK: sleep_50000_micro

The test source is available here:
http://cvs.php.net/viewvc.cgi/pecl/xhprof/extension/tests/xhprof_005.phpt?revision=1.2&view=markup

You can see that the test allows for about +/- 25% margin of error in expected timing. However, this test doesn't pass.

Not sure if that's a problem with usleep() on Mac OS not being reliable, or something else specific to the Mac OS patch. I need to investigate this more.
 [2009-05-28 02:20 UTC] kannan@php.net
Consider a small program like this which sleeps for 3 seconds.

<?php

xhprof_enable();
sleep(3);
$raw = xhprof_disable();
var_dump($raw["main()"]["wt"]);

When I run this with your patch on Mac, a bunch of times, I expect to see walltime (wt) of about 3 Million microsecs. But the program prints fairly fluctuating numbers each time I run:

[apples-macbook-pro:~] Apple% $HOME/php/bin/php -dextension=xhprof.so test.php
int(1274841)
[apples-macbook-pro:~] Apple% $HOME/php/bin/php -dextension=xhprof.so test.php
int(2232810)
[apples-macbook-pro:~] Apple% $HOME/php/bin/php -dextension=xhprof.so test.php
int(1496259)

On Linux, we get numbers fairly close to 3000000 each time.
 [2009-05-28 03:53 UTC] s dot spasov at gmail dot com
Thanks for the info update. I will run your test and try to 
locate the problem.
Maybe it's something related to the fact that on Mac OS 10.5 
you cannot assign a thread to a given core, you can just 
share  L2 cache between cores.
One fact comming from this is that if you have more than one 
CPU and the thread jumps from one CPU to another the L2 
cache cannot be shared between CPUs.

I have only a one CPU Core 2 Duo currently as a test 
platform, but will run the test and try to see what's wrong.

If it's helpful - more info about Mac OS X affinity API:
http://developer.apple.com/releasenotes/Performance/RN-
AffinityAPI/
 [2009-05-28 05:36 UTC] s dot spasov at gmail dot com
A weird fact I accidentally noticed is that when I start 
compiling something and run the test.php script:
<?php

xhprof_enable();
sleep(3);
$raw = xhprof_disable();
var_dump($raw["main()"]["wt"]);


I started to get correct results. See below:

Slivka:tests svilen$ php test.php
int(590174)
Slivka:tests svilen$ php test.php
int(742951)
Slivka:tests svilen$ php test.php
int(968550)
Slivka:tests svilen$ php test.php
int(604858)
Slivka:tests svilen$ php test.php
int(918241)
Slivka:tests svilen$ php test.php
int(735977)
Slivka:tests svilen$ php test.php
int(1696356)

Starting "make" to compile something for a long time and run 
the test.php meanwhile:

Slivka:tests svilen$ php test.php
int(3000220)
Slivka:tests svilen$ php test.php
int(3002940)
Slivka:tests svilen$ php test.php
int(2999799)
Slivka:tests svilen$ php test.php
int(3000064)
Slivka:tests svilen$ php test.php
int(3000335)
Slivka:tests svilen$ php test.php
int(3000026)
Slivka:tests svilen$ php test.php
int(2999763)
Slivka:tests svilen$ php test.php
int(3000325)


So when you keep the CPU busy, the test runs fine.
 [2009-05-28 19:48 UTC] s dot spasov at gmail dot com
I looked deeper in the XHProf's code and saw that you use 
CPU frequency to convert from TSC counter to microseconds.

So I found that the problem with the sleep() php function 
and the xhprof_005.phpt resides in the SpeedStep technology 
enabled by default on all Intel Macs.
When the CPU clock jumps around caused by the SpeedStep 
enabled, get_us_from_tsc() cannot make correct calculations.
When you disable the SpeedStep you get correct calculations.

After I disabled the SpeedStep on my Mac all test ran 
perfect.

You can try this either by using CoolBook application 
(http://www.coolbook.se/CoolBook.html)
or by moving to different folder or deleting 
"/System/Library/Extensions/AppleIntelCPUPowerManagement.kex
t" to stop Apple's SpeedStep driver and then reboot.
After this your Mac will run on max CPU clock and the tests 
should go well.

*) note: maybe this situation can be reproduced on Linux 
with SpeedStep enabled?
 [2009-05-28 21:58 UTC] kannan@php.net
Nice work getting to the bottom of the issue.

This (having to turn off SpeedStep) is a little bit of a bummer, because not many people are likely to remember or be aware that they need to do this in their setup.

Perhaps, the right fix for Mac OS is to use some other reliable timer. (gettimeofday() would work, but that is an significantly more expensive, at least in a Linux environment. Do you know if on Mac OS there are any fast timers?).

BTW, just curious: are you planning to use XHProf for production profiling as well, and if so, what OS do your production servers run (Mac or Linux)?
 [2009-05-29 07:30 UTC] s dot spasov at gmail dot com
Thank you!

SpeedStep is not enabled by default on Mac OS Server, it is 
only on desktop computers and laptops (as in my case I use 
Macbook for development). So XHProf can be used in 
production Mac OS X Server with no problems.

But you're right that the best fix will be using a reliable 
and not expensive timer for Mac OS platform. I am not aware 
of such, but I will make some research and will try to find 
one that is fast and can be implemented.

Yes, I am planning to use XHProf for production with heavy 
loaded websites (e.g. one is a bulgarian search engine).
OS is Fedora Core 9 on production servers, but almost all 
test environments are Mac OS X, so I really am into making a 
good patch for Mac OS.
 [2009-05-29 11:48 UTC] kannan@php.net
If this isn't going to be a problem on Mac servers, I am inclined to take in your patch and document this caveat that on desktop/laptop mac's SpeedStep would have to be disabled. At least this way, we have something available for folks to use. Subsequently, if we do find a better timer to use, then we can update the implementation.

I'll test this with SpeedStep turned off, and if it looks good, will commit your patch (with due credits of course).
 [2009-05-29 12:30 UTC] s dot spasov at gmail dot com
Thanks for your efforts, testing the patch with SpeedStep 
turned off and eventually adding the patch to the next 
release.


Unfortunately I found that the RDTSC timer can be inaccurate 
on systems running multicore processors or/and SpeedStep 
technology. This is not only related to Mac OS.
You've solved the multicore issue by adding CPU affinity, 
but SpeedStep problem is persists and is available on Linux, 
too.
http://en.wikipedia.org/wiki/Time_Stamp_Counter

One thing I can propose is the use of HPET (High Precision 
Event Timer) which is available in 2.6 Linux kernel.
http://en.wikipedia.org/wiki/High_Precision_Event_Timer

I will try to implement it and see if it's fast enough, 
cause gettimeofday() is slow, as you said before.

Another thing I haven't tested yet is mach_absolute_time() 
on Mac OS and clock_gettime() on Linux, but maybe they will 
be slow, too.

Thank you again for you fast responce.
 [2009-05-29 20:27 UTC] kannan@php.net
I have one final question about your patch. 
Why this ifndef around this block of code?

  #ifndef __APPLE__
  if (GET_AFFINITY(0, sizeof(cpu_set_t), &hp_globals.prev_mask) < 0) {
    perror("getaffinity");
    return FAILURE;
  }
  #endif

While profiling we get current affinity mask into "prev_mask". During profiling the process is bound to a CPU. At the end, we then restore the prev_mask back (in restore_cpu_affinity() called from hp_stop()).

By commenting out the above, I am concerned that "prev_mask" is now in some uninitialized state, and this uninitialized value will then get used in restore_cpu_affinity().
 [2009-05-30 04:48 UTC] s dot spasov at gmail dot com
I want to appologize in advance if my explanation is in 
someway not understandable, but I am not very good at this.

In Mac OS the affinity behaviour is handled a little bit 
different from Linux/FreeBSD. Mac OS X does not export 
interfaces that identify processors or control thread 
placement ? explicit thread to processor binding is not 
supported. Instead there is affinity sets, which represents 
affinity to the L2 cache. So the scheduler runs its threads 
using a good processor placement with respect to cache 
affinity.

Every affinity set has a affinity tag (e.g. identified by 
"mask" in the XHProf) and to a given thread can be assigned 
only one affinity tag.
By default every thread has a THREAD_AFFINITY_TAG_NULL, so 
when initializing GET_AFFINITY() will always return 
THREAD_AFFINITY_TAG_NULL and thus we don't need it, because 
by definition prev_mask is always THREAD_AFFINITY_TAG_NULL.

Let me know if I am missing something in the logic.
 [2009-05-30 17:02 UTC] kannan@php.net
Svilen: Thanks for the explanation. I guess then there is some chance that you only have L2 affinity, and that you can still end up on a different core whose RDTSC is out of sync (?).

In any case, since this mostly works, I have gone ahead and committed your patch with some minor changes:
a) removed GET_AFFINITY for __APPLE__ (since it wasn't getting used and the number of arguments to it wasn't correct anyways).
b) And, since we do not call GET_AFFINITY when __APPLE__ is set, prev_mask wasn't getting initialized. I do so using CPU_ZERO.

The revised diff is:
===================================================================
RCS file: /repository/pecl/xhprof/extension/xhprof.c,v
retrieving revision 1.4
diff -w -r1.4 xhprof.c
46a47,61
> #elif __APPLE__
> /*
>  * Patch for compiling in Mac OS X Leopard
>  * @author Svilen Spasov <s.spasov@gmail.com> 
>  */
> #    include <mach/mach_init.h>
> #    include <mach/thread_policy.h>
> #    define cpu_set_t thread_affinity_policy_data_t
> #    define CPU_SET(cpu_id, new_mask) \
>         (*(new_mask)).affinity_tag = (cpu_id + 1)
> #    define CPU_ZERO(new_mask)                 \
>         (*(new_mask)).affinity_tag = THREAD_AFFINITY_TAG_NULL
> #   define SET_AFFINITY(pid, size, mask)       \
>         thread_policy_set(mach_thread_self(), THREAD_AFFINITY_POLICY, mask, \
>                           THREAD_AFFINITY_POLICY_COUNT)
402a418
>  #ifndef __APPLE__
406a423,425
> #else
>   CPU_ZERO(&(hp_globals.prev_mask));
> #endif
 [2009-05-30 17:44 UTC] s dot spasov at gmail dot com
Kannan: I think that ending up on a different core might not 
be possible on current Mac, because scheduler run the thread 
on a core that has access to the L2 cache assigned to the 
thread by L2 affinity. That means most probably you will end 
up on the same core. Also if take for granted that cores are 
synced on boot, it is less likely to end up with RDTSC is 
out of sync.

I have an idea how this can be tested and will post results 
after I do it, but it seems the patch is working correct.

Thank you for your improvements and overview of patch's 
code.
 [2009-06-04 17:43 UTC] philip at roshambo dot org
fwiw, it compiled for me on leopard (as 64 bit) with php 5.3 
RC2
 [2009-06-04 18:10 UTC] kannan@php.net
This bug has been fixed in CVS.

In case this was a documentation problem, the fix will show up at the
end of next Sunday (CET) on pecl.php.net.

In case this was a pecl.php.net website problem, the change will show
up on the website in short time.
 
Thank you for the report, and for helping us make PECL better.

Fixed in 0.9.2.
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Thu Jan 02 23:01:30 2025 UTC