Activity Monitor: A Private Eye To Shadow Your Unfaithful Applications
by: Nate
Friday, June 13th, 2008
I thought we had a good thing going; I thought we could trust each other. I wouldn’t play with her heart, and she wouldn’t let me down. We seemed to be in a stable, long-term relationship. Then she just stopped trying and completely locked me out. I couldn’t figure out why her behavior just changed overnight, and I didn’t know what else to do. So I had someone follow her, to find out what was really going on.
While I would never hire some sneaky character to follow a person, I have no such qualms about having something stalk a misbehaving application. A problem came up with a Mac program that I had written in C. It would run great for a while, and then after some random number of days or weeks it would simply stop producing output. Not only would it apparently stop doing anything, it would also use all available CPU cycles. It was furiously doing nothing.
It was difficult to debug, since I couldn’t really set breakpoints to try to catch something that might not happen for weeks. And there didn’t seem to be a way to just break into the current instruction, like I am used to doing in embedded programming.
Lucky for me, my boss Alex happened to notice an interesting function of the Activity Monitor while experimenting on his Mac. Though I had used that utility before, I had completely overlooked the intriguing button at the top labeled “Sample Process”. Using this understated gem, you can select any running process on your machine and take a peek at what it’s doing with its time. There are a few different ways to have the information displayed, but I found the most useful to be “Percent of Thread”.
What I did was take a sample of my application while it was running normally, to get a baseline. There wasn’t anything too surprising in there; CPU usage seemed to be widely distributed among a variety of function calls. Then I sat back and waited for it to lock up, as I knew it eventually would.
So a few weeks later the application became unresponsive again. I took a new sample and what do you know? There was one function call that was using 99.7% of the CPU cycles. It turns out that there was a problem with the hardware drivers I was using. They would get in a bad state in which a particular asynchronous process would never complete. The interface was such that I had to poll to check if the process had completed. It was never completing and I was polling as fast as I could, which maxed out the CPU usage. Long story short, it was a recoverable problem and I just had to put in a timeout to give up waiting after a reasonable amount of time.
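For anyone curious what a fix like that can look like, here is a minimal sketch of a polling loop with a timeout. It is not the code from my application: the names `poll_fn`, `wait_with_timeout`, `never_done` and `done_after_n` are made up for illustration, and the callback simply stands in for whatever "is it done yet?" query a driver exposes. The 1 ms pause between polls is also an assumption added here so the loop does not spin at full speed while it waits.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback type: stands in for the driver's completion query. */
typedef bool (*poll_fn)(void *ctx);

/* Current time in seconds on a monotonic clock (immune to wall-clock changes). */
static double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Polls is_done(ctx) until it returns true or timeout_s seconds have passed.
 * Returns true if the operation completed, false if we gave up waiting. */
static bool wait_with_timeout(poll_fn is_done, void *ctx, double timeout_s)
{
    /* Assumed pause of 1 ms between polls, so waiting does not max out the CPU. */
    const struct timespec pause = { 0, 1000000L };
    double deadline;

    assert(is_done != NULL);
    deadline = monotonic_seconds() + timeout_s;

    while (!is_done(ctx)) {
        if (monotonic_seconds() >= deadline) {
            return false; /* timed out: the caller can reset and retry */
        }
        nanosleep(&pause, NULL);
    }
    return true;
}

/* Stand-in for the stuck driver: the operation never completes. */
static bool never_done(void *ctx)
{
    (void)ctx;
    return false;
}

/* Stand-in for a healthy driver: completes once the counter reaches zero. */
static bool done_after_n(void *ctx)
{
    int *remaining = ctx;
    return --(*remaining) <= 0;
}
```

Calling `wait_with_timeout(never_done, NULL, 0.05)` gives up after roughly 50 ms and returns false, while `done_after_n` with a counter of 3 returns true on the third poll. What counts as a reasonable timeout, and what to do after giving up, depends entirely on the hardware involved.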
If you think your application is running around behind your back, sample it and find out what’s really going on.














