Via Kevin Kelly’s Cool Tools mailing list comes a recommendation for Debugging: The Nine Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems by David J. Agans. I paste the summary from Kevin’s email below. If you’ve programmed for more than a few years you’ve probably worked these rules out for yourself, but it’s probably worth having them pinned above your desk so you can “grab the bar”.
The Rules – Suitable for Framing
- Understand the system
- Make it fail
- Quit thinking and look
- Divide and conquer
- Change one thing at a time
- Keep an audit trail
- Check the plug
- Get a fresh view
- If you didn’t fix it, it ain’t fixed
Change One Thing at a Time
On nuclear-powered subs, there’s a brass bar in front of the control panel for the power plant. When status alarms begin to go off, the engineers are trained to grab the brass bar with both hands and hold on until they’ve looked at all the dials and indicators, and understand exactly what’s going on in the system. What this does is help them overcome the temptation to start “fixing” things, throwing switches and opening valves. These quick fixes confuse the automatic recovery systems, bury the original fault beneath an onslaught of new conditions, and may cause a real, major disaster. It’s more effective to remember to do something (“Grab the bar!”) than to remember not to do something (“Don’t touch that dial!”).
So, grab the bar!
Understand the System
You need a working knowledge of what the system is supposed to do, how it’s designed, and, in some cases, why it was designed that way. If you don’t understand some part of the system, that always seems to be where the problem is. (This is not just Murphy’s Law; if you don’t understand it when you design it, you’re more likely to mess up.)
Make It Fail
So you can tell if you’ve fixed it. Once you think you’ve fixed the problem, having a surefire way to make it fail gives you a surefire test of whether you fixed it. If without the fix it fails 100 percent of the time when you do X, and with the fix it fails zero times when you do X, you know you’ve really fixed the bug.
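As a rough illustration of the “fails 100 percent without the fix, zero times with it” test, here is a minimal Python sketch. The bug (parsing an empty string), the function names and the `failure_rate` helper are all hypothetical, invented for this example; they are not from the book.

```python
from typing import Callable


def failure_rate(do_x: Callable[[], object], runs: int = 100) -> float:
    """Run the reproduction steps `runs` times; return the fraction that raised."""
    failures = 0
    for _ in range(runs):
        try:
            do_x()
        except Exception:
            failures += 1
    return failures / runs


# Hypothetical bug: parsing an empty string crashes.
def parse_without_fix(text: str = "") -> int:
    return int(text)  # int("") raises ValueError


# Hypothetical fix: treat blank input as zero.
def parse_with_fix(text: str = "") -> int:
    return int(text) if text.strip() else 0


before = failure_rate(parse_without_fix)
after = failure_rate(parse_with_fix)
print(f"without fix: {before:.0%} failures, with fix: {after:.0%} failures")
```

For an intermittent bug the “before” number would be somewhere below 100 percent, which is exactly why it pays to find a surefire way to make it fail first.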
If You Didn’t Fix It, It Ain’t Fixed
When you think you’ve fixed an engineering design, take the fix out. Make sure it’s broken again. Put the fix back in. Make sure it’s fixed again. Until you’ve cycled from fixed to broken and back to fixed again, changing only the intended fix, you haven’t proved that you fixed it.
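The fixed, broken, fixed cycle can be sketched in a few lines of Python. This is only an illustration under assumptions: `proves_fix`, `average_works` and the divide-by-zero bug are made up for the example, and a real system would toggle the fix by reverting a commit or flipping a build flag rather than passing a boolean.

```python
from typing import Callable


def proves_fix(works: Callable[[bool], bool]) -> bool:
    """Cycle fixed -> broken -> fixed, changing only the fix itself.

    `works(fix_enabled)` runs the reproduction steps and returns True
    if the system behaved correctly.
    """
    fixed_first = works(True)        # with the fix in: should work
    broken_again = not works(False)  # fix taken out: should fail again
    fixed_again = works(True)        # fix back in: should work again
    return fixed_first and broken_again and fixed_again


# Hypothetical bug: averaging an empty list divides by zero.
def average_works(fix_enabled: bool) -> bool:
    values = []
    try:
        if fix_enabled and not values:
            result = 0.0  # the intended fix: empty input averages to 0
        else:
            result = sum(values) / len(values)
    except ZeroDivisionError:
        return False
    return result == 0.0


# A "fix" that the failure never depended on: the system works either way,
# so taking the fix out does not break it and nothing has been proved.
def unrelated_change(fix_enabled: bool) -> bool:
    return True


print(proves_fix(average_works))     # True
print(proves_fix(unrelated_change))  # False
```

The second case is the one the rule guards against: the problem went away, but not because of the change you made.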
Ask for Help
There are at least three reasons to ask for help, not counting the desire to dump the whole problem into someone else’s lap: a fresh view, expertise, and experience. And people are usually willing to help because it gives them a chance to demonstrate how clever they are.
No matter what kind of help you bring in, when you describe the problem, keep one thing in mind: Report symptoms, not theories. The reason you went to someone else for fresh insight is that your theories aren’t getting you anywhere. If you go to someone fresh and lay a theory on her, you drag her right down into the same rut you’re in. At the same time, you’ve probably hidden some key details she needs to know, because your bias says they’re not important. So be firm about this. When you ask for help, describe what happened. Describe what you’ve seen. Describe conditions if you can. Make sure you tell her what’s intermittent and what isn’t. But don’t talk about what you think is the cause of the problem.
Though the terms are often interchanged, there’s a difference between debugging and troubleshooting, and there’s a difference between this debugging book and the hundreds of troubleshooting guides available today. Debugging usually means figuring out why a design doesn’t work as planned. Troubleshooting usually means figuring out what’s broken in a particular copy of a product when the product’s design is known to be good–there’s a deleted file, a broken wire, or a bad part. Software engineers debug; car mechanics troubleshoot. Car designers debug (in an ideal world). Doctors troubleshoot the human body–they never got a chance to debug it. (It took God one day to design, prototype, and release the product; talk about schedule pressure! I guess we can forgive priority-two bugs like bunions and male pattern baldness.)
The techniques in this book apply to both debugging and troubleshooting. These techniques don’t care how the problem got in there; they just tell you how to find it. So they work whether the problem is a broken design or a broken part. Troubleshooting books, on the other hand, work only on a broken part.