20 Feb 2015 Arguing against oneself.

I'm writing this in an airport lounge, ready to head home from sunny Australia to the chilly UK. For the past week I've been helping to troubleshoot a large, complex adhesion issue. This blog is not about the issue itself. It's about how a troubleshooting team goes about solving a major issue.

We arrived with plenty of preconceived ideas from the reports provided in advance. The first rule of troubleshooting is that no matter how good these reports are (and these were good), they are no substitute for the real thing. Words that are crystal clear to those who wrote the reports seem equally clear to those who read them, yet convey totally different information. Only when you see the samples and talk to the people who've been working with them through the evolving crisis do the divergent views start to converge, so that the real problem becomes clear.

Unfortunately the wonderful solutions you'd imagined would help the situation start to look rather less brilliant in the light of the new information. But it's tempting to cling to them. In these early stages everyone is desperate for any potential solution, so ideas you know are not as good as you'd hoped get seized on. This is the first time you have to argue against your own ideas, or hope that others in the team will do so.

At this stage some new ideas start to emerge. Again, desperation makes the ideas seem better than they really are. But at least they are exploring fresh ground and suggesting new tests and experiments that might uncover some new information. In this case a rather simple idea seemed to be taking us in a profitable direction. The second rule of troubleshooting is to go into the lab to try out new ideas. At this ill-formed stage, the ideas will be missing some key elements so trying them out yourself is the fastest way to work out what the idea is really about. In this case the initial results were rather encouraging - and turned out to be utterly misleading.

It's when ideas seem to be pointing in an encouraging direction that the team need to argue against the new orthodoxy. Fortunately this team were not shy in arguing and the flaws in the new idea increasingly became clear to us all. At the same time, the idea had not been wholly wrong. The experiments that revealed its flaws also threw up new data that pointed in a somewhat different direction.

Now we made a lot of progress and the latest orthodoxy seemed even more encouraging. So this was the time to be especially keen to argue against ourselves. A rather neat hypothesis emerged that could be readily tested and - ta-da! - the results were positive. But although the results were exactly as predicted, they required an utterly unrealistic level of force in the test - a level that would never happen in the situation where the problems arose. A disagreement arose in the team about one aspect of the test, so one of us had to go down to the lab to check the details. All it needed was a quick look down a microscope. The team member confirmed that our optimism was misplaced in one key way - and this triggered the key experiment that finally brought the story together. In a lab full of sophisticated equipment, poking the sample with a ballpoint pen turned out to be the single most important experiment of the week - the failure pattern at last reproduced the one we were looking for.

We could have done that "experiment" with the ballpoint pen at any time if only we had thought of it. But it is the nature of troubleshooting that things are only obvious in retrospect. The false steps in the early stages were mostly made in the dark - mostly, but not totally. At each stage the science was plausible. The key was to go with the science, then use the results to refute our favourite hypotheses and generate some new ones. At one stage I was arguing for one point of view and a colleague was arguing the other. I won that argument. But the next day new data made it clear that we had to argue ourselves back to an intermediate position containing elements of both views.

The third rule of troubleshooting is that arguments are judged on their merits and not on the seniority of the person making them. Over the week we all had some dumb ideas that didn't work out, and we all had some insight, data or viewpoint that contributed to the overall process.

There is so much going on during a troubleshooting process that it is easy to lose track of ideas or information. The fourth rule is to document ideas and decisions as the investigation goes along. I am bad at this, so it was fortunate that two of the team were especially good at it. At more than one crucial point it turned out that even the good note-takers had each missed a point that the other, fortunately, had remembered.

So, am I reporting that the mission is accomplished and the problem is solved? No! The point of this week was to clear the decks for some much more detailed science with complex analytical equipment. "All" we were doing was making sure that the analytical guys would have access to relevant samples that they could then analyse. Before we started there was no clear idea of what "relevant" would mean, nor how to prepare the samples. It seems likely that we've got the next stage of the investigation into good shape. Likely, but by no means certain. The fifth rule is never to say "Problem solved, on to the next". We have to be as self-critical about where we've reached as we were at the start, when we found most of our favourite ideas crumbling before us.

So I've now got 20 hours to mull over what we've achieved and what we've missed. And, hopefully, to find the weak spots in what we've done, ready to challenge them in the week ahead.