Testing in production

How much are you spending to find bugs in your tests?

In the years before its 2011 launch, the engineers on NASA's Mars Science Laboratory mission faced a sticky problem. Getting a rover to Mars was relatively easy. Actually landing it safely on the surface was very hard.

The team spitballed many solutions, including putting the rover into a big balloon and letting it bounce to rest on the surface. Eventually the engineers rejected the bouncy castle approach and decided to slow the rover's descent to the Martian surface with a gigantic parachute.

Before launch, the NASA engineers had to test that the parachute would actually work. But how do you test a parachute deployment without reproducing the Martian atmosphere? After much head-scratching, the team tried to simulate the forces of a Martian descent by deploying the parachute in a gigantic wind tunnel.

After a few successful deployments, the unthinkable happened. Somehow the parachute turned inside out and exploded into shards of fabric. Uh oh.

The team tried to figure out what had happened by instrumenting the test with ultra-high-speed cameras. But try as they might, they were unable to reproduce the failure. Fear and uncertainty put the entire project in jeopardy. Then, by chance, the parachute exploded again, and this time the video captured exactly what was happening.

In a wind tunnel, a parachute has to deploy sideways. Because the parachute was so large, the weight of the fabric made its upper edge dip and catch wind on the wrong surface. When that happened, the parachute would turn inside out and explode.

So after months of effort and millions of dollars, the scientists at NASA discovered a bug in the test. Confident now that the parachute would not shred in the Martian atmosphere, the NASA team was able to greenlight the launch. The rover, Curiosity, has been cruising around the surface of Mars for years now, doing important science.

For software engineers this story has a few very important takeaways:

  1. Trying to create a production-like test environment is a very poor application of force. You can spend millions of dollars and thousands of hours, but inevitably you'll end up finding bugs in your tests, rather than bugs in your code.
  2. If you want to see how your code holds up in production, you have to put it into production. Unless your app is exceedingly trivial, you can't predict, reproduce, or simulate production conditions. You may as well try to reproduce the atmosphere on Mars.
  3. It's better to train your teams to be "fast to fix". That means investing in sensors rather than elaborate inspection rituals. When you know how to do sensor-driven development (SDD), you make fewer assumptions, and your confidence and velocity go through the roof.
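
To make the "sensors" idea concrete, here is a minimal sketch of what one could look like in code. It is one possible reading of the third takeaway, not a definition of SDD. The `Sensor` class, its `watch` decorator, the window size, and the error-rate threshold are all hypothetical names and values chosen for illustration. A real system would more likely ship these numbers to a metrics or alerting service than keep them in memory.

```python
import time
from collections import deque
from functools import wraps


class Sensor:
    """Hypothetical in-process sensor: tracks recent call outcomes and latency."""

    def __init__(self, window=100, max_error_rate=0.05):
        # Only the most recent `window` calls are kept, so old failures age out.
        self.outcomes = deque(maxlen=window)   # True = success, False = failure
        self.latencies = deque(maxlen=window)  # seconds per call
        self.max_error_rate = max_error_rate   # illustrative threshold, an assumption

    def watch(self, func):
        """Wrap `func` so that every call is recorded by this sensor."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                self.outcomes.append(False)
                raise  # never swallow the error, just observe it
            else:
                self.outcomes.append(True)
                return result
            finally:
                self.latencies.append(time.perf_counter() - start)
        return wrapper

    def error_rate(self):
        """Fraction of recent calls that raised an exception."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def healthy(self):
        """True while the recent error rate is at or below the threshold."""
        return self.error_rate() <= self.max_error_rate


# Hypothetical production code path with a sensor attached.
checkout_sensor = Sensor(window=50, max_error_rate=0.1)


@checkout_sensor.watch
def charge(amount):
    if amount <= 0:
        raise ValueError("amount must be positive")
    return f"charged {amount}"
```

A deploy script or on-call alert could poll `checkout_sensor.healthy()` after a release and roll back when it turns false. That is the "fast to fix" loop: the code runs under real conditions and the sensor tells you quickly when an assumption was wrong.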