Surprise tracking

One of the best pieces of advice I've heard about planning was to project yourself forward in time and imagine that the plan has just failed. Are you surprised? If not, your plan needs work.

I think surprise is a seriously undervalued intuition, because it can roll up a whole lot of different factors into a prediction in a very quick, intuitive way. Other ways of accessing those fast predictions ("do you think it will succeed?", "do you think it will fail?", "what risks can you think of?") all seem to get bogged down in biases like optimism or social pressure, or end up being more a test of imagination than prediction. I wrote about using the worst unsurprising case exactly because I think surprise is uniquely powerful for linking prediction and risk.

But something I only thought of recently is that it might be useful to use this as an ongoing measurement. So instead of just asking "how surprised would you be if this plan failed" at the start of the plan, you could ask on a regular basis throughout the plan's life. Instead of having just one value to work with, you now have a trend. Ideally, that trend should be towards more surprise, or at worst the same. If not, it's probably a sign that your plan is in trouble.

It could work well for more specific predictions as well, like "how surprised would you be if this part of the project took longer than expected", or "how surprised would you be if we got feedback that our software was too hard to use". Over the lifetime of a project, a bunch of similar surprise tracking questions could paint a pretty interesting graph of a whole team's intuitions about the project's success.
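To make that concrete, here's a rough sketch of what the bookkeeping could look like. The questions, the 1-5 scale and the weekly cadence are all just assumptions, but the idea is that you only ever care about the trend, not any single number:

```python
# A rough sketch of surprise tracking: everyone answers "how surprised would
# you be if X?" each week (1 = not at all surprised, 5 = completely surprised).
from statistics import mean

# Hypothetical sample data; dates and questions are just for illustration.
responses = {
    "week 1": {"plan fails": [2, 3, 2], "ships late": [1, 2, 2]},
    "week 2": {"plan fails": [3, 3, 4], "ships late": [2, 3, 2]},
    "week 3": {"plan fails": [2, 2, 2], "ships late": [1, 1, 2]},
}

# Average each question per week; a downward trend is the warning sign.
for week, answers in sorted(responses.items()):
    summary = ", ".join(f"{q}: {mean(scores):.1f}" for q, scores in answers.items())
    print(week, summary)
```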

It's already very popular to continuously track certain metrics over a project's life, but these are usually objective quantities like burn rate, server capacity, or number of users. Tracking subjective metrics seems like it could be pretty useful too, as long as those metrics were predictive. I think surprise tracking would be a good foundation for that.

Prototype wrapup

I've been meaning for a while to get into the habit of prototyping in a more systematic way. I wrote a while back about the benefits of prototyping, and more recently about wanting a more exciting version of a code-every-day challenge. What I'd ultimately like is for prototypes to fill that space, or at least part of it. To that end, I decided to commit to making one prototype per day this week.

That didn't go so well, mostly because my prototype discipline was pretty lax and I made overcomplicated prototypes that were basically mini-projects. I didn't get as many done as I wanted, but I'm quite happy with the ones I did:

Davescript

source

This was a silly little stack language I threw together for a friend as part of an ongoing joke about compile-to-js languages. It has a hello world that should give you some idea of how it's meant to work. I ended up spending a lot of time thinking about how to make an efficient streaming parser, but in the end I gave up and just read each line into memory.

Time: 2 hours.

Whipcalc

source demo

When messing around with SDR, it really pays to tune the length of your antenna properly. However, the only decent site I found to do it gave lengths in feet and inches, and that extra conversion step was kind of cramping my style. So I thought I'd use it as an opportunity to learn PureScript, a kind of Haskell-meets-Javascript language I've had my eye on. I figured that, being a very minimal web challenge, it wouldn't take too much time, but I was wrong.
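For context, the calculation itself is nothing exotic. Here's a minimal sketch of the usual quarter-wave rule of thumb in metric; the 0.95 end-effect factor is the textbook approximation, not necessarily exactly what whipcalc uses:

```python
# Quarter-wave whip length, in metres, for a given frequency in MHz.
C = 299_792_458  # speed of light, m/s

def quarter_wave_length_m(freq_mhz, k=0.95):
    # k is the usual velocity/end-effect correction factor.
    wavelength = C / (freq_mhz * 1_000_000)
    return k * wavelength / 4

print(f"{quarter_wave_length_m(145.0) * 100:.1f} cm")  # ~49 cm for the 2m band
```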

What actually tripped me up was the fairly involved setup/build process, which took up a significant chunk of time. Another big problem was the web framework, which was 99% fine, but that 1% (an HTML attribute it didn't know about) ended up taking over an hour to figure out, and I still don't think I figured it out properly. Type safety!

Overall I feel positive about the result, though I'm still unconvinced that Haskell is a good choice for making webpages. I'll probably give it another try later to see if there's light at the end of the learning curve.

Time: 7 hours.

Infinite Gest

source demo

This was really a new project masquerading as a prototype. I've wanted to look into the idea of making a minimal sketching tool for ages. The idea is that instead of having a bunch of tool UI, you just sketch things and it uses gesture recognition to automatically turn your badly drawn shapes into beautiful platonic ideals. A kind of slight upgrade from paper, but not so much that it interferes with the process of just sketching. Also, the pun name I came up with was so amazing I just had to do it.

I really had a hard time with additive vs subtractive on this one. Initially I wanted to just build a simple version, but some of the things (infinite scrolling canvas, drawing shapes and moving them around) are complicated enough that I was lured into going full framework. Ultimately, that led to a better, more comprehensive demo at the end, but it turned what could have been a few hours into a lot more.

Time: 8 hours.

So in total I spent 17 hours on prototypes, which would have been more than enough for seven 2-hour prototypes if I'd been more modest in scope. I can't say I regret taking the ideas as far as I did, because they turned out pretty well, but I clearly can't sustain that as a daily output. Maybe that will turn out to be fine, but for the time being I still want to validate the idea of regular prototypes.

I'm making a smaller commitment for next week. 7 was definitely too many; I think 5 would be a nice number, but I'm going to start at 3 and make sure I'm doing it effectively before I scale back up.

Defence in depth

A few things I've written recently, including what the hell, against identity, motte and bailey goals, do and do not and going meta, have been about failure and how systems break down. I think there's an interesting unifying idea there that's worth going into separately.

I remember reading about NASA's software engineering in Feynman's notes on the Challenger disaster. Unlike the other departments involved, the software team had an incredible resilience to failure. In addition to fairly stringent engineering standards, they would do a series of full external QA tests and dry runs of all their systems, simulating an actual launch. Plenty of teams do dry runs and QA testing, of course, but the difference is that QA failures were considered almost as serious as real failures. That is, if your software failed these tests, it didn't kill anyone, but it probably could have.

At the heart of this is a paradox that speaks to that general problem of failure: you want to catch failures early, before they cause real problems, yet you still want to treat those failures as real despite their lack of consequences. Let's say you have to finish writing an article by a week from now, but to add a bit of failure-resistance you give yourself a deadline two days earlier. That way, if anything goes wrong, you'll still have time to fix it by the real deadline. Sure, you could say you're going to treat your self-imposed deadline as seriously as the actual deadline, but that's kind of definitionally untrue; the whole point of your deadline is that it's not as serious as the real one!

The general principle here is defence in depth: design your system so that individual failures don't take down the whole thing. An individual event would need to hit a miraculous number of failure points at once to cause a complete failure. But that assumes each event is discrete and disconnected from the others, like someone trying to guess all the digits of a combination lock at once. In reality, if smaller failures are ignored or tolerated, you really have one long continuous event, like a combination lock where you can guess one digit at a time. The total difficulty becomes linear in the number of digits, when you really wanted it to be exponential.
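To put rough numbers on the lock analogy (the 4-digit lock is just for illustration):

```python
# Rough numbers for the combination-lock analogy: 4 digits, 0-9 each.
digits = 4
symbols = 10

# Guessing the whole combination at once: every attempt is all-or-nothing,
# so the search space is exponential in the number of digits.
all_at_once = symbols ** digits   # 10,000 possible combinations

# Guessing one digit at a time (i.e. small failures are tolerated and
# persist): each digit takes at most 10 tries, so the work is linear.
one_at_a_time = symbols * digits  # at most 40 tries

print(all_at_once, one_at_a_time)  # 10000 vs 40
```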

To get that, you have to make sure those individual failures aren't allowed to persist. But that is much easier said than done. Both Feynman's notes on Challenger and the NASA satellite post-mortem I referenced in going meta revealed this very problem in their organisation: the culture let individual errors accumulate until their defence in depth was broken. But in neither case was the general problem of slow compromise of defence in depth really addressed.

I see the main issue as proportionality. If you tell someone "the failure of this one individual widget should be treated as seriously as the failure of the entire space shuttle", all that's going to do is destroy the credibility of your safety system. Similarly, setting up your goals in such a way that one minor failure can sink the whole thing is just silly. I think what the NASA software team got right wasn't just that they took their QA testing so seriously, but that they also didn't take it too seriously. Failing QA might mean a serious re-evaluation, but failing the real thing probably means you lose your job.

A significant secondary issue is that the consequences increase enormously as you go meta. A single screw not being in the right place is a very minor failure, and deserves a very minor corrective action. However, failing to correct the missing screw is a much more major failure. It may appear to be just another aspect of the minor failure in the screw system, and thus fairly unimportant. However, it's really a failure in the defence in depth system, which is the kind of thing that can actually take down a space shuttle. Perhaps that counterintuitive leap from insignificant failure to catastrophic meta-failure is at the heart of a lot of defence in depth failures.

In the absence of any guidance from NASA, I'd suggest the following: set up your system in layers to exploit defence in depth. Make sure the consequence of a failure at each layer is serious, but not so serious that it stops being credible. And make sure that failures in the layering itself are considered extremely serious regardless of their origin, as they have the most potential to take down the system as a whole.

What the hell

I learned about an interesting bias recently. The most official name seems to be "lapse-activated causal patterns", but the more fun name is the "what the hell" effect. It's when you have a particular system you're trying to stick to, for example a low-calorie diet. If you have a moment of weakness and eat a donut, that shouldn't affect your decisions about subsequent donuts; the donut is a sunk cost and the best thing you can do is move on. However, what we tend to do instead is think "ah, what the hell, since I'm already failing at my diet..." and blitz through the entire donut box.

I think there are two interesting ways to look at this. The first one is in terms of identity: deep down your goal wasn't "eat as few donuts as possible", it was "be the kind of person who doesn't eat donuts". One slip-up and that identity is ruined; you're a no-good donut-eater regardless of the quantity involved. Another way is as a consequence of a kind of absolutism. After all, you want your goals to be locked down so you can't just optimise the goalposts. But if your goal was "never eat a donut", you've now failed that goal. And since that goal's ruined anyway, you may as well chow down.

Which is a new way of thinking about my earlier idea of do and do not. The point is that when it becomes clear that a goal is unattainable, you can find a new compromise goal, even if only to keep yourself in the habit of following through and to make it a little bit (but not too!) uncomfortable to fail. But it's also a good way of fighting the "what the hell" effect if your issue is absolutism. You can no longer manage "no donuts", and the natural "do not" reaction would be giving up entirely, but the compromise between the two might be eating only one donut this week.

That's not the same thing as succeeding, of course; you still failed to achieve your original goal. But that doesn't mean you should also abandon nearby goals that are the best option remaining after the sunk cost goal is buried.

Against identity

My recent efforts at making a commitment platform – what eventually became obligat.io – led me down a bit of an interesting trip through the psychology of goal setting. I had a hunch that what I was doing would be useful to me, but no evidence backing it up in the general case, so I went looking. Surprisingly, I found a lot of material saying you shouldn't tell people about your goals. Oops!

Digging a little deeper, that advice ultimately comes from the great work of Peter Gollwitzer who, among other things, pioneered symbolic self-completion theory and the idea of self-defining goals. That is, certain goals like "I want to be a doctor" aren't about taking a specific action, they're about being perceived (and perceiving yourself) in a certain way. Symbolic self-completion theory says that when that identity is threatened, you seek out symbols and demonstrations to prove it. Similarly, when working towards a self-defining goal, those kinds of identity-demonstrating behaviours can substitute for actually achieving anything.

Which, to reel it back in, is not the same thing as saying don't tell people about your goals. The mechanism at work is that talking about your identity goals is a way of demonstrating that identity. So talking about all your amazing plans for being a doctor makes you feel like a doctor, and paradoxically reduces your need to actually be a doctor. I feel relatively safe in obligat.io's model, because the commitment structure is fairly specific and accountable, so it should be less vulnerable to identity weaseling. In fact, Gollwitzer's paper suggests this as one of the possible ways to mitigate the effect.

Honestly, the whole identity thing seems kind of inelegant. Why define yourself as the kind of person who does thing X rather than just... doing X? It seems like the latter gets more done with less baggage. In software we've had a storied history of constructing elaborate definitions of types of things, and many developers now believe that just expecting certain behaviour, rather than a certain type identity, is a faster and more flexible way to work. In a sense I think identity is similar to happiness: an indirect, consequential property that we've started trying to manipulate directly. Maybe we would be better off if we just got rid of it.
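As a sketch of what I mean by expecting behaviour rather than identity, in Python-flavoured duck typing (the class names are made up for illustration):

```python
# Two unrelated classes; neither declares itself as any particular "type".
class Doctor:
    def treat(self, patient):
        return f"prescribing rest for {patient}"

class FirstAider:
    def treat(self, patient):
        return f"bandaging {patient}"

def handle_emergency(responder, patient):
    # We never ask what the responder *is*, only whether it can do the thing.
    return responder.treat(patient)

print(handle_emergency(Doctor(), "Alice"))
print(handle_emergency(FirstAider(), "Bob"))
```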

That said, Gollwitzer found at least some benefit to identity goals, and regardless we're probably stuck with our particular set of mental quirks for the foreseeable future. The one thing I would say to that is we can definitely decide what gets to be part of our identity. In which case I would suggest that a good identity is like a good type system: minimal. Maybe you don't need running or fishing or medicine to be your identity, with all the attendant risk of letting external factors define who you are.

Perhaps a better way to approach your identity would be to just prune things away until you find the parts of yourself you'd never want to change. That is, instead of defining your identity as things you do, whittle it down to just the essential elements of your character.