There are about a dozen of us in the room. We’re sitting in a circle, on wooden fold-out chairs. The space is an artist’s loft in San Francisco; we can see evidence of their work strewn about the room: paintings both properly mounted on the wall and stacked up on the floor. The venue has been secured by a startup named Metrist, and the two founders are in attendance. Thanks to another startup, Jeli.io, there’s also an impressive spread of food, and the fridge is stocked with drinks. It would feel very sophisticated sipping chilled white wine, if we weren’t doing it out of red plastic cups.
There’s an easel set up next to the circle of chairs. But instead of a canvas mounted on the easel, there’s a whiteboard. Tonight, we’ve taken over this space for a different purpose. If you walked in on us, you’d probably think this was some sort of workshop, or maybe a team offsite planning session. In fact, we’re using the space for CasesConf, a venue for sharing stories about incidents.
Every person in the room is a participant, both audience member and storyteller. We each get only ten minutes to tell our story, and then there are ten more minutes for questions and discussion. That doesn’t sound like much time, but we need the limits to ensure everyone can tell their story. With this crowd, if there were no time constraints, the whole evening could be spent discussing a single incident.
Nora Jones, CEO of Jeli and former Netflix teammate of mine, is hosting the event. This is her second time around; she organized and hosted the first CasesConf in New York City the month prior. I’m nominally the co-host, but in practice, all that means is that I’m given the option to go first. I gladly take it so I’ll be able to properly listen to the other stories without my own performance looming. I’ve got some notes scribbled down on a stack of index cards I always keep on me, in my back pocket.
Ten minutes isn’t much time to explain an incident to an audience that doesn’t know anything about your system. Failure modes in complex systems are, well, complex. But that’s what makes this venue excellent practice for performing incident storytelling. Every time we talk about an incident, we have to choose what information to present and how to present it, based on the constraints of the medium and the audience. You’re going to tell a different story in a long-form write-up that’s released internally than in an incident retrospective meeting where most of the attendees own the services that were involved.
I tell a story of an incident that had recently impacted a sibling team of mine. Half of my ten minutes goes into the setup. I’m drawing boxes and arrows on the whiteboard to explain how the system works, before I can explain how it failed. I happened to be on call for the service my team owns, but in this instance I was just a bystander, watching the on-call channel in Slack. It’s a small story, a vignette about a single incident responder wrestling with a service that’s going bad. There’s a round of lively discussion, and then the next storyteller gets up.
Because the event is small, invite-only, and off-the-record, it feels intimate. Trust is high, and people share their stories freely, even though most people here are meeting for the first time. The rest of the evening is a blur. My memory is poor, and so I generally depend on written notes to remember things, but there isn’t any note-taking going on this evening. I do remember the feeling, though. It’s one of community, of telling stories about experiences to people who know just what it’s like.