Social Research and the Software Industry

I recently attended a symposium in LA that considered the status of Learning From Incidents (LFI) within the software industry. The other attendees were very senior engineers, researchers, or otherwise prominent voices in the contemporary #discourse swirling around LFI and related topics like the work of Site Reliability Engineering. And in spite of my invitation to attend, which is obvious evidence that the organizers deemed me sufficiently qualified to be there, the whole thing was, for me, a surreal experience. I chalk this up to the fact that I’m not “technical.”

Now what does this mean? LFI is a concept which adapts ideas from the academic field of safety science for the software industry. At a high level, the goal of learning from incidents is to affect, for the better, the development of software. That might mean building new tools for creating and sustaining “Critical Digital Services,” or trying out various techniques for operating software in production. Regardless of the specifics, you’d probably expect the people involved to be practicing software developers or operators who have experienced or seen problems with the way things are done now and want to change them.

That’s not me. My educational background is in philosophy and, broadly speaking, qualitative social research. My professional background is in account management, specifically post-sales customer success. I previously worked in a customer support role at a company that offered a CI/CD tool as a SaaS, and I currently work as a Technical Customer Success Manager at Honeycomb.io. So I don’t really match the pattern, characterized above, of the people you’d expect to be involved.

So if I’m not bringing deep expertise with the tools of the trade, what do I bring? Precisely the background that I do have: knowledge from the fields of philosophy and the various social sciences, and experience navigating the organizational dynamics involved in sustaining human-machine relationships through time. 

How is that useful to LFI? Allow me to demonstrate with an example: 

In his classic How Complex Systems Fail, Richard Cook cites ethnographic literature to argue that the ‘discovery’ of a root cause of an accident is a social construction meant to localize and isolate responsibility for a disruptive event. Building upon this, Sidney Dekker argues that a “restorative just culture” which avoids such isolation and promotes the sharing of narrative accounts of a person’s precipitating involvement is a more fruitful way of repairing the damaged social relationships. This practice allows the participants to glean insights about the normal social and technical operations of the system such that they can update their priors and are therefore better fit to their dynamical environment. LFI has, broadly speaking, adopted these two points as core tenets.

Focusing in on the testimonial portion, LFI could learn a thing or two from another culture that has practiced restorative justice. Specifically I’m thinking of the Tiv people, who live in modern-day Nigeria, as described in Paul Bohannan’s Justice and Judgment Among the Tiv.

As I described in another context, a core juridical practice of the Tiv involves establishing a jir, a temporary space within their territory. That territory is known as a tar, and consists of an intertwining of the local geography and the Tiv’s familial and communal relations; it’s a socio-ecological territory. The Tiv establish a jir for the purpose of ‘repairing the tar’, or to address breakdowns of territorial cohesion. Bohannan describes this as a method of producing and reproducing a modus vivendi. In order to do this, they elicit two forms of testimony: mimi from the principal litigants, and vough from relevant witnesses.

Mimi is a mode of testimony in which those principal litigants describe what happened from their own perspective. It is a narration of how things seemed to them and how that led them to do what they did in the particular way that they did it. Vough, on the other hand, is a mode of testimony which aims to produce a ‘precise’ description of the chain of events and the causal mechanisms which interlink them. You might think of this as a linearized account of the socio-technical details from a third-person perspective.

One important lesson that someone might take from these practices is an appreciation for the fact that both mimi and vough, and their interrelation with each other, are necessary for the Tiv to repair their tar. This method of restoration operates by creating a space to present both types of testimony to the community, for them to learn from each about how the breakdown occurred, and for the establishment of a new common sense or common understanding. That shared understanding is what legitimates the judgment and determination of responsibility that the elders of the tar, who preside over the jir, render and thereby produces justice. The achievement of justice is the sign that the tar is repaired.

I submit that LFI can and should learn from the Tiv. Those associated with LFI engage in much discussion around topics like blame, communal learning, psychological safety, and social organization. A rethinking of LFI’s practices and approach to “blameless post mortems” in light of the jir may, for example, help to think through the tension between recognizing that blame is a social construct while also acknowledging that “taking responsibility” is the other end of the spectrum. Blaming is a retrospective assigning of responsibility which isolates or removes the determined offender from the community in order to establish cohesion; “taking responsibility” is a prospective action performed in order to integrate someone and to propagate cohesion into the future. Either aspect would depend upon the establishment of a shared understanding of the situation or “common ground,” which is one role that the post mortem performs in an organization.

With this example, I hope that I’ve shown the value of seriously engaging with “social research.” Considering our own practices and behaviors and learning about ourselves may benefit us by creating a shared understanding, which is the basis of a coherent community, and may also help us to communicate what we’re doing and how it may impact our organizations. And it’s insights like these that I personally hope to develop and share with the community and the broader software industry.