Tuesday 6 January 2015

New Year's Resolution for Data Management: Stop Using Analogies

I recently found myself in a surreal discussion in which it was suggested to me that meta-data is like poetry. I was asked to consider the line "I wandered lonely as a cloud", and was told that "I" am the data, "lonely" is meta-data, and "as a cloud" is meta-meta-data. My interlocutor beamed at me with pride, and waited for my confirmation of their brilliance.

It's poetry, not meta-data

My heart sank as I realised that analogies are really not very helpful. Like many in Information Management, I have been guilty of using them far too much. I have now resolved to stop using analogies, and instead to make the effort to understand how others see the world, and to explain Information Management in terms that they understand, using data that they really use.

Why do we use analogies?

A lot of the material in Information Management is pretty abstract, and it only really makes sense once you have already done it a couple of times. Anyone working in Information Management will be familiar with the challenge of explaining what it is and why you should do it. Meta-data is a great example, and the simple definition of "data about data" doesn't really help anyone new to the topic. So there's a temptation to introduce analogies. As well as poetry, a recent engagement threw up analogies including fingerprints, traffic rules and photography. We use them to try to explain concepts that we understand to someone who doesn't.

Why analogies don't help

There are two problems with analogies, though. The first is that they eventually break down. Once you have introduced an analogy, the discussion inevitably explores it further and you end up debating where it is valid and where it is not. The second problem is that an analogy only really works for the person who came up with it. One of my favourites is weeding a garden as an analogy for data quality. To me it makes sense, because the weeds will always come back, whatever I do. So although I can aim for a weed-free garden, I know that I will never achieve it. It's the same with Data Quality: while you may aim for zero defects, you can never actually achieve it. For those who don't garden, the analogy doesn't help. For those who do garden, the discussion moves on quickly to dandelions, moss and bindweed. Either way, we're not getting very far with Data Quality, and the analogy breaks down because no one ever creates a weeding dashboard or assigns gardening stewards.

It's weeding, not Data Cleansing


From the specific to the general and back again

One of the problems comes from the way that we in Information Management think. As a group we tend to look for patterns and we are constantly seeking general abstractions in a sea of specific examples. A good example is the party data model which was born from the observation that customers, suppliers, employees, representatives and so on have common attributes and that they can be generalised as persons or organisations, that they can be related and so on. There are other generalised data models that have been developed over years. It's what we do, we can't help it. That's why we ended up in Information Management in the first place.
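For those who do like to see the abstraction written down, here is a rough sketch of the party idea in Python. It is only an illustration and not a reference model: the class name, the field names and the "Acme Ltd" example are all my own assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    """A generalised person or organisation (hypothetical sketch)."""
    name: str
    kind: str  # assumed values: "person" or "organisation"
    # A party can play several roles at once, e.g. customer and supplier.
    roles: set = field(default_factory=set)


# One organisation that we both sell to and buy from.
acme = Party(name="Acme Ltd", kind="organisation")
acme.roles.add("customer")
acme.roles.add("supplier")

print(sorted(acme.roles))
```

The point of the generalisation is that Acme is recorded once, with two roles, rather than once in a customer table and again in a supplier table.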

The problem is that the rest of the world doesn't think like this. Most people consider customers and suppliers to be fundamentally different, and a party is an event where people celebrate. We need to get back to specifics that are relevant to our stakeholders and give them concrete examples of what we are talking about.

My kind of party


From poetry to SOAP

In order to move the conversation away from poetry and onto something more useful, I dug a little deeper to find something more specific and relevant to someone seeking to understand meta-data. The guy that I was talking to had experience of integrating systems, so we talked about exchanging data between two or more systems. He suggested SOAP (www.w3.org/TR/soap12-part1) as his preferred protocol. Then we discussed how a SOAP message is specified as an XML Information Set, and that this is an example of meta-data. As we were on familiar ground, I could explain to him why such an Information Set would need to be owned, why it should be approved, why changes should be carefully managed, and what the risks of an incomplete definition would be. From this example, which he understood, the definition of "data about data" made sense, the need for formally managing it made sense, and he could begin to understand that these principles would apply in other scenarios.
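To make that concrete, here is a minimal sketch in Python of the kind of message we were discussing. The envelope namespace is the one defined in the SOAP 1.2 specification; the Customer payload, its namespace and the name "Acme Ltd" are hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SOAP 1.2 specification.
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
# Hypothetical application namespace, assumed for this example.
APP_NS = "http://example.org/customer"

MESSAGE = f"""
<env:Envelope xmlns:env="{SOAP_NS}">
  <env:Body>
    <c:Customer xmlns:c="{APP_NS}">
      <c:Name>Acme Ltd</c:Name>
    </c:Customer>
  </env:Body>
</env:Envelope>
"""

root = ET.fromstring(MESSAGE.strip())
ns = {"env": SOAP_NS, "c": APP_NS}

# The agreed structure (the meta-data) tells the receiver where to look.
body = root.find("env:Body", ns)
name = body.find("c:Customer/c:Name", ns).text

print(name)
```

The only data here is the customer name. Everything else, meaning the element names, the namespaces and the nesting, is the agreed definition that both systems rely on, and that is the part which needs an owner, an approval and managed changes.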

And finally

If you must use poetry as an analogy for meta-data, then I suggest that commentary on poetry is a better one. Lewis Carroll's Jabberwocky is one of my favourite poems. The Wikipedia entry for it is a commentary, and so it is writing about writing (en.wikipedia.org/wiki/Jabberwocky).

Oh dear, I've just broken my New Year's Resolution already.