Searching for truth in data: A policymaker’s dilemma in the digital world
First published on worldbank.org. By Roumeen Islam | January 25, 2021
Humans have used data to make choices since the beginning of time. To the hunter-gatherer, information about the terrain to traverse to the next meal, the strength of the prey, and the competition to be faced in making the kill were all very useful bits of data. When she settled down to agriculture, it became important to know the quality of the soil, the availability of water and the weather patterns. Individual bits of data or information were not enough, though. One had to combine them in mathematical transformations and produce assessments, and eventually pass them on to others. In relaying this information, the vocabulary available to describe each circumstance got richer. Research by Harvard and Google shows that today there are reportedly over a million words in English alone, with the number growing by several thousand each year; each person, however, knows only some thousands, a set unique to her. And so the number of perspectives, the truths, got richer and multiplied. Was the hippopotamus enormous and belligerent, or just grey and somewhat fat?
It was not just the richness of data in nature, nor the words created by humans to describe what they sensed, that led to the explosion of perceived information. We created rich civilizations, added social complexity to natural complexities, and continued to collect data. A recent study shows that there was 1000% more data in 2006 than there was in 1999. Another report finds that online data has increased around 1500% since 2017. The actual numbers are large beyond human imagination, and it remains unclear who owns all the data. With the digital revolution, we have the ability to gather information on an incredible number of natural phenomena and human actions. Yet we do not fully understand all the complexities of what we are sensing, nor how machines are analyzing them. Those “processing” the data may not know what insights may be gained, or what answers are being sought.
It is the narrative that helps us sift through the data. Yale economist Robert Shiller rightly says the narrative or story is an “essentially human” work. The narrative is bound by our history, culture and simply the number of words we use. It reflects the ambition and desires of the narrator as well as the beliefs and aspirations of her listeners. The power of the narrative is evident in every nation’s political, economic and social evolution. With digital technology, narrative travels at unprecedented speeds. And recent history demonstrates the impact of speedy communication. So, if Data are worshipped as Truth, but described in words travelling faster and faster, it is imperative to ask simultaneously, as it never has been before, whose Truth is it?
This question matters because humans have bounded rationality; they are limited in their cognitive ability, not just their words. We simply cannot process all the data even if we know it is there. And when we program machines and algorithms to process it, for now, the machines are still bound by the imaginations that created them. Moreover, it may not even be clear who needs access to the information so that outcomes are improved: how much transparency is needed about the base data, the compressed data and all the manipulations of it, for example the black box of algorithms?
This leads to some considerations for the policymakers who use data to influence people’s actions and, through them, outcomes. Firstly, the same data can be used to describe the glass as half empty or half full. In a world of limited cognitive ability, time and attention span, the narrative matters for the responses one wants to elicit, and for people’s lives. Thus, it is important to acknowledge possible flaws in the interpretation of data and the narrative.
Secondly, it is debatable how much precision is possible or should be expected: when does it matter if the glass is expected to be 50.5% full, and when is 50% a good enough approximation? Perhaps it is more important to acknowledge the imprecision of the information and predictions ex ante, whilst preparing better for the potential downsides, than to be tied to a false precision. It also means giving better explanations and justifications for the numbers used in policy rules or targets, such as tax rates or carbon emissions reduction targets.
Thirdly, more and more data, analyzed with existing tools, may not help policymakers divine much better which actions will lead to which consequences. Other ingredients that remain equally important as data collection and crunching increase are the exercise of judgement based on a history of prior experience, or experimentation with learning. Finally, each society will need to determine who holds the property rights and access to data, which influences what data can do.
In sum, as we search for, collate and analyze more data, it is important to simultaneously reflect upon the practical ability we gain to take action for the common good, which depends on the narratives we choose, the lessons from experience, the boundaries we set on the use and ownership of data, as well as on the sophistication of the techniques we use to inform our decisions.