A country of over 1.3 billion people with rapid technological growth, India is a data goldmine waiting to be tapped into. The Indian government is trying to do so with Digital India, a flagship mission that seeks to transform the country into a world-leading knowledge economy.
Having missed out on previous economic and technological revolutions, India is under pressure to leverage its position as a data-rich country to power development of the AI ecosystem. Succeed with data, and India will be well-placed for the fourth industrial revolution. Giving domestic corporations access to open data is also a way to drive domestic innovation: this can divest power from Big Tech, making it harder for the major players to monopolise data collection and value in the digital ecosystem.
Digital India adopts a nine-pillar approach to digitising India, with one pillar dedicated to enhancing “information for all”. The vision is one of transparency and easy access: swathes of public information in open formats, usable by all. Policy frameworks already in place, such as the National Data Sharing and Access Policy (NDSAP), and programs like the government’s open data platform, fall under this pillar of Digital India.
However, these initiatives suffer from a number of gaps. The NDSAP is only a policy document, so it lacks the statutory force of a law. It also has no mechanism to ensure that datasets are published fully or in a timely fashion. Without a guiding law, government departments already strained by a lack of capability often adhere to the policy pro forma. As a result, datasets on the open government portal frequently have issues: they may be poorly standardised, not provided in open formats, incomplete, outdated, or lacking proper annotation and metadata. Nor has the NDSAP framework implemented processes to make republishing and reusing data easy.
The government has recently proposed a National Data Governance Framework Policy. This seeks to promote data-driven governance and catalyse a start-up ecosystem in India by providing access to data sets containing non-personal data that have been collected and curated by the government. The framework envisages a new data portal to make government data accessible, on a permissioned basis, both to government entities and to companies and researchers.
The platform will be overseen by a new (non-statutory) institution — the India Data Management Office (IDMO) — which will be responsible for framing rules and standards for data management across government departments, and managing access to data sets. Importantly, the policy seeks to promote capacity development within each government department by establishing dedicated data management offices.
While the policy seems well intentioned, it has shortcomings. Developing an alternative data access portal could lead to existing (permissionless) open data initiatives (such as the data.gov.in platform) being orphaned, thereby reducing open access to government data. It is notable that the framework does not specify any objectives around citizen participation in governance or enhancing government transparency and accountability. In the framework, the government is essentially extracting a resource for two purposes: providing it to the (domestic) private sector to enable economic growth, and using it within government for policy making and governance.
That the most widely available and used datasets are those that are commercially lucrative, not the most socially valuable, is a problem even for traditional open government or open data initiatives. But the Indian government’s new policies seem to double down on the perspective of “data as oil”, treating it as a tradeable commodity.
The framework reduces the ability of citizens to exercise control over how their data is used or to demand a return from its use. These issues have been foreshadowed: a government-appointed committee recommended ways to give communities greater control over their data, such as a stewardship-based model for governing non-personal data, but these recommendations have not been adopted.
In addition, if datasets are used unethically or cause harm, citizens have limited recourse. Considering India does not yet have a modern data protection law, this is a notable vulnerability. The policy seeks to empower the India Data Management Office to lay down standards to anonymise datasets, but major harms could come from enabling easier government-to-government access to datasets that previously existed in silos, and these harms have not been properly accounted for.
The policy also has issues with its design and envisaged processes. For one, the India Data Management Office has neither statutory backing nor clarity on how it should be set up organisationally or how it could ensure institutional independence from the government. Given it will function as an arbiter of who gets access to specific datasets, resolving these questions will be critical: this is the body that could pick winners and losers in the data ecosystem.
There is sincerity behind the new framework — the Ministry of Electronics and Information Technology has carried out multiple rounds of consultation on the policy. But as it stands, the new framework falls short on practical and substantive aspects.
There is no easy way to develop holistic data governance policies and implement fair frameworks that account for the disparities specific to India. Data governance frameworks today strive for fairness by zeroing in on key information management principles — how easily can data be found, accessed, reused? Is it interoperable? However, these principles don’t take into account relationships, power differentials and the historical conditions associated with the collection of data that impact ethical and socially responsible data use.
Open data policies could be anchored to more than procedural or usability principles. A practical manifestation of such frameworks could involve governments working with select communities, such as farmers or gig workers, to understand their context-specific needs, and then curating public data sets that can address those needs. These data sets could then be made available for a wider community of innovators, but on terms that are identified by localised data trusts or cooperatives.
Open data frameworks that push to release as much data as possible and leave the rest to market dynamics might be asking the wrong questions. Moving forward, a stronger approach might be one that uses these frameworks to curate data that ameliorates specific problems of communities, and ensures data is used in line with the needs, capacities and vulnerabilities of the community in question.