April 8, 2022

Data Practices for Machine Learning in India: Towards building a bottom-up agenda for Responsible AI in India

Earlier this year, we hosted a workshop on Data Practices for Machine Learning. DFL researchers Angelina and Harsh shared their findings on data practices in the health and agriculture sectors, and this was followed by an open discussion among select experts.
illustration by:
XoMEoX, CC BY 2.0 <https://creativecommons.org/licenses/by/2.0>, via Wikimedia Commons

On August 13th, Digital Futures Lab organised a Workshop on Responsible Data Practices in India. It focused on finding and developing recommendations for responsible data practices in healthcare and agriculture in the Indian context. It was a closed-door workshop with participants from civil society, academia and industry specialising in Artificial Intelligence and associated data practices.


The workshop opened with a presentation by Angelina Chamuah and Harsh Bajpai, sharing the research findings on responsible data practices in AI/ML across the healthcare and agriculture sectors in India. The presentation was followed by a structured discussion among participants on the data ecosystem for AI in India and the need for responsible data practices.


Below is a summary of the participants’ discussion:


Dr. Avik Sarkar started the discussion by outlining the granularity with which the government collects data in agriculture. He outlined the various types of data that are needed: production yield data, crop data, soil data, weather data and more. He noted that AI is primarily used for precision agriculture, to understand the precise location of fields and the types of soil, and to provide information to stakeholders in the ecosystem. This requires very high-level contextual information. For example, weather data collected from satellite imagery needs to be linked to soil data and crop data in order to produce a better prediction. He further noted that the government understands that collecting and monitoring data is a costly exercise.


Dr. Avik suggested that adequate stakeholders will come into the market and develop financial business models once the centre comes up with an adequate marketplace model. He also noted the importance of India’s Personal Data Protection (PDP) Bill in ensuring that data is handled responsibly, and raised questions about creating sectoral data protection laws and whether it is feasible to do so without the PDP in place. Dr. Avik also noted that in the agricultural sector the issue of privacy is not of the utmost importance, as most of the data collected pertains not to individuals but to geographical attributes.


However, Subash, from ICAR, noted that privacy concerns do exist in the agriculture sector as well. For instance, he noted that when agricultural surveys are conducted, people are increasingly wary of sharing their data. Even with consent mechanisms in place, and with researchers explaining the purpose of data collection, trust is becoming a key issue.
 

Dr. Ranjit, a postdoctoral scholar, noted the importance of examining data practices through every stage of the data lifecycle in the AI/ML development process. He stated that the lifecycle can be an organising principle for thinking about questions on data practices. Since data collection sits at the top of every other process, this view changes the dynamics of collecting data. It opens up questions such as whether people are deliberately obfuscating some of the data rather than simply entering it. Who are the human actors involved in every part of the process, say data collection and data annotation? How do AI developers and policy ecosystems perceive these actors?


Dr. Ranjit highlighted that answering these questions changes how information is designed, how the user interface of these systems is designed, and where interventions actually need to be placed. Ranjit’s points also reflect the findings from our research on the kinds of unpaid human actors driving data collection practices and the change in relationships due to these practices. Dr. Urvashi pointed out here that India is in a digitalisation space driven by human labour. Therefore, the question of responsible data practices is more of a labour question than a fairness one. Responding to Urvashi’s point, Ranjit suggested creating a visual representation of the human labour involved at each stage of the lifecycle, which could be beneficial to policymakers.


Dr. Venkata Pingali steered the discussion of data collection and fairness further towards human labour. He stated that these issues should not be typified merely as technology or legal issues, such as robust frameworks for consent or data sharing. This is because most of the small and big decisions taken on a daily basis by individuals in the ecosystem are not even about machine learning. The decisions are much more human-centric; for example, someone’s decision pertaining to data quality might deny a patient a COVID bed in a hospital. Dr. Pingali asserted that no legal contracting structures enable an individual to make the right decisions, even with the best intent. Therefore, these issues should be typified as cultural issues. What comes into play is the sense of responsibility that human actors have about how their work impacts the lives of others. The question here is whether people in the entire data ecosystem, technical or non-technical, have internalised what it means to be in the data world, whether they understand the responsibilities, and whether they have the values to draw the appropriate lines. He further stated that the use and misuse of data happen wherever decisions are involved. Better decision-making requires us to focus on the education of those who are making those decisions.

Srinivas, an independent researcher, viewed data collection activity through an economic lens of demand and supply. He stated that the marketplace has become too complex, with both public and private entities trying to understand and meet the supply chain. On the supply side, players understand who the producers of data are, what kind of data they are producing, and what practices can be automated. On the demand side, players understand who the consumer is, what it is consuming and what the consumer can demand in the future to ensure distribution. The government wants to create more economic opportunities through marketplaces, and therefore it is either creating or regulating them. In order to understand the supply chain ecosystem, a conceptual distinction needs to be drawn between ‘Automation for decision making’ (demand-focused) and ‘AI for automation’ (supply-focused).



Dr. Shivangi, who has worked on data in the policing sector, recounted potential similarities to the health sector and posed interesting questions. How do the stakeholders in play think about the long-term challenges that can arise during treatment of an illness? How can certain kinds of relationships and correlations that data creates be sold? How will the data be analysed? These questions need to be examined within the institutional practices and the political climate in which they arise. Dr. Pingali added here that “problems are bound to emerge when ignorance and bias are combined with a capable tool in hand”.


Subash also highlighted specific institutional challenges during the data collection process. In agriculture, he pointed out that most of the digitisation process is outsourced to organisations, e.g., KVKs, that are either overloaded with data or do not have data collection in their mandate. The incentive that data collectors get, whether they are ASHA workers in healthcare or teachers or students at Agri University, is also low. This leads policymakers to think about how the outsourced organisations can adopt better practices, while forgetting that data collection is not their mandate.


Dr. Pingali argued here that the collection process should not be offloaded, as the most valuable part is understanding the entire process and the tacit knowledge that data collectors like ASHA workers possess. The value is not in the metrics or parameters that ASHA workers are collecting. Instead, it is in their observations, their presence and the contextual information that they can gather. Making them beneficiaries of the outcome, or helping them do their job better and ensuring they get paid, can make a difference in understanding the data and the decisions made.


Srinivas ended the discussion with a quote from the book Food Routes, explaining the reasons behind having no data standards:


Equipment manufacturers regard their data gathering methodology as proprietary and traditional food manufacturing cultures reinforce the notion that supply and supply chains should not be transparent. Since transparency may expose competitive advantages. There is a movement within the industry that supports open systems, but those who support a proprietary model think data is valuable and needs to be regarded as a fungible asset.


Srinivas further expanded on the reasons behind the concentration of market power. He stated that the Indian government is essentially building open networks and not open source systems. These are open networks where business networks can connect to each other to form a marketplace. Not everybody has access to them, even though they are labelled ‘OpenAPI’. Therefore, when it comes to decision-making, one needs to examine who is deciding and for whom.

