data150

Poverty Mapping in Data-Deprived Regions

Poverty Overview

Poverty is arguably the most important determinant of development, and its alleviation is one of the most sought-after goals in the modern world. As of 2021, approximately 700 million people, or 9.82% of the world's population, live below the global poverty line, and extreme poverty is heavily concentrated in sub-Saharan Africa. In addition, poverty is a major factor in adverse health outcomes, child mortality, population growth, social instability, and conflict. The eradication of poverty is vital for the world's progress when viewed through a development lens, and it is the first of the Sustainable Development Goals. The root causes of poverty can be defined as a lack of access to basic necessities like food, water, sanitation, and shelter, but the definition can be expanded to include inequity ranging from gender and ethnic discrimination to poor governance, conflict, exploitation, and domestic violence. Importantly, poverty is exacerbated in areas of political and economic fragility, such as many countries in Africa, where children and communities face higher rates of poverty due to political upheaval, past or present conflict, corrupt leaders, and poor infrastructure that limits access to education, clean water, healthcare, and other necessities. Recently, the research literature has emphasized the idea of multidimensional poverty, in which poverty is not defined solely by income or a specific economic output. Instead, multidimensional poverty comprises indicators within three overarching dimensions of health, education, and standard of living: nutrition, child mortality, years of schooling, school attendance, cooking fuel, sanitation, drinking water, electricity, housing, and assets.
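To make the multidimensional idea concrete, the sketch below scores a single household against the ten indicators listed above. It assumes the weighting convention of the global Multidimensional Poverty Index (each of the three dimensions carries one third of the weight, split equally among its indicators) and a cutoff of one third; the household record and the function names are hypothetical and are not taken from any cited dataset.

```python
from fractions import Fraction

# Indicators grouped by dimension, as listed in the paragraph above.
DIMENSIONS = {
    "health": ["nutrition", "child_mortality"],
    "education": ["years_of_schooling", "school_attendance"],
    "living_standards": ["cooking_fuel", "sanitation", "drinking_water",
                         "electricity", "housing", "assets"],
}

# Assumption: equal weight per dimension, split equally within each dimension.
WEIGHTS = {
    indicator: Fraction(1, len(DIMENSIONS)) / len(indicators)
    for indicators in DIMENSIONS.values()
    for indicator in indicators
}

POVERTY_CUTOFF = Fraction(1, 3)  # assumption: the usual global MPI cutoff


def deprivation_score(deprived):
    """Sum the weights of the indicators in which a household is deprived."""
    return sum((WEIGHTS[name] for name in deprived), Fraction(0))


def is_multidimensionally_poor(deprived):
    """A household counts as poor when its score reaches the cutoff."""
    return deprivation_score(deprived) >= POVERTY_CUTOFF


# Hypothetical household: deprived in one health indicator and
# three living-standard indicators.
household = {"nutrition", "cooking_fuel", "sanitation", "electricity"}
score = deprivation_score(household)
print(score, is_multidimensionally_poor(household))
```

Exact fractions are used so that a household sitting precisely on the cutoff is not misclassified by floating-point rounding.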

In Africa, poverty is commonly gauged through the United Nations Development Programme's Human Development Index, which combines life expectancy at birth, average years of schooling, expected years of schooling, and per capita income. Poverty typically leads to extreme hunger: a quarter of the world's malnourished population lives in Africa, and over one fifth of the continent's population is considered malnourished. More than 30% of African children experience growth disorders like stunting as a result of malnutrition. Further, sub-Saharan Africa has the highest under-five mortality rate in the world, with one in eleven children dying before their fifth birthday. Approximately 59 million children between the ages of 5 and 17 work instead of attending school, and 20% of children are deceived into performing child labour, further feeding the cycle of poverty.

Metrics

With poverty rampant on both global and local scales, reliable measurement and analysis become increasingly important. Unfortunately, traditional approaches to measuring poverty rely on outdated census data and analysis techniques, which are especially unreliable in low- and middle-income countries because these countries tend to lack the necessary statistical infrastructure. In recent years, researchers have turned to innovative approaches that apply newer data science techniques to poverty indicators. The nature of the identifiers and measurements is still debated among development experts in the research literature, with approaches ranging from the construction of unidimensional and multidimensional indices to the use of monetary and non-monetary metrics. Living-standard indices concern the methods used to set appropriate thresholds (poverty lines) below which a person is defined as poor. Monetary metrics identify poverty as a shortfall in consumption and measure whether households or individuals fall above or below a defined poverty line, whereas asset-based indicators define household welfare by asset ownership (e.g. refrigerator, radio, or bicycle), dwelling characteristics, and access to basic services like clean water and electricity. These modern metrics attempt to capture poverty both at specific points in time and as a chronic or transient condition over time.
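The difference between the monetary and asset-based metrics described above can be sketched in a few lines. The example assumes a simple headcount ratio and poverty gap for the monetary side and an equally weighted ownership share for the asset side; real asset indices usually weight items statistically, so the equal weights, the sample values, and the function names here are illustrative assumptions only.

```python
def headcount_ratio(consumption, poverty_line):
    """Share of households whose consumption falls below the poverty line."""
    poor = [c for c in consumption if c < poverty_line]
    return len(poor) / len(consumption)


def poverty_gap_index(consumption, poverty_line):
    """Average shortfall below the line, expressed as a fraction of the line."""
    shortfalls = [max(poverty_line - c, 0.0) / poverty_line for c in consumption]
    return sum(shortfalls) / len(consumption)


def asset_score(household, assets):
    """Asset-based alternative: fraction of listed assets or services present."""
    return sum(1 for a in assets if household.get(a, False)) / len(assets)


# Hypothetical per-person daily consumption, in the same currency as the line.
consumption = [1.0, 1.5, 2.5, 4.0]
line = 2.0  # illustrative threshold, not an official poverty line

print(headcount_ratio(consumption, line))
print(poverty_gap_index(consumption, line))

# Hypothetical asset list drawn from the examples in the text.
ASSETS = ["refrigerator", "radio", "bicycle", "clean_water", "electricity"]
home = {"radio": True, "bicycle": True, "clean_water": False}
print(asset_score(home, ASSETS))
```

The headcount ratio only says how many households are below the line, while the poverty gap also reflects how far below it they fall, which is one reason the choice of metric remains a matter of debate.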

As a result, researchers have turned to presenting poverty metrics in a digestible format that development experts can use to pinpoint the location and nature of poverty within the local context of a given region. Poverty maps compile the aforementioned indicators into a geospatial dataset that disaggregates a region into small areas. The advent of Big Data has changed how data scientists map poverty, letting them draw on mobile phone and satellite data to compile highly accurate maps. In addition, modern poverty maps are not only accurate but can be updated in near real time thanks to the constant stream of data they use as inputs. The statistical techniques behind the maps are computationally demanding: supervised machine learning models are trained and tested on the data over many iterations to identify meaningful relationships between the input data and poverty.

Data Sources

Typically, data scientists look for indicators of poverty in geospatial datasets whose features include night-time lights, the energy productivity of plants, topographic elevation and slope, climatic factors, type of land cover, the presence or absence of roads, water features, human settlements, urban areas, and protected areas, and the locations of points of interest such as health centres and schools. Interestingly, some analysts have used call detail records (CDRs) to generate mobile phone metadata, with features ranging from basic phone usage, top-up patterns, and social network structure to user mobility and handset usage. These were combined with raster and vector datasets that were processed into remote sensing and geographic (RS) covariates at a spatial resolution of 1 km^2. The RS covariates act as environmental and physical metrics that are likely to be associated with human welfare, and they include vegetation indices, night-time lights, climatic conditions, and distance to roads or major urban areas.

Methods

Some authors in the literature employ random forest (RF) population models to conduct their analysis. They describe RFs as a non-parametric, nonlinear statistical method that falls within a category of machine learning methods known as ‘ensemble methods’. Ensemble methods take individual decision trees, which are considered ‘weak learners’, and combine them to create a ‘strong learner’. Others identified the set of predictors most suitable for modelling each of their three poverty datasets. To do this, they employed non-spatial generalized linear models trained on a randomly selected 80% of the data, holding out the remainder to guard against overfitting. Then, to generate the actual poverty maps, they carried the predictors selected in the previous step into hierarchical Bayesian geostatistical models (BGMs) to predict the poverty metrics. BGMs were chosen because they offer several advantages, including straightforward imputation of missing data and estimation of uncertainty as a distribution around each prediction.
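The two ideas in this section, combining weak learners into an ensemble and holding out 20% of the data, can be illustrated together. The sketch below is plain bagging of shallow regression trees on one synthetic covariate, written with the standard library only; it is not the cited authors' random forest or GLM code, and the data, the depth, and the number of trees are all invented for illustration.

```python
import random


def avg(values):
    return sum(values) / len(values)


def fit_tree(rows, depth):
    """Fit a small regression tree on (x, y) rows; shallow trees are weak learners."""
    best = None
    if depth > 0:
        # Every distinct x except the largest is a candidate split point.
        for t in sorted({x for x, _ in rows})[:-1]:
            left = [y for x, y in rows if x <= t]
            right = [y for x, y in rows if x > t]
            lm, rm = avg(left), avg(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, t)
    if best is None:
        return avg([y for _, y in rows])  # leaf: predict the mean outcome
    t = best[1]
    return (t,
            fit_tree([r for r in rows if r[0] <= t], depth - 1),
            fit_tree([r for r in rows if r[0] > t], depth - 1))


def predict_tree(tree, x):
    while isinstance(tree, tuple):  # internal nodes are (threshold, left, right)
        t, left, right = tree
        tree = left if x <= t else right
    return tree


def fit_ensemble(rows, n_trees, depth, rng):
    """Bagging: fit each tree on a bootstrap resample of the training rows."""
    return [fit_tree([rng.choice(rows) for _ in rows], depth) for _ in range(n_trees)]


def predict_ensemble(trees, x):
    """Average the individual trees' predictions."""
    return avg([predict_tree(tree, x) for tree in trees])


def mse(predict, rows):
    return avg([(predict(x) - y) ** 2 for x, y in rows])


rng = random.Random(0)
# Synthetic data: x stands in for a covariate such as night-time light
# intensity and y for a wealth index. The relationship is invented.
data = []
for _ in range(100):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 * x + rng.gauss(0, 0.5)))

rng.shuffle(data)
cut = int(0.8 * len(data))  # 80% for training, 20% held out
train, test = data[:cut], data[cut:]

single = fit_tree(train, depth=2)
ensemble = fit_ensemble(train, n_trees=50, depth=2, rng=rng)

print("single tree held-out MSE:", mse(lambda x: predict_tree(single, x), test))
print("ensemble held-out MSE:   ", mse(lambda x: predict_ensemble(ensemble, x), test))
```

The held-out 20% does not itself stop a model from overfitting; it gives an error estimate on data the model never saw, which is how overfitting is detected. A full random forest would also sample a random subset of features at each split, which has no effect here because there is only one feature.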

In Africa, waves of the High Frequency Survey (HFS) were used to identify and map poverty in various regions despite the lack of statistical infrastructure. The HFS has been implemented in countries such as Somalia and South Sudan, and it provides innovative methodologies for mapping poverty where data are difficult to obtain. First, the HFS used geospatial techniques and high-resolution imagery to model the spatial distribution of the population, build a probability-based population sampling frame, and generate enumeration areas, overcoming the lack of a recent population census. Second, the HFS adapted its logistical arrangements, its sampling strategy (using micro-listing), and its questionnaire design (based on the Rapid Consumption methodology) to limit time on the ground. Third, to estimate poverty in completely inaccessible areas, the HFS relied on correlates derived from satellite imagery and other geospatial data. Finally, special sampling strategies were used to capture the local nomadic population. The HFS employed a multi-stage stratified random sample to ensure that the sample represented all sub-populations of interest; strata were defined along two primary dimensions, administrative location and population type, for a total of 57 strata. For urban and rural strata, population estimates were derived from the 2015 WorldPop dataset, which uses data sources and methods including satellite imagery to create highly spatially disaggregated population estimates.
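The first stage of a stratified design like the one described above can be sketched as follows. This is a minimal illustration that assumes strata are formed by crossing a region with a population type and that the same number of enumeration areas is drawn from each stratum; the region names, the frame, and the allocation rule are hypothetical and do not reproduce the HFS design or its 57 strata.

```python
import random
from collections import defaultdict


def stratified_sample(frame, strata_keys, per_stratum, rng):
    """Draw up to `per_stratum` enumeration areas at random from every stratum."""
    strata = defaultdict(list)
    for area in frame:
        strata[tuple(area[k] for k in strata_keys)].append(area)
    sample = []
    for key in sorted(strata):  # fixed order keeps the draw reproducible
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical sampling frame: 3 regions x 3 population types x 10 areas each.
regions = ["Region A", "Region B", "Region C"]
population_types = ["urban", "rural", "nomadic"]
frame = [
    {"id": f"{r}-{p}-{i}", "region": r, "population_type": p}
    for r in regions
    for p in population_types
    for i in range(10)
]

rng = random.Random(42)
sample = stratified_sample(frame, ("region", "population_type"), per_stratum=2, rng=rng)
print(len(sample), "enumeration areas drawn from", len(frame))
```

Sampling within each stratum guarantees that small sub-populations, such as nomadic households in a single region, appear in the sample instead of being left to chance. A later stage would then select households within each chosen enumeration area.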

In Somalia, for example, whether researchers could reach an area to gather information on the ground was a crucial constraint. Many regions within the country were impossible to visit, and obtaining information was dangerous because of the conflict in those areas. Thus, accessibility maps were compiled from key informant interviews with security experts and with regional fieldwork coordinators based in the field. Publicly available information and incident reports provided by a local security company were used as auxiliary inputs. Once the accessibility map was compiled, regions such as the entirety of Middle Juba, as well as several other pre-war regions, were excluded from data collection because of their inaccessibility. Geospatial imagery was used to demarcate the inaccessible areas and to incorporate the accessibility map into the sampling frame so that data were drawn only from accessible areas.
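Restricting a sampling frame with an accessibility map can be sketched as a simple filter. The example assumes accessibility is recorded per region and that each enumeration area carries a population estimate; apart from Middle Juba, which the text names, the regions and all population figures are invented.

```python
def restrict_frame(frame, inaccessible_regions):
    """Keep only enumeration areas outside regions flagged as inaccessible."""
    return [area for area in frame if area["region"] not in inaccessible_regions]


def population_coverage(frame, restricted):
    """Share of the frame's estimated population that can still be sampled."""
    total = sum(area["population"] for area in frame)
    kept = sum(area["population"] for area in restricted)
    return kept / total


# Hypothetical enumeration areas with invented population estimates.
frame = [
    {"id": 1, "region": "Middle Juba", "population": 1000},
    {"id": 2, "region": "Region A", "population": 3000},
    {"id": 3, "region": "Region B", "population": 4000},
    {"id": 4, "region": "Middle Juba", "population": 2000},
]

accessible = restrict_frame(frame, inaccessible_regions={"Middle Juba"})
print([area["id"] for area in accessible])
print(population_coverage(frame, accessible))
```

Reporting the coverage alongside the estimates makes explicit how much of the population the survey cannot speak for directly, which is the gap that satellite-derived correlates are meant to fill.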

In South Sudan, the escalation and expansion of the civil conflict posed serious challenges for the planning and implementation of fieldwork and severely interfered with data collection; the last known accurate estimates dated from 2009. Thus, the HFS in South Sudan took advantage of technological and methodological innovations to overcome these challenges and establish a reliable system of data collection for obtaining valid poverty estimates. To do this, the HFS leveraged the expansion of cellular networks across the country to build a near real-time monitoring system in which data could be uploaded to a server and checked continuously. Computer-Assisted Personal Interviewing (CAPI) was used to run these checks, eliminating the need for expensive and potentially dangerous return visits. Adherence to the sample design was closely monitored by GPS, which tracked enumerators wherever there was mobile phone coverage. In addition, the questionnaire design itself was reworked to reduce the number of consumption items asked while still providing unbiased poverty estimates through within-survey imputation. This in turn saved time and allowed the HFS to include additional modules on asset ownership, education, labor market outcomes, perceptions of government performance and the provision of public goods and services, psychological well-being, and perceptions of violence and safety, giving a well-rounded picture of welfare and livelihoods.

Outside of the High Frequency Surveys, many researchers and data scientists have used machine learning algorithms to establish relationships between geospatial inputs and poverty. Specifically, feature-based predictive models, which use quantifiable geospatial features such as the number of buildings, total building area, lengths of roads, number of junctions, and number of schools and hospitals, can be trained to determine which features correlate with poverty levels. Image-based prediction models, which draw on aerial or satellite imagery, can instead be trained to recognize which qualitative characteristics, such as types of buildings, shapes of roads, and man-made structures, correlate with poverty levels. While both types of models have their advantages and disadvantages, efforts have been made to combine them with each other, with other types of models, and with newer datasets stemming from mobile phone networks and the internet.
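As a rough illustration of the feature-based approach, the sketch below ranks a few geospatial features by the strength of their linear correlation with a poverty rate. Pearson correlation is used here as a simple stand-in for the trained models the literature describes, and every number in the table is invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical per-area features (one value per area) and poverty rates.
features = {
    "building_count": [120, 300, 450, 800, 1500],
    "road_length_km": [5, 12, 9, 30, 42],
    "schools_and_hospitals": [1, 1, 2, 4, 6],
}
poverty_rate = [0.72, 0.61, 0.55, 0.38, 0.20]

# Rank features by absolute correlation with the poverty rate.
ranking = sorted(
    ((name, pearson(values, poverty_rate)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

A correlation ranking like this only captures linear, one-feature-at-a-time relationships; the predictive models in the literature can combine features and capture nonlinear effects.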

Complexity

Unfortunately, no matter how much data is fed into the supervised learning models that compile poverty maps, they cannot be one hundred percent accurate at all times. As seen in South Sudan, data collection is often nearly impossible because of political or economic instability in the local context. Broader data collection from geospatial inputs such as satellite data and aerial imagery then becomes the next best alternative. Furthermore, this points to the idea that poverty is not always linked to a single cause like economic hardship or lack of output; poverty is linked to the suppression of the freedoms that allow societies to achieve development. Whether that suppression appears as authoritarian policy, lack of economic mobility, limited access to health and sanitation facilities, or malnourishment, poverty manifests when freedoms are limited. In light of this, the most useful scientific inquiries will be those that contextualize poverty for the specific region and people (malnutrition, discrimination, etc.) rather than treating any single group of indicators as an absolute identifier of poverty.

Closing Remarks

In essence, poverty mapping can be achieved in numerous innovative ways because supervised learning models can be trained on currently available geospatial datasets and other inputs, using machine learning algorithms to find relationships between attributes that indicate poverty. Whether it is carried out on an international scale, as with the High Frequency Surveys, or by individual researchers conducting studies, poverty mapping is crucial for implementing solutions that can alleviate poverty. The literature on the subject is vast, and it is constantly being updated as a result of the advent of Big Data. However, a major gap in this literature concerns areas that are excluded from the data streams of Big Data. In an increasingly interconnected world, areas that are not well connected are difficult to contextualize, let alone analyze. As more and more emphasis is placed on Big Data, these areas become even more difficult to analyze because of their lack of statistical infrastructure. A central research question that poverty mapping will seek to answer is “What variations of poverty alleviation can be produced as a result of these compilations of poverty maps?”

Works Cited:

Nieves Jeremiah J., Stevens Forrest R., Gaughan Andrea E., Linard Catherine, Sorichetta Alessandro, Hornby Graeme, Patel Nirav N. and Tatem Andrew J. 2017 “Examining the correlates and drivers of human population distributions across low- and middle-income countries” J. R. Soc. Interface 14.

Steele Jessica E., Sundsøy Pål Roe, Pezzulo Carla, Alegana Victor A., Bird Tomas J., Blumenstock Joshua, Bjelland Johannes, Engø-Monsen Kenth, de Montjoye Yves-Alexandre, Iqbal Asif M., Hadiuzzaman Khandakar N., Lu Xin, Wetter Erik, Tatem Andrew J. and Bengtsson Linus 2017 “Mapping poverty using mobile phone and satellite data” J. R. Soc. Interface 14: 20160690.

Pape, Utz, and Philip Wollburg. “Estimation Of Poverty In Somalia Using Innovative Methodologies”. 2019. World Bank, Washington, DC, doi:10.1596/1813-9450-8735

Pape, Utz, and Luca Parisotto. “Estimating Poverty In A Fragile Context: The High Frequency Survey In South Sudan”. 2019. World Bank, Washington, DC, doi:10.1596/1813-9450-8722. Accessed 10 Mar 2021.

Lee, Kamwoo, and Jeanine Braithwaite. “High-Resolution Poverty Maps In Sub-Saharan Africa”. Arxiv.Org, 2020, https://arxiv.org/abs/2009.00544.

“Global Poverty: Facts, Faqs, And How To Help World Vision”. World Vision, 2020, https://www.worldvision.org/sponsorship-news-stories/global-poverty-facts#:~:text=BACK%20TO%20QUESTIONS-,How%20many%20people%20live%20in%20poverty%20in%20the%20world%3F,according%20to%20the%20World%20Bank. Accessed 19 Mar 2021.

“The 2019 Global Multidimensional Poverty Index (MPI) Human Development Reports”. Hdr.Undp.Org, 2021, http://hdr.undp.org/en/2018-MPI. Accessed 19 Mar 2021.

“On The Poorest Continent, The Plight Of Children Is Dramatic”. SOS-US-EN, 2021, https://www.sos-usa.org/about-us/where-we-work/africa/poverty-in-africa. Accessed 19 Mar 2021.

Hummedia.Manchester.Ac.Uk, 2021, http://hummedia.manchester.ac.uk/institutes/methods-manchester/docs/povertymapping.pdf. Accessed 19 Mar 2021.