Surveys and the City: Three Challenges to Quality Data Collection in Urban Areas
Millions of people move into cities in developing countries every month. What kind of a life awaits them there? The truth is that we don’t really know. Development economists use household survey data to measure living standards across the world, but reliable data for cities in the global south is hard to come by. In this article, I discuss three challenges to collecting data in cities – measurement, missing people, and money – as well as steps that can be taken in the design, implementation, and analysis of surveys to try to address them.
Surveys are incredibly powerful: properly designed, they can tell us about a huge population just by looking at a small, randomly selected subset of the people who live there. However, they can only tell us about the population that the survey is designed to capture. Most surveys don’t collect data that is representative at the city level: they are designed to tell us about differences in living conditions across countries and regions, or in urban versus rural areas, but few allow us to identify differences between urban areas or within cities.
This matters because it makes it very hard to know how living conditions vary across and within different cities in a single country. To answer these questions we’d need to adjust the sampling strategy and interview many more people in cities. This is expensive: on average, living standard surveys cost 170 USD per household interviewed. It can cost hundreds of thousands of dollars to capture a representative sample for a city. Moreover, these data are not easily compared across countries, as there is no internationally accepted definition of ‘urban’. In practice, what counts as a city differs from one country to another, shaped by unique historical and political developments.
Small-area estimation (SAE) provides a partial solution to this problem. This approach brings together in-depth information from a survey with more limited data that is representative at the city level, such as census data. Put simply, it takes household characteristics that are captured in both data sets and models the relationship between these characteristics and other outcomes only found in the survey, such as poverty rates. The parameters are then used to predict poverty levels of households in the census.
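The logic of this approach can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the figures are invented, a single shared characteristic (years of schooling of the household head) stands in for the many variables a real model would use, and the poverty line is arbitrary. It also skips the step in which applied SAE work simulates the unexplained part of consumption, so it should be read as an outline of the idea rather than a usable estimator.

```python
from statistics import mean

# Hypothetical survey records: (years of schooling, log consumption per capita).
# Consumption is only observed in the survey.
survey = [(2, 5.9), (4, 6.2), (6, 6.4), (8, 6.9), (10, 7.1), (12, 7.4)]

# Hypothetical census records: (city, years of schooling).
# The census covers every city but has no consumption data.
census = [
    ("City A", 3), ("City A", 5), ("City A", 11),
    ("City B", 9), ("City B", 12), ("City B", 12),
]

LOG_POVERTY_LINE = 6.5  # assumed value, for illustration only


def fit_line(rows):
    """Ordinary least squares of log consumption on schooling (one covariate)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predicted_poverty_rates(rows, slope, intercept, line):
    """Share of census households in each city whose predicted log consumption is below the line."""
    counts = {}
    for city, schooling in rows:
        predicted = intercept + slope * schooling
        poor, total = counts.get(city, (0, 0))
        counts[city] = (poor + int(predicted < line), total + 1)
    return {city: poor / total for city, (poor, total) in counts.items()}


slope, intercept = fit_line(survey)
rates = predicted_poverty_rates(census, slope, intercept, LOG_POVERTY_LINE)
```

The relationship is estimated where consumption is observed (the survey) and applied where coverage is complete (the census), which is what allows a poverty rate to be reported for each city.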
SAE has many uses and has shed new light on poverty and access to basic services in secondary cities in developing countries. Yet it does not resolve everything. Even if we use a population density cut-off to identify urban areas (which, for economists, is a key distinction between urban and rural), the underlying ‘area’ boundaries may be heterogeneous. As such, we cannot be sure that we are comparing like with like. Moreover, SAE still depends on reliable, up-to-date data for the urban areas, such as a census, which leads into the second challenge: accounting for missing people.
Surveys in cities may be particularly prone to ‘missing’ certain people due to two main problems, explained below in more detail: inaccurate sample frames and non-random biases in who agrees to participate. This is worrying, because it may be the most vulnerable people in cities who are hardest to reach. These can include single people who work all day, tenants who share cramped conditions with other families, and households that live in dangerous neighbourhoods. It is also important to note that only people with a fixed address will be included (surveying homeless people requires entirely different methods).
All surveys need a sampling frame – a list of the population from which you can draw the random sample to be surveyed. This is often the national census database. Yet it is not uncommon for the census to be several years old. In contexts where cities are growing very quickly, large areas may be left out of the frame. In other contexts, certain areas may be left out deliberately – for example, if it is politically expedient to ignore informal areas with high numbers of foreign migrant workers.
Moreover, surveys often rely on ‘two-stage’ sampling. First, areas are randomly selected from the sample frame and the number of households living in each area is verified (known as ‘listing’). Second, a set number of households in each area are randomly selected for surveying. Certain groups can be overlooked in this process. For example, in many developing countries tenants rent one or two rooms in another family’s home. Since these people do not eat with the owner of the house, they are not counted as part of the owner’s household. Yet they may not be recorded as a separate household either, in which case they never appear on the list from which the sample is drawn.
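The two stages can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure of any particular survey: the frame, the area and household names, and the function `two_stage_sample` are all hypothetical, and areas are drawn with equal probability, whereas real designs may weight the draw by population size.

```python
import random


def two_stage_sample(frame, n_areas, n_households, seed=0):
    """Draw areas at random, then a fixed number of listed households in each.

    `frame` maps each area to the households recorded at the listing stage.
    Anyone missing from that list can never be selected.
    """
    rng = random.Random(seed)
    # Stage 1: randomly select areas from the sampling frame.
    areas = rng.sample(sorted(frame), n_areas)
    sample = {}
    for area in areas:
        listed = frame[area]
        # Stage 2: randomly select households from the area's listing.
        k = min(n_households, len(listed))
        sample[area] = rng.sample(listed, k)
    return sample


# Hypothetical frame. In area_1 the tenant renting a room was listed as a
# separate household; had the listing skipped them, they could not be drawn.
frame = {
    "area_1": ["hh_1a", "hh_1b", "hh_1c", "hh_1c_tenant"],
    "area_2": ["hh_2a", "hh_2b", "hh_2c"],
    "area_3": ["hh_3a", "hh_3b", "hh_3c", "hh_3d"],
}

selected = two_stage_sample(frame, n_areas=2, n_households=2, seed=42)
```

The sketch makes the dependence on the listing explicit: the second stage can only choose among households that were written down in the first.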
Even among households that are selected to be surveyed, some people are much less likely to participate. Since surveys take place during the day and at home, it can be hard to reach people who work all day. People who live in areas where either crime or fear of crime is high may be wary of talking to strangers; and perhaps rightly so, as criminals have been known to pose as interviewers to conduct robberies. Moreover, interviewers may falsely report that people refused to answer in order to spend less time in risky neighbourhoods.
Steps can be taken to address these challenges. Satellite imagery can be used as a check on the sampling frame. In the World Bank’s MLSC survey – a survey of living standards piloted in Durban (South Africa) and Dar es Salaam (Tanzania) – satellite data was even used to identify areas with more ‘irregular’ settlement patterns. This was then built into the sampling strategy so that the data can be used to compare visibly ‘slum-like’ and ‘non-slum-like’ areas. The MLSC team also asked interviewers to check for renters in each house at the listing stage, so that they were included in the full list of households used in the second sampling stage. Efforts were also made to inform the local population ahead of the survey and to monitor patterns in the data in real time as they were collected.
Life in cities is highly commoditised. A number of people have argued that this means that current practices in measuring poverty may be misleading for cities. This is partly an issue of how poverty measures are constructed – including which expenditures are considered ‘basic needs’ and the prices at which poverty lines are set – but it also has implications for the way we collect the data used in these measures.
First, people in cities tend to consume a wider range of goods, eat more food outside of the home, and face smoother market prices than in rural areas. This may make some methods of collecting data on consumption more appropriate for cities than others. Indeed, research shows that small differences in the design of consumption modules – such as who answers the questions and the time-frame they are asked to report consumption for – can lead to dramatically different poverty headcount and inequality rates.
Second, it is notoriously difficult to ask people about the value of goods. Take housing. This is a huge expense for most people. How can we find out how much it costs? Surveys often ask homeowners to estimate what they would pay if, instead of owning their home, they had to rent it. But people may not have a good idea of rental values. They may also overestimate the value of their own house simply because it belongs to them (in psychology this is called the ‘endowment effect’). In the MLSC survey we modified the question to: ‘If a friend of yours wanted to buy a property like this in the same neighborhood, how much would he/she have to pay?’
Surveys remain an important tool for understanding living conditions in urban areas, but there is considerable scope to refine approaches to collecting data in cities. After all, the world is already more than 50 percent urban and the total number of people living in cities is set to double by 2050.
Dr Alexandra Panman is a Lecturer in Urban Economic Public Policy at The Bartlett Development Planning Unit (UCL), having recently earned her DPhil from the University of Oxford’s Department of International Development. Dr Panman worked as a consultant in the team that developed the MLSC surveys under the World Bank Spatial Development of Cities Program. Thanks to Nancy Lozano-Gracia (Senior Economist, World Bank) for comments on this article.