Methodology

METHODOLOGICAL DOCUMENTS

This section provides explanations of the construction and estimation of each variable, the countries and surveys analyzed, and the period covered in the SEDLAC database.

The Methodological Guide describes the tables available in each section of the database and discusses the main methodological decisions taken to construct the variables.

Methodological Guide

FAQ

This section answers frequently asked questions about the SEDLAC database, the estimation methodology, and the countries and periods covered.

For more detail on the methodology, see the Methodological Guide.

What is the SEDLAC database?

SEDLAC is a database of statistics on poverty and other distributional and social variables for all Latin American and several Caribbean (LAC) countries. All statistics are computed from the microdata of the main household surveys carried out in these countries, using a homogeneous methodology (data permitting). Statistics are updated periodically.

SEDLAC allows users to monitor trends in poverty and other distributional and social indicators in the region. The dataset is available as Excel tables with information for each country/year.

SEDLAC is an ongoing project. All statistics made available on this site are continually updated and revised. We are grateful for all comments and suggestions that help us improve the database.

How should information taken from this site be cited?

Information taken from this database should be cited as “Source: SEDLAC (CEDLAS and The World Bank)” or “Source: Socio-Economic Database for Latin America and the Caribbean (CEDLAS and The World Bank).” We suggest that researchers cite the date on which they consulted the database, as statistics are periodically updated.

What sections are currently included in the database?

The SEDLAC database is divided into 12 sections: household surveys, income, poverty, inequality, demographics, education, employment, housing, infrastructure, durable goods and services, aggregate welfare and pro-poor growth.

Each section contains an Excel file with multiple sheets. Each sheet shows a data table with specific information for each Latin American country (or for those countries for which data are available).

How comparable (between countries and over time) are the statistics in SEDLAC?

Household surveys are not uniform among countries of Latin America and the Caribbean. In particular, they differ significantly in their geographic coverage and questionnaires. There are also differences in the periodicity of surveys within a country.

This project seeks to ensure that statistics are comparable, as far as possible, across countries and over time. This is done by using similar definitions of variables in each country/year and applying consistent methods of data processing. However, perfect comparability cannot be ensured.

This webpage provides documentation so that each user can decide, given the available information and their specific needs, whether a particular comparison is appropriate.

Household surveys: which countries/years are covered by SEDLAC?

The SEDLAC database includes information from over 300 household surveys carried out in 24 LAC countries: Argentina, Bahamas, Belize, Bolivia, Brazil, Colombia, Costa Rica, Chile, Dominican Republic, Ecuador, El Salvador, Guatemala, Guyana, Haiti, Honduras, Jamaica, Mexico, Nicaragua, Panama, Paraguay, Peru, Suriname, Uruguay and Venezuela.

For each period, the sample represents more than 97% of the total LAC population. The database mainly covers the 1990s, 2000s, and 2010s although we also present information for previous decades in a few countries.

Income: why is income used as a measure of welfare rather than consumption?

Although we recognize that household consumption is often a better measure of welfare than household income, this project uses household income as the welfare proxy, because few countries in Latin America and the Caribbean routinely carry out household surveys with consumption or expenditure questionnaires, while all countries include questions on individual and household income. Most countries do have expenditure surveys, but these are usually carried out at long intervals (in many cases, every 10 years), so they are not suitable for monitoring poverty, inequality, and other relevant social indicators.

Income: how are income variables constructed in the SEDLAC database?

We construct individual income by adding all income sources together. Whenever possible, we distinguish among income from salaried work, self-employment, and salaries assigned to owners, and we compute labor income from the main activity separately. Individual non-labor income is divided into three categories: (i) pensions; (ii) capital and profits; and (iii) transfers. Countries ask different questions to capture capital income, interest, profits, rents, and dividends; for comparison purposes, we prefer to gather all these items into a single category. The same criterion applies to transfers, although we also construct a variable that identifies transfers made by the government, and another that captures transfers clearly associated with poverty-alleviation programs.

Since we are interested in capturing current income, non-current items are not included in our definition of income. The same criterion leads to the exclusion of income from the sale of some goods and assets like vehicles, houses, or stocks. We also exclude income from gifts, life insurance, gambling and inheritances.

Once we have individual income, we construct household income by adding up the incomes of all household members. Household per capita income is computed as the ratio of total household income to the number of household members. Finally, we compute adjusted household income using several equivalence scales (see below for a specific FAQ on this issue).
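A minimal sketch of this aggregation, assuming a simplified record with one field per income category. The class and field names (`Person`, `labor`, `capital`, and so on) are hypothetical and do not correspond to actual SEDLAC variable names; missing values are not handled here.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical income fields, one per category described above.
    labor: float = 0.0      # salaried work, self-employment, owners' salaries
    pensions: float = 0.0
    capital: float = 0.0    # interest, profits, rents, and dividends pooled together
    transfers: float = 0.0  # public and private transfers

    def total_income(self) -> float:
        # Individual income: the sum of all current income sources.
        return self.labor + self.pensions + self.capital + self.transfers


def household_income(members: list[Person]) -> float:
    # Household income: the sum of the individual incomes of all members.
    return sum(p.total_income() for p in members)


def per_capita_income(members: list[Person]) -> float:
    # Total household income divided by the number of household members.
    return household_income(members) / len(members)


# A three-member household: one earner, one pensioner, one member without income.
hh = [Person(labor=900.0), Person(pensions=300.0), Person()]
print(per_capita_income(hh))  # 400.0
```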

Income: is the implicit rent from own-housing included in the calculation of income?

Yes, it is included. The concept of income used in SEDLAC refers to the flow of resources obtained as remuneration for the use of all the assets owned by an individual or household. According to this definition, income should include not only returns to labor and capital, but also any other rents produced by the ownership of durable goods, such as houses or cars.

Families that live in their own dwellings implicitly receive a flow of income equivalent to the market value of the service that the use of this property represents for them. This remuneration should be computed as part of household income, even though it is never recorded in a formal market.

In some surveys, owners are asked to estimate the rent they would have to pay if they rented the dwelling they occupy. The answer to this question is used to impute rents to owner-occupied housing, although the reliability of the answers is often questioned, in particular in areas where housing markets are not well developed.

In those surveys where this information is not available or is clearly unreliable, we increase household income of housing owners by 10%, a value that is consistent with estimates of implicit rents in the region.
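The two-step rule above can be sketched as follows. This is an illustrative sketch only: the function name and arguments are hypothetical, and whether a reported rent is "clearly unreliable" is assumed to be decided beforehand and passed in as a flag.

```python
from typing import Optional


def add_implicit_rent(hh_income: float, is_owner: bool,
                      reported_rent: Optional[float] = None,
                      rent_is_reliable: bool = True) -> float:
    """Return household income including the implicit rent of an owner-occupied dwelling."""
    if not is_owner:
        return hh_income  # non-owners: no implicit rent
    if reported_rent is not None and rent_is_reliable:
        # Owner's own estimate of the rent they would pay for the dwelling.
        return hh_income + reported_rent
    # No usable answer: increase household income by 10%.
    return hh_income * 1.10


print(add_implicit_rent(1000.0, is_owner=True, reported_rent=150.0))  # 1150.0
print(add_implicit_rent(1000.0, is_owner=True))                      # 1100.0
print(add_implicit_rent(1000.0, is_owner=False))                     # 1000.0
```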

Income: are adjustments made to consider non-response and missing incomes?

When household surveys are conducted, it is common for some respondents to refuse to answer certain questions. Income questions tend to have the highest non-response rates.
If the decision not to answer income questions is related to the income level of individuals (for example, if the rich are more likely not to respond), this non-response can bias the estimated statistics.

There are several methodological options for addressing this problem. The most common is to impute income to individuals who do not answer income questions, using matching techniques or the estimated coefficients of a Mincer equation.
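To illustrate the Mincer option (a technique SEDLAC itself does not apply; see below), here is a minimal sketch under stated assumptions: a basic specification ln(w) = b0 + b1·schooling + b2·experience + b3·experience², estimated by ordinary least squares on respondents only, with a hand-written normal-equations solver to avoid external dependencies. Non-respondents receive the deterministic prediction, with no error term drawn. The function and field names are hypothetical.

```python
import math


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):  # back substitution
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    return beta


def mincer_row(schooling, experience):
    # Regressors of a basic Mincer equation.
    return [1.0, schooling, experience, experience ** 2]


def impute_wages(workers):
    """workers: list of dicts with 'schooling', 'experience', 'wage' (None if missing)."""
    respondents = [w for w in workers if w["wage"] is not None]
    X = [mincer_row(w["schooling"], w["experience"]) for w in respondents]
    y = [math.log(w["wage"]) for w in respondents]
    beta = ols(X, y)
    for w in workers:
        if w["wage"] is None:
            x = mincer_row(w["schooling"], w["experience"])
            w["wage"] = math.exp(sum(bj * xj for bj, xj in zip(beta, x)))  # no error term
    return workers
```

A real implementation would also have to choose how to add an error term and which covariates to include, which is exactly the kind of decision discussed in the next paragraph.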

However, these methodological choices are not free of problems. Several decisions must be taken to implement the adjustment. The researcher should choose an estimation procedure, pick the dependent and independent variables, select a method for imputing error terms, and so on. Working with raw data has the advantage of greater transparency.

For these reasons, in SEDLAC we compute statistics with the official datasets, as is done in most academic and official studies.

Income: how are missing incomes treated when constructing the income variable in the project?

Suppose income from source s is missing for individual i. Should we record that individual's total income as missing? If so, should we in turn record the total income of individual i's household as missing? We make the following (necessarily arbitrary) decisions:

If s is not the main source of income for i, then we compute the individual total income ignoring source s.

If instead s is the main source, we record total income as missing.

This approach has the advantage of not dropping from the datasets individuals who do not answer questions on secondary income sources. Its disadvantage is that income is under-estimated for these individuals.

Regarding household income, we record it as missing if the household head’s total income is missing. Otherwise, we compute household income assigning zero income to non-heads with missing income.
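The individual and household rules above can be sketched as two small functions. This is a sketch under assumptions: the main income source is taken as already identified and passed in by the caller, and the data layout (dicts and `(is_head, income)` pairs) is hypothetical.

```python
from typing import Optional


def individual_income(sources: dict[str, Optional[float]], main_source: str) -> Optional[float]:
    """Total individual income; `sources` maps income source -> amount, or None if missing."""
    if sources.get(main_source) is None:
        return None  # main source missing: record total income as missing
    # Missing secondary sources are ignored (this under-estimates income).
    return sum(v for v in sources.values() if v is not None)


def household_income(members: list[tuple[bool, Optional[float]]]) -> Optional[float]:
    """`members` holds (is_head, total_income) pairs; total_income is None if missing."""
    if any(is_head and inc is None for is_head, inc in members):
        return None  # head's income missing: household income is missing
    # Non-heads with missing income are assigned zero.
    return sum(inc if inc is not None else 0.0 for _, inc in members)


print(individual_income({"wage": 500.0, "rent": None}, main_source="wage"))  # 500.0
print(individual_income({"wage": None, "rent": 100.0}, main_source="wage"))  # None
print(household_income([(True, 500.0), (False, None)]))                       # 500.0
```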

Income: is zero income considered in poverty and inequality estimations?

In many surveys, a non-negligible fraction of the working population reports zero income. This answer can be the consequence of different situations: (i) the individual in fact did not earn any income during the period covered by the survey, (ii) she earned only non-monetary income, which is not recorded in the survey, or (iii) she misreported income.

Monthly household income is used as a proxy for well-being. One of the main caveats of this choice is that monthly income is more volatile than well-being, and zero income is probably the most illustrative case. A household may have zero income in a particular month, yet that may be a poor proxy for its members' well-being, insofar as the family can obtain (monetary or non-monetary) resources from other sources (charity, transfers, savings, etc.). For that reason, zero income is a particularly important case of either misreporting or failure of income as a proxy for well-being.

In SEDLAC, we compute the tables including zero incomes in poverty statistics and excluding them from inequality indicators, as is mostly done in academic papers.

The differential treatment of zero income in poverty and inequality measures rests on the assumption that zero household income mostly arises from households where all members are unemployed, and/or from misreporting by low-income people who forget, or are not asked, to report some income sources (e.g. charity, in-kind payments). Under this assumption, zero-income respondents should be considered poor.

However, some inequality measures break down when zero incomes are included. Inequality indicators are scale invariant and therefore rely on proportional income differences; some of them, such as the mean log deviation (Theil's L index), involve the ratio of mean income to each individual's income, so accepting zero income implies dividing by zero. Given this, and the unreliability of zero household income, households that report zero income are usually excluded when computing inequality indicators.
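The asymmetric treatment can be sketched with two unweighted measures. The poverty headcount keeps zero incomes, and the mean log deviation, used here only as one example of an inequality index that divides by individual income, drops them. Survey weights are ignored in this sketch, and the function names and figures are hypothetical.

```python
import math


def headcount_ratio(incomes, poverty_line):
    # Poverty: zero incomes are kept and count as poor for any positive line.
    return sum(1 for y in incomes if y < poverty_line) / len(incomes)


def mean_log_deviation(incomes):
    # Inequality (Theil's L): mean of ln(mu / y). Zero incomes are dropped
    # because mu / 0 is undefined.
    positive = [y for y in incomes if y > 0]
    mu = sum(positive) / len(positive)
    return sum(math.log(mu / y) for y in positive) / len(positive)


incomes = [0.0, 50.0, 100.0, 150.0]
print(headcount_ratio(incomes, poverty_line=60.0))  # 0.5
print(mean_log_deviation(incomes))                  # computed on [50, 100, 150] only
```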

Income: are unreliable or incoherent incomes identified?

Some income responses are clearly unreliable, as a result of measurement errors or deliberate misreporting. Some National Statistical Offices (NSOs) identify inconsistent answers in the dataset based on their expertise; when this occurs, we accept their decisions. As in the case of missing income, we also accept the NSOs' imputations that replace unreliable answers.

Income: is there any adjustment made for under-reporting in income variables?

Under-reporting can be the consequence of the deliberate decision of the respondent to misreport, the absence of questions that capture some income sources, or the difficulties in recalling or estimating income from certain sources (earnings from informal activities, in-kind payments, home production, capital income, etc.).

This problem likely biases downward the measured living standards of poor people, who rely on a combination of informal activities and/or production for own consumption, and of rich people, who derive a larger proportion of their income from non-labor sources and are probably more prone to under-report.

Differential misreporting behavior among respondents and differential efforts in the survey design can distort comparisons across countries. If these behaviors and efforts change over time they can also distort trends.

Researchers apply three kinds of strategies to alleviate these problems. The first is to restrict the analysis to more homogeneous variables that are less affected by misreporting. The analysis can be limited to the distribution of labor income or, more narrowly, to the distribution of monetary wages from salaried work in urban areas. Of course, the disadvantage of this option is that it may ignore a sizeable part of the overall income distribution.

The second strategy is to apply a grossing-up procedure: income from a given source in the household survey is adjusted to match the corresponding value in the National Accounts. This adjustment usually inflates capital income relatively more than other income sources. It relies on the dubious assumption that National Accounts data are error-free.
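A minimal sketch of proportional grossing-up, assuming one scaling factor per income source equal to the ratio of the National Accounts total to the survey total. Survey expansion weights are ignored for brevity, and the function name and figures are hypothetical. SEDLAC does not apply this adjustment; the sketch only shows why capital income tends to be inflated most.

```python
def gross_up(records, na_totals):
    """Scale each income source so its survey total matches the National Accounts total.

    records: list of dicts mapping income source -> amount (one dict per household).
    na_totals: dict mapping income source -> National Accounts aggregate.
    """
    factors = {}
    for source, na_total in na_totals.items():
        survey_total = sum(r.get(source, 0.0) for r in records)
        # Leave a source unadjusted if the survey records none of it.
        factors[source] = na_total / survey_total if survey_total > 0 else 1.0
    adjusted = [{s: v * factors.get(s, 1.0) for s, v in r.items()} for r in records]
    return adjusted, factors


households = [{"labor": 100.0, "capital": 10.0}, {"labor": 200.0, "capital": 0.0}]
adjusted, factors = gross_up(households, {"labor": 330.0, "capital": 40.0})
print(factors)  # labor scaled by about 1.1, capital by 4.0
```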

Finally, a third strategy is to estimate under-reported incomes from other pieces of information in the survey. For example, Mincer regressions can be run to estimate wages for workers who clearly misreport wages but reliably report individual characteristics.

As in the case of non-responses, thus far we have computed statistics with the raw data, as is done in most academic and official studies. In the near future we will also produce a report on the robustness of some poverty and inequality measures to adjustments for non-response.

It is important to note that we compute some statistics for a wide range of variables, and some of these presumably have fewer problems with under-reporting (e.g. earnings for salaried formal urban workers). Users may restrict the comparisons to these variables if they are particularly worried about under-reporting.

Income: are income statistics presented in nominal or real terms?

Real rather than nominal incomes should be used in any distributional analysis. If all households faced the same prices, the distinction would be irrelevant. However, prices usually differ by location: if two households located in different regions have the same nominal income but face different prices, they will experience different living standards.

Unfortunately, most countries in Latin America and the Caribbean do not routinely collect information on local prices as part of the household survey.

All countries have some kind of regional price study, but this does not completely solve the problem, since price dispersion may be high within a region, especially between urban and rural areas. More importantly, these studies differ substantially in methodology and results across countries.

In this database, all rural incomes are increased by 15% to capture differences between rural and urban prices. That value is an average of some of the detailed studies of regional prices available in the region. Although certainly arbitrary, we believe this alternative is better than (i) ignoring the problem of regional prices altogether, or (ii) using the available price information for each country, despite the enormous differences in methodology, scope, and results.

Another problem arises in those countries where the survey is carried out over the course of several months. If there is inflation, nominal incomes reported in different months should be deflated to make them comparable. In all countries where this happens, we use the official consumer price index to adjust nominal incomes.
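The two price adjustments described above (the 15% increase of rural incomes and the deflation of incomes reported in different survey months) can be combined in a short sketch. The choice of reference month and the CPI values are hypothetical; only the 15% factor comes from the text.

```python
RURAL_PRICE_FACTOR = 1.15  # rural incomes increased by 15%


def real_income(nominal, month, is_rural, cpi, reference_month):
    """Deflate to reference-month prices with the CPI, then apply the rural adjustment."""
    deflated = nominal * cpi[reference_month] / cpi[month]
    return deflated * RURAL_PRICE_FACTOR if is_rural else deflated


cpi = {1: 100.0, 2: 102.0, 3: 105.0}  # hypothetical monthly index values
print(real_income(210.0, month=3, is_rural=False, cpi=cpi, reference_month=1))  # 200.0
print(real_income(210.0, month=3, is_rural=True, cpi=cpi, reference_month=1))   # about 230
```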

Income: are adjustments made to consider equivalence scales in income variables?

Individuals usually live in households and share a common budget. This fact implies that an individual’s well-being depends on the resources available in the household and on the size, structure, and sharing rules within that household.

Probably the most common indicator of individual well-being is household per capita income: total household income divided by the number of persons in the household. Although widely used, this variable ignores three relevant factors: (i) economies of scale in consumption within the household, which, for instance, allow a couple to live on less than twice the budget of a person living alone; (ii) differences in needs among individuals, basically as a function of age and gender (these differences are behind the adjustments for adult equivalents); and (iii) unequal allocation of resources within the household.

Poverty estimates are computed using household per capita income, without considering these issues, as is the usual practice. Inequality indicators are computed for both household per capita income and for an adjusted household income variable, which considers points (i) and (ii) above. Point (iii) is left for potential exploration in future stages of the project, given the scarce data on this issue.
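One common parametric form that captures points (i) and (ii) divides household income by (A + αK)^θ, where A is the number of adults, K the number of children, α < 1 reflects children's lower needs, and θ < 1 reflects economies of scale. The sketch below assumes this form with placeholder values α = 0.5 and θ = 0.9; these are not the scales or parameters used in SEDLAC.

```python
def per_capita(hh_income, n_adults, n_children):
    # Household per capita income: every member counts as one.
    return hh_income / (n_adults + n_children)


def adjusted_income(hh_income, n_adults, n_children, alpha=0.5, theta=0.9):
    """Income per adult equivalent, Y / (A + alpha * K) ** theta.

    alpha < 1: children are assumed to need less than adults (point ii).
    theta < 1: economies of scale within the household (point i).
    The default parameter values are illustrative placeholders.
    """
    return hh_income / (n_adults + alpha * n_children) ** theta


print(per_capita(1000.0, 2, 2))       # 250.0
print(adjusted_income(1000.0, 2, 2))  # higher: 4 people count as fewer equivalent adults
```

Setting α = θ = 1 reduces the adjusted measure to household per capita income.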

Poverty: which methodology is used to estimate poverty in the SEDLAC database?

Most of the poverty statistics shown in the SEDLAC database are on income poverty, defined as the inability to achieve a certain minimum income level, known as the poverty line (PL).
There are no clear normative or objective arguments for setting a threshold below which everybody is poor and above which nobody is. Since defining poverty is fundamentally arbitrary, different authors and agencies use different poverty lines.

SEDLAC includes a set of poverty estimates based on (i) international poverty lines and (ii) relative poverty lines (50% of median income). Using a range of lines is especially necessary given the arbitrariness of the definitions. While measuring poverty with a national line takes into account that societies differ in the criteria used to identify the poor, an international line is an indispensable instrument for comparing absolute poverty levels and trends across countries and for producing regional and world poverty counts.

Poverty estimates are computed using household per capita income.

Poverty: which international poverty lines are used?

The USD-1-a-day line at PPP prices is an international poverty line that sets an international norm; it is intended to gauge the inability to pay for food.

The USD-1-a-day line was proposed in Ravallion et al. (1991) and used in World Bank (1990). It was originally measured at 1985 international prices and converted to local currency using purchasing power parities (PPP) to take local prices into account. The USD 1 standard was chosen as representative of the national poverty lines found among low-income countries. The line was recalculated in 1993 PPP terms at USD 1.0763 a day (Chen and Ravallion, 2001). Later, this basic line was set at USD 1.25 a day at 2005 PPP (Ravallion, Chen and Sangraula, 2008), and at USD 1.90 a day at 2011 PPP (Ferreira et al., 2016).

In this version of SEDLAC, poverty statistics are reported with three international lines: the recently proposed line of USD 1.90 a day at 2011 PPP, and the lines traditionally used in the project, USD 2.5 and USD 4 a day per person at 2005 PPP.

The USD-2.5-a-day line is similar to the median value of the extreme poverty lines officially set by LAC governments, while the USD-4-a-day line is close to the median value of the official moderate poverty lines. These daily values are multiplied by 30.42 (approximately 365/12, the average number of days in a month) to obtain monthly poverty lines.
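The daily-to-monthly conversion can be sketched as below. The PPP conversion factor (local currency units per international dollar) is a hypothetical input; in practice the 2011 PPP line and the 2005 PPP lines need their own factors, and the resulting lines must also be moved from the PPP base year to survey-year prices (for example with the national CPI), a step omitted here.

```python
DAYS_PER_MONTH = 30.42  # approximately 365 / 12


def monthly_poverty_line(usd_per_day, lcu_per_ppp_dollar):
    """Monthly poverty line in local currency, at the prices of the PPP base year."""
    return usd_per_day * DAYS_PER_MONTH * lcu_per_ppp_dollar


# Hypothetical conversion factor of 2.0 local currency units per PPP dollar.
for line in (1.90, 2.50, 4.00):
    print(line, round(monthly_poverty_line(line, 2.0), 2))
```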

We apply these lines to a homogeneous definition of household per capita income across countries and years that includes all ordinary sources of income and the estimated implicit rent from own housing. Of course, since household surveys differ across countries, we may end up with variables that are not strictly comparable even when we follow the same procedure.

Poverty: is the official methodology of each country used for poverty estimations?

No. SEDLAC does not estimate poverty using each country's official methodology. Most Latin American and Caribbean countries have national extreme poverty lines that are mostly based on the cost of a basic food bundle, and moderate poverty lines that are calculated from the extreme lines using the Engel/Orshansky ratio of food expenditures.
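The Engel/Orshansky step can be illustrated with a short worked sketch: the moderate line is the extreme (food) line multiplied by the inverse of the food share of total expenditure in a reference population. SEDLAC does not compute official lines; the reference-group figures below are hypothetical.

```python
def orshansky_coefficient(total_expenditure, food_expenditure):
    # Inverse of the food share of total expenditure in a reference population.
    return total_expenditure / food_expenditure


def moderate_line(extreme_line, total_expenditure, food_expenditure):
    # Moderate line: cost of the basic food bundle scaled up by the Orshansky coefficient.
    return extreme_line * orshansky_coefficient(total_expenditure, food_expenditure)


# If food absorbs 40% of the reference group's spending, the coefficient is 2.5.
print(moderate_line(100.0, total_expenditure=1000.0, food_expenditure=400.0))  # 250.0
```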

This methodology is also used by ECLAC, which in some cases helps governments to determine their national poverty lines. Despite some similarities, methodologies for national poverty estimates substantially differ across nations. Some countries use expenditures (e.g. Mexico), others use incomes (e.g. Argentina) and others a mix of income and expenditures (e.g. Bolivia).

In the file poverty_official_LAC.xls we present extreme and moderate poverty headcount ratios as reported by official sources in each country. We identify the source of information and report their published poverty statistics.

Poverty: what are relative poverty lines?

Some countries (e.g. those in the European Union) use a relative rather than an absolute measure of poverty. According to this view, since social perceptions of poverty change as the country develops and living standards go up, the poverty line should increase along with economic growth.

Probably the most popular relative poverty line is the one set at 50% of the median of the household per capita income distribution. As the economy grows, this line increases, and poverty is more likely to increase than with a fixed poverty line.

The project includes estimates of relative poverty, using household per capita income as the welfare measure and a poverty line set at 50% of the median of that income distribution.
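A minimal, unweighted sketch of the relative poverty rate, assuming a list with one per capita income value per person. Doubling every income doubles the line as well, so the rate is unchanged, which is the sense in which the line moves with economic growth.

```python
import statistics


def relative_poverty_rate(pc_incomes, share=0.5):
    """Share of people with per capita income below `share` times the median."""
    line = share * statistics.median(pc_incomes)
    return sum(1 for y in pc_incomes if y < line) / len(pc_incomes)


incomes = [10.0, 20.0, 40.0, 60.0, 80.0]
print(relative_poverty_rate(incomes))                   # 0.2 (line = 20)
print(relative_poverty_rate([2 * y for y in incomes]))  # 0.2: the line doubles too
```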

Poverty: besides income poverty, does SEDLAC include estimates based on UBN indicators?

Income poverty measures have two important limitations. First, as monthly income is used as the welfare indicator, some people are incorrectly considered to be poor when they are in fact having a temporary negative shock, or are experiencing seasonal low income. Second, there are convincing arguments for considering poverty as a multidimensional issue. Insufficient income is just one of the manifestations of a more complex phenomenon.

These considerations led to a search for other variables that may be used to measure poverty. Researchers, agencies, and National Statistical Offices have used different measures of housing, education, health, employment, and access to social services to define non-income or structural poverty. Given current practices in some countries and the availability of information in all Latin American and Caribbean countries, we construct a poverty indicator based on the following conditions:

(i) more than 4 persons per room
(ii) the household lives in “poor” places (e.g. street, shanty towns)
(iii) the dwelling is made of low-quality materials
(iv) the dwelling does not have access to water
(v) the dwelling does not have a hygienic restroom
(vi) there are children aged 7 to 11 not attending school
(vii) the household head has not completed primary school
(viii) the household head does not have a high-school degree, and there are more than 4 household members for each income earner.

All persons in a household are considered poor if the household meets at least one of the above conditions. This indicator is similar to the popular UBN (Unsatisfied Basic Needs) indicator. We also combine this approach with income poverty by using the UBN indicator together with the USD-2.5-a-day poverty measure: only if an individual is poor under both criteria is she considered “chronically” poor. We restrict the analysis to urban areas, since the conditions for the UBN indicator should arguably differ between urban and rural areas (e.g. access to sanitation).
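The eight conditions and the “chronic” poverty combination can be sketched as follows. The field names are hypothetical, each condition is assumed to be already coded from the survey, the household is assumed to have at least one room, and condition (viii) uses the dependency rate (household size over income earners), treated here as infinite when nobody earns income (an assumption). The urban restriction is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # Hypothetical fields; defaults describe a household meeting no UBN condition.
    persons: int
    rooms: int
    poor_location: bool = False
    low_quality_materials: bool = False
    has_water: bool = True
    has_hygienic_restroom: bool = True
    children_7_11_out_of_school: bool = False
    head_completed_primary: bool = True
    head_has_high_school: bool = True
    income_earners: int = 1
    pc_income_usd_day: float = 10.0  # per capita income, USD a day at 2005 PPP


def dependency_rate(h):
    # Household size over the number of income earners (infinite if nobody earns).
    return h.persons / h.income_earners if h.income_earners > 0 else float("inf")


def ubn_poor(h):
    # Poor if at least one of the eight conditions (i)-(viii) holds.
    return any([
        h.persons / h.rooms > 4,                                # (i)
        h.poor_location,                                        # (ii)
        h.low_quality_materials,                                # (iii)
        not h.has_water,                                        # (iv)
        not h.has_hygienic_restroom,                            # (v)
        h.children_7_11_out_of_school,                          # (vi)
        not h.head_completed_primary,                           # (vii)
        not h.head_has_high_school and dependency_rate(h) > 4,  # (viii)
    ])


def chronically_poor(h, line=2.5):
    # Poor under both the UBN indicator and the USD-2.5-a-day income line.
    return ubn_poor(h) and h.pc_income_usd_day < line
```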

Demographics: what does the dependency rate measure?

The dependency rate is an indicator of the economic sustainability of a household. It is calculated as household size over the total number of income earners in the household.

Education: what do the statistics on assortative mating measure?

The indicators of “assortative mating” try to capture the degree to which an individual characteristic (years of education, hourly wages or worked hours) is associated with those of the individual’s partner.

The “assortative mating” statistics show the linear correlation coefficients between partners for each of these variables.
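A minimal sketch of the statistic: the Pearson (linear) correlation coefficient between a characteristic of one partner and the same characteristic of the other, computed over couples. The couple data are hypothetical and survey weights are ignored.

```python
import math


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical couples: years of education of each partner.
couples = [(12, 11), (6, 7), (16, 14), (9, 9), (3, 5)]
own, partner = zip(*couples)
print(round(pearson(own, partner), 3))
```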

Education: what is the difference between gross and net enrollment rates?

The gross enrollment rate is defined as the share of people in a given age group who are attending school, regardless of educational level.

The net enrollment rate is the share of people in a certain age group who are attending the educational level that corresponds to their age.
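Following the two definitions above, both rates can be computed for the same age group. In this sketch the official age ranges for each level are hypothetical (they differ across countries), and each person is represented as an `(age, level_attended)` pair.

```python
# Hypothetical official age ranges for each level; these differ across countries.
OFFICIAL_AGES = {"primary": range(6, 12), "secondary": range(12, 18)}


def enrollment_rates(people, level):
    """Gross and net enrollment rates for the age group that corresponds to `level`.

    people: list of (age, level_attended) pairs, with level_attended None if not in school.
    """
    group = [(age, lvl) for age, lvl in people if age in OFFICIAL_AGES[level]]
    gross = sum(1 for _, lvl in group if lvl is not None) / len(group)  # any level
    net = sum(1 for _, lvl in group if lvl == level) / len(group)       # matching level
    return gross, net


sample = [(7, "primary"), (8, None), (10, "secondary"), (9, "primary"), (15, "secondary")]
print(enrollment_rates(sample, "primary"))  # (0.75, 0.5)
```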

Education: how is the educational mobility indicator computed?

Statistics on educational mobility are computed following the methodology developed in Andersen (2001). The steps are as follows:

1. The dependent variable is the schooling gap, defined as the difference between (i) years of education that a child would have completed advancing one grade each year, and (ii) the actual years of education. In other words, the schooling gap measures years of missing education.
2. We estimate the degree to which family background (approximated by variables such as parental education, age of parents at the child's birth, etc.) explains the schooling gap.
3. The Educational Mobility Index (EMI) is defined as 1 minus the proportion of the variance of the schooling gap that is explained by family background.

The lower the educational mobility, the more family background explains the schooling gap. Therefore, the closer the EMI is to 0, the lower the educational mobility.
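One simple reading of the three steps is sketched below: compute each child's schooling gap, regress the gaps on family-background variables by ordinary least squares, and set EMI = 1 − R². The school entry age of 6, the use of plain OLS, and all names are assumptions of this sketch; the exact specification in Andersen (2001) may differ.

```python
def schooling_gap(age, years_of_education, entry_age=6):
    # Years a child would have completed advancing one grade per year, minus actual years.
    return max(age - entry_age, 0) - years_of_education


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):  # back substitution
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    return beta


def emi(gaps, background):
    """EMI = 1 - R^2 from regressing schooling gaps on family-background variables."""
    X = [[1.0] + list(row) for row in background]  # add an intercept
    beta = ols(X, gaps)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    mean_gap = sum(gaps) / len(gaps)
    ss_total = sum((g - mean_gap) ** 2 for g in gaps)
    ss_resid = sum((g - f) ** 2 for g, f in zip(gaps, fitted))
    r_squared = 1.0 - ss_resid / ss_total
    return 1.0 - r_squared
```

If background fully explains the gaps, R² = 1 and the EMI is 0 (no mobility); if it explains nothing, the EMI is 1.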

Employment: what criterion is used to define the type of firm in which a worker is employed?

Workers are classified into three groups according to whether they work in small firms, large firms, or the public sector. To the extent permitted by the survey information, small firms are those with fewer than five workers, while firms with five or more workers are classified as large. The public sector includes jobs in state-owned firms, public schools, hospitals and other services, and public administration.

Employment: how are workers classified as formal or informal?

There are at least two different concepts referred to by the term labor informality. The “productive” definition pictures informal workers as those in low-productivity, unskilled, marginal jobs, while the “legalistic” or “social protection” definition stresses the lack of labor protection and social security benefits. The productive definition is concerned with the type of job (e.g. salaried vs. self-employed, large vs. small firms), while the legalistic definition is concerned with whether the labor relationship complies with certain rules (mainly, labor protection).

The empirical implementation of the productive notion of informality has been linked to (i) the type of job (salaried, self-employed), (ii) the type of economic unit (small, large, public sector), and (iii) the worker's skills. To implement this classification, we consider all individuals without tertiary education as unskilled, and we define all firms with 5 or fewer employees as small. Since an individual may have more than one job, we apply the classification only to his/her main occupation. We implement the following definition of labor informality:

Definition 1 (productive definition): An individual is considered an informal worker if (s)he belongs to any of the following categories: (i) unskilled self-employed, (ii) salaried worker in a small private firm, (iii) zero-income worker.

The social security benefit most commonly asked about in Latin American and Caribbean household surveys is the right to receive a pension when retired. However, not all countries have questions on this item, and those that do phrase them differently. For most countries, the questions apply only to salaried workers, leaving out the self-employed. We implement the following legalistic/social-protection definition of informality:

Definition 2 (legalistic or social protection definition): A salaried worker is informal if (s)he does not have the right to a pension linked to employment when retired.
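Both definitions can be sketched as predicates on a worker's main occupation. The field names and category labels are hypothetical; the firm-size threshold follows the “5 or fewer employees” rule stated for the productive definition, and the legalistic definition returns `None` when it does not apply (non-salaried workers or no answer on pension rights).

```python
from dataclasses import dataclass
from typing import Optional

SMALL_FIRM_MAX = 5  # productive definition: firms with 5 or fewer employees are small


@dataclass
class Worker:
    # Hypothetical fields describing the worker's main occupation only.
    category: str             # "salaried", "self_employed", or "employer"
    has_tertiary: bool        # unskilled if no tertiary education
    public_sector: bool
    firm_size: Optional[int]  # None if not recorded in the survey
    income: float
    pension_right: Optional[bool] = None  # None if the survey does not ask


def informal_productive(w: Worker) -> bool:
    # Definition 1: unskilled self-employed, salaried in a small private firm, or zero income.
    unskilled_self_employed = w.category == "self_employed" and not w.has_tertiary
    small_private_salaried = (w.category == "salaried" and not w.public_sector
                              and w.firm_size is not None and w.firm_size <= SMALL_FIRM_MAX)
    return unskilled_self_employed or small_private_salaried or w.income == 0


def informal_legalistic(w: Worker) -> Optional[bool]:
    # Definition 2 applies only to salaried workers with an answer on pension rights.
    if w.category != "salaried" or w.pension_right is None:
        return None
    return not w.pension_right
```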

Housing: what is meant by "poor house"?

This variable captures whether the dwelling is located in a shanty-town or other clearly identifiable poor neighborhood, or if the individual/household lives in inconvenient places (e.g. the street). Definitions of a “poor house” vary widely across countries. We provide details in a methodological report.

Housing: how is the statistic on low-quality housing materials computed?

This variable records whether the dwelling is built with low-quality materials for the walls, roof, and floor. The materials used for housing differ significantly by country, and national surveys cover this question to varying degrees. Materials that are a clear indicator of poverty in one country (or region) may not be related to poverty in another. These indicators are very country-specific, so comparisons based on them should be made with care and preferably only within countries. The methodological report has more information on this issue.

Infrastructure: how is access to water defined?

Easy access to a safe source of water is one of the fundamental indicators of development. Most Latin American and Caribbean surveys do not ask about potable water, but rather about the location of the water source. We construct a variable that takes the value 1 if the household has access to a source of water (safe water, if recorded in the survey) in the house or lot.

Infrastructure: how is the availability of hygienic bathroom and sewerage drain defined?

In statistics that refer to access to hygienic restrooms, this variable takes the value 1 if the household has a restroom with a toilet connected to a sewerage system or to a septic tank.

A household is considered to have access to a sewer if it has a bathroom toilet and that toilet is connected to the sewer drain system.

Infrastructure: how is access to electricity defined?

A household is considered to have access to electricity if there is electricity in the home where family members live, whatever its source. This information is generally available in all countries, because household surveys typically ask about the type of lighting available in the home.

Infrastructure: does telephone access include both fixed and mobile phones?

Yes, both types of telephone are considered: telephone access is defined as the availability of a fixed phone in the dwelling or a mobile phone owned by at least one household member. Note that most household surveys began to include questions on mobile phones during the second half of the 1990s or the early 2000s, so these statistics usually show a significant change at that point.