Mining Microdata:

economic opportunity and spatial mobility
in Britain, Canada and the United States, 1850-1911

Project history

Research Questions

In Democracy in America, de Tocqueville devoted an entire chapter to explaining “why the Americans are so restless in the midst of their prosperity”. The high economic and geographic mobility of nineteenth-century North Americans was widely remarked upon, and it was usually explained by the plentiful availability of land. A half-century later, historian Frederick Jackson Turner formalized the hypothesis that the availability of Western agricultural land created extremely high migration and provided economic opportunity for the surplus labor crowding the urban centers of the East, suppressing movements to organize labor.3 In both the United States and Canada, the impact of the frontier on economic mobility, social structure, politics, and differences between the two nations has been central to historical debates in the past century.28-34 Scholars in both countries have also debated when and how the frontier “closed,” and the social consequences of closure, including declining economic mobility and cross-border migration. Turner’s thesis lost favor in the twentieth century, as historians deemphasized the frontier and began stressing the importance of urban and industrial development.

In the mid-twentieth century, Canadian and U.S. historians began to use the manuscript censuses to trace social and geographic mobility in nineteenth-century communities. At first the studies focused on the rural frontier, and the results seemed consistent with the ideas of de Tocqueville and Turner. When the “new social historians” began to apply the same techniques to urban areas, they found similarly high rates of migration. The urban studies, however, suggested much less upward social mobility than did the frontier analyses. Thernstrom, for example, challenged Turner’s interpretation that migration was a safety valve. He disputed the contention that working-class emigrants from Eastern cities were drawn to better economic opportunity on the agricultural frontier. Rather, he maintained, those workers formed a “floating [labor force of] permanent transients.... buffeted about from city to city.” A migrating underclass continuously in search of work recalled the “tramping journeymen” who roamed the countryside of early industrial England. From the 1960s to the early 1980s, a dozen studies of Canadian and U.S. urban communities agreed that despite high migration, economic opportunity was sharply limited in nineteenth-century North America, just as it was in Victorian Britain.

Not only did historians challenge the interpretation that nineteenth-century North America was characterized by exceptionally high economic mobility; in addition, social scientists disputed the thesis that geographic mobility was exceptionally high in the nineteenth century. Published crude census data on state and province of birth suggested that internal migration increased substantially over the course of the twentieth century. Twentieth-century social theorists consistently argued that ever-increasing residential mobility has had many adverse consequences, such as reduced family cohesion, social dislocation, disrupted schooling, and health impairment. Increasing residential mobility continues to be widely cited in North American policy debates, including discussion of child immunization law, child support enforcement, political polarization, taxes, and grandparent visitation.

In recent years, prevailing interpretations about nineteenth-century economic opportunity and trends in spatial mobility have again been challenged through analysis of newly available data sources. Ferrie and Long used individual-level national census data from the United States and England to trace long-run trends in economic opportunity and social mobility. Their results suggest that intergenerational occupational mobility was very high in the nineteenth-century United States, and much higher than in England at the same time. Like Turner, they argue that economic mobility in nineteenth-century America was tied to the availability of cheap land, which drove up wages. Long and Ferrie argue that economic mobility declined in the United States and increased in England between the mid-nineteenth and the late twentieth centuries, so England now has greater mobility than the United States. Ferrie and others argue that American geographic mobility has declined dramatically over the course of the twentieth century, so the theories and policies predicated on ever-expanding mobility are based on a false premise.

The new evidence is potentially of enormous significance for our understanding of long-run social and economic change. It suggests that de Tocqueville and Turner were correct all along: nineteenth-century North America was exceptional and profoundly different from the Old World because of cheap land, high migration, and unprecedented economic opportunity. But it would be highly premature to endorse this conclusion. Long and Ferrie made use of extraordinary new data sources (microdata that were developed by the Mining Microdata research team), but their linkage methods were crude. There is evidence that their methods may have resulted in a high rate of false links, especially in the United States. Because false links would exaggerate both geographic and economic mobility, Long and Ferrie’s major findings could be largely a consequence of measurement error.
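The direction of this bias can be illustrated with simple arithmetic. The sketch below is our own illustration, not a reconstruction of Long and Ferrie's analysis: it assumes that a falsely linked pair joins two unrelated people, who will usually differ in occupation and residence and so appear "mobile" at a rate near chance. All figures are hypothetical and chosen only to show how the observed rate rises with the share of false links.

```python
def observed_mobility(true_rate: float,
                      false_link_share: float,
                      false_pair_rate: float) -> float:
    """Mobility rate seen in linked data when some links are wrong.

    The observed rate is a weighted average of the rate among correctly
    linked pairs and the rate among falsely linked pairs.
    """
    return (1 - false_link_share) * true_rate + false_link_share * false_pair_rate


# Hypothetical values, for illustration only.
TRUE_RATE = 0.30        # assumed true share of sons who changed occupational class
FALSE_PAIR_RATE = 0.90  # assumed share of unrelated pairs that look "mobile"

for share in (0.0, 0.1, 0.3):
    rate = observed_mobility(TRUE_RATE, share, FALSE_PAIR_RATE)
    print(f"false-link share {share:.0%}: observed mobility {rate:.0%}")
```

Under these assumed values, a 10% false-link share raises observed mobility from 30% to 36%, and a 30% share raises it to 48%. If false links are more common in one country than another, the comparison between countries is distorted as well.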

We propose to capitalize on novel machine-learning record linkage technology to exploit massive new data sources for comparative historical analysis of economic opportunity and spatial mobility. The long debate over international comparisons of economic opportunity and geographic mobility reveals that discussions of economic mobility must be comparative. Knowing, for example, that 19% of the children of unskilled laborers became professionals means little without comparison across time and space. Such comparison is essential for understanding how political, environmental, and cultural factors shape economic and geographic mobility.

Like Long and Ferrie, we will compare Britain with the United States so that we can place the levels and trends in mobility in comparative context. For Britain, we will be able to test Miles’ finding that intergenerational economic mobility grew between the mid-nineteenth century and the early twentieth century.80 Our project will also extend the comparison to both British and French Canada, which will give us leverage to understand how politics, culture, and environment affected mobility. Like the United States, Canada was a federal country that eventually spanned North America, with low population density and cheap land in the nineteenth century. The Canadian economic environment was similar to that in adjacent American states. Yet Canada inherited a mixture of French and British culture and institutions, which were reflected in both intra-national and international socio-economic differences.

Our research will begin with the two questions posed by Long and Ferrie. First, what were the relative levels of intergenerational economic and spatial mobility in nineteenth-century North America compared with Britain? Second, what was the relationship between occupational mobility and spatial mobility? If we can answer these questions definitively, our work will have profound implications for long-contested debates about social structure in Britain, Canada, and the United States.

Our analysis, however, will go beyond the basic issues raised by Long and Ferrie in three key respects. First, we will add comparisons to both British and French Canada, which will help to disentangle the effects of institutional and cultural factors on mobility. Second, we will compare levels of mobility in the 1850-1880 period with those after 1880, allowing us to discern trends in mobility in each country. Third, we will assess how individual and community characteristics affected the odds of economic advancement and migration. These elements will yield a deeper and richer mobility analysis than has previously been attempted, and the results will have the potential to reshape our understanding of the social order on both sides of the Atlantic during a critical period of cultural and economic transformation. The need for consistent comparisons, such as those we propose, was made clear recently by the sociologists van Leeuwen and Maas, who concluded that while international studies of social mobility were still “difficult to compare,” new historical databases like the North Atlantic Population Project promise a resolution to long debates.

Large-Scale Data Sources

This project is based on one of the largest microdata collections in existence, the North Atlantic Population Project (NAPP).89-91 The NAPP database includes complete enumerations of the populations of Britain, Canada, the United States, and several other countries between 1850 and 1911. The scale of the database is immense; we have already released detailed data describing over 100 million persons, and we expect the collection to double in size during the next three years. It is the largest microdata collection in the world that includes personal identifiers, such as name and address, for each individual enumerated. Because of this, it is the largest database in existence that can be used to analyze economic and geographic mobility.

The availability of complete microdata on entire national populations opens extraordinary new research opportunities. Historians have been linking census records since the 1930s, but the methods they have used yielded non-representative samples.92 The complete-count data mean that for the first time we can apply sophisticated machine-learning technology to the problem of historical record linkage. This yields representative linked census samples that follow individuals and families across census years to assess intergenerational mobility. Because we have access to information on the entire populations with full geographic information, we will also be able to systematically assess the impact of local context on economic opportunity and spatial mobility. Nothing like this has ever existed; we believe that our analysis of this massive microdata collection represents a new research paradigm that will reshape approaches to the study of social structure.
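As a rough illustration of what linking census records involves, the sketch below scores candidate pairs from two census years and accepts a link only when the best candidate is clearly ahead of the runner-up. It is a simplified stand-in, not the project's linkage software: the field names, the hand-set weights (which take the place of a trained classifier's output), and the thresholds are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """String similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_score(earlier: dict, later: dict, years_between: int) -> float:
    """Score a candidate pair; hand-set weights stand in for a trained model."""
    name = name_similarity(earlier["name"], later["name"])
    # Reported age should advance by about the gap between censuses.
    age_gap = abs((later["age"] - earlier["age"]) - years_between)
    age = max(0.0, 1.0 - age_gap / 3)  # tolerate small age misreporting
    birthplace = 1.0 if earlier["birthplace"] == later["birthplace"] else 0.0
    return 0.6 * name + 0.25 * age + 0.15 * birthplace


def link(record: dict, candidates: list, years_between: int,
         threshold: float = 0.85, margin: float = 0.05):
    """Return the best candidate, or None if the match is weak or ambiguous."""
    scored = sorted(
        ((pair_score(record, c, years_between), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored or scored[0][0] < threshold:
        return None  # no candidate is similar enough
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return None  # two candidates are too close to tell apart
    return scored[0][1]


# Invented records, for illustration only.
person_1871 = {"name": "John Macdonald", "age": 24, "birthplace": "Ontario"}
census_1881 = [
    {"name": "John McDonald", "age": 34, "birthplace": "Ontario"},
    {"name": "James Macdonald", "age": 40, "birthplace": "Quebec"},
]
match = link(person_1871, census_1881, years_between=10)
```

Declining to link ambiguous cases trades coverage for accuracy. Complete-count data make that trade-off visible, because every plausible candidate in the later census can be examined before a link is accepted.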

History of the Project

NAPP emerged from a meeting of the International Microdata Access group at the University of Ottawa in 1999.91 95 96 Lisa Dillon, Steven Ruggles, and Matthew Woollard each discovered that the others had made an agreement with the Church of Jesus Christ of Latter-Day Saints (LDS) to convert genealogical data from 1880 or 1881 into a resource for scholarly research. Over 5,000 LDS volunteers had devoted some 4.5 million hours over 18 years to transcribing the entire censuses of Britain, Canada, and the United States into digital form. In Canada and the United States, the LDS was interested in creating a genealogical product on CD-ROM and sought academic collaborators to improve the data and speed up their release schedule; in Britain, the LDS provided a digital copy of the data to the U.K. Data Archive as part of the copyright agreement. In all, the three data collections provided information on about 90 million persons. We quickly realized that we had an extraordinary opportunity to combine these massive data sets and create an integrated cross-national database for the late nineteenth century.

Over the past decade, our partnership has succeeded far beyond our expectations. We obtained funding to support the work from a wide range of sources. In Britain, the project has been supported by the Economic and Social Research Council, the Leverhulme Trust, the Wellcome Trust, the Joint Information Systems Committee, and the Essex University Research Promotion Fund. In Canada, funders include the Social Sciences and Humanities Research Council, the Harold Crabtree Foundation, the Church of Jesus Christ of Latter-Day Saints, Google, IBM, SHARCNET, Statistics Canada, the Ontario Ministry of Research and Innovation, and the Canadian Foundation for Innovation. Work in the United States has been supported by the National Science Foundation, the National Institute of Child Health and Human Development, and Sun Microsystems. By leveraging these disparate resources, we have gone far beyond our original goal of creating an integrated database for the three countries around 1880. We have also added data from a dozen more censuses from surrounding census years and other countries, and we have developed sophisticated record-linkage technology to tie them together, as described in the next section. The data are already having a substantial impact on research; they have generated 11 Ph.D. dissertations, 89 books and articles, and 125 conference papers.

All funding to date has been for constructing public-use microdata, rather than for substantive research. Members of the Mining Microdata team have published 26 books and articles using the data, but have conducted no cross-national collaborative studies. The present proposal represents the first opportunity for partners from all three countries to collaborate on substantive cross-national research.