Dataset for "Three Essays on Hukou and Internal Migration in China"

Dataset

Description

Dataset for the PhD thesis entitled "Three Essays on Hukou and Internal Migration in China."

Data Description: Annual Composite Hukou Index and Sub-Indices for 120 Chinese Cities (2014–2019).

What is the data?
This dataset contains an annual composite Hukou index and four sub-indices (investment, house purchase, talent program, and employment requirements) for 120 Chinese cities from 2014 to 2019. The indices quantify the strictness of local Hukou (household registration) policies based on 14 criteria derived from policy documents. The data are structured as a city-year panel, enabling both cross-sectional and time-series analyses.

For whom is this data relevant?
Researchers studying urbanization, migration policies, labor markets, and social inequality in China; policymakers evaluating Hukou reforms; and scholars analyzing the multifaceted impacts of institutional barriers on economic and social outcomes.

How was the data generated?
1) Policy Document Collection:
Sources: PKU Law Library (pkulaw.com), CNKI Law (law.cnki.net), hukouwang.com (national Hukou news), bendibao.com (city-level regulations), and local public security bureau websites.
Keywords: "Hukou," "Huji," "Luohu."
Total documents: 640 (50 in 2014, 155 in 2015, 197 in 2016, 130 in 2017, 71 in 2018, 37 in 2019).
2) Quantification Criteria:
Criteria: 14 criteria across four sub-indices, aligned with Zhang, Wang, and Lu (2019) but extended to annual frequency:
1. Total investment amount
2. Additional conditions for investment
3. Total tax payment amount
4. Additional conditions for tax payment
5. Total value (or area) of house purchased
6. Additional conditions for house purchase
7. Professional ranks and titles
8. Education background
9. Additional conditions to be eligible in the program
10. Nature of business and occupation
11. Length of employment
12. Wage level
13. Payment of social insurance
14. Additional conditions of employment
3) Method:
See the document "Procedures_for_generating_the_annual_Hukou_index_2014-2019".

Code Availability
The MATLAB code used to generate the Hukou indices, including data preprocessing, min-max normalization, and the PPR-PSO algorithm implementation, is available for academic reuse. Researchers interested in replicating the analysis or adapting the methodology are welcome to request the code by contacting the creator of the dataset.
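
For readers unfamiliar with the min-max normalization step mentioned above, the following is a minimal illustrative sketch in Python. It is not the author's MATLAB code and does not cover the PPR-PSO step; the function name, the sample values, and the handling of a constant series are assumptions made for illustration only.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly onto the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: when every value is identical the rescaling is
        # undefined, so this sketch returns zeros by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw scores for one criterion across three cities.
raw_scores = [2.0, 5.0, 11.0]
normalized = min_max_normalize(raw_scores)
print(normalized)
```

After rescaling, the smallest raw score maps to 0 and the largest to 1, which makes criteria measured on different scales comparable before they are combined.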

What does the data include?
Variables:
Composite Hukou Index (range: 0–1; higher = stricter).
Four sub-indices: Investment, House Purchase, Talent Program, Employment.
Geographic Coverage: 120 Chinese cities.
Temporal Coverage: 2014–2019 (annual).
Titles, URL links, and further information on the policy documents.
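
Before analysis, users may wish to confirm that a loaded file matches the structure described above (one observation per city and year, index values between 0 and 1). The sketch below shows one way to do this with the Python standard library. The record layout and the field names "city", "year", and "composite" are hypothetical and should be adapted to the actual column names in the data file.

```python
def check_panel(rows, expected_cities, years):
    """Return a list of human-readable problems found in a city-year panel."""
    problems = []
    seen = set()
    for row in rows:
        key = (row["city"], row["year"])
        if key in seen:
            problems.append(f"duplicate observation: {key}")
        seen.add(key)
        # The composite index is documented as ranging from 0 to 1.
        if not 0.0 <= row["composite"] <= 1.0:
            problems.append(f"index out of range for {key}: {row['composite']}")
    cities = {city for city, _ in seen}
    if len(cities) != expected_cities:
        problems.append(f"expected {expected_cities} cities, found {len(cities)}")
    for city in sorted(cities):
        missing = [y for y in years if (city, y) not in seen]
        if missing:
            problems.append(f"{city} is missing years {missing}")
    return problems


# Toy data with two made-up cities; the real file would use expected_cities=120.
demo_rows = [
    {"city": city, "year": year, "composite": 0.5}
    for city in ("City A", "City B")
    for year in range(2014, 2020)
]
clean = check_panel(demo_rows, expected_cities=2, years=range(2014, 2020))
broken = check_panel(demo_rows[:-1], expected_cities=2, years=range(2014, 2020))
print(clean)
print(broken)
```

An empty list means the panel is balanced and all index values lie in the documented range.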

Why is this data valuable?
Advantages over Zhang, Wang, and Lu (2019):
Temporal Extension: Adds 2017–2019 data to their 2014–2016 coverage.
Annual Frequency: Enables dynamic analysis of policy changes over time (vs. aggregated 2000–2013 and 2014–2016 indices).
Enhanced Policy Details: Draws on additional sources (hukouwang.com, bendibao.com, and local public security bureau websites) to capture more granular policy detail.

FAIR Compliance:
Findable: Clear documentation and structured metadata.
Accessible: Supplementary dataset provided in machine-readable format.
Interoperable: Aligned with prior literature (Zhang, Wang, and Lu, 2019) for cross-study comparisons.
Reusable: Code and methodology transparency facilitate replication and adaptation.

How to use the data?
Analytical Applications:
Panel regression to analyze Hukou policy impacts on urbanization, entrepreneurship, or talent retention.
Time-series analysis of policy trends post-2014 national Hukou reforms.
Cross-city comparisons of Hukou strictness by sub-index (e.g., investment vs. employment requirements).
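
As a starting point for the trend and cross-city comparisons listed above, the following sketch computes the yearly mean of the composite index and each city's change between two years. It uses only the Python standard library; the city names and index values are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (city, year, composite index) observations.
rows = [
    ("City A", 2014, 0.60), ("City A", 2019, 0.40),
    ("City B", 2014, 0.30), ("City B", 2019, 0.20),
]


def yearly_mean(rows):
    """Average the index across cities for each year."""
    by_year = defaultdict(list)
    for _, year, value in rows:
        by_year[year].append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}


def change_by_city(rows, start, end):
    """Index change from `start` to `end` for cities observed in both years."""
    lookup = {(city, year): value for city, year, value in rows}
    cities = sorted({city for city, _, _ in rows})
    return {
        city: lookup[(city, end)] - lookup[(city, start)]
        for city in cities
        if (city, start) in lookup and (city, end) in lookup
    }


trend = yearly_mean(rows)
changes = change_by_city(rows, 2014, 2019)
print(trend)
print(changes)
```

Because higher values indicate stricter policies, a negative change corresponds to a relaxation of Hukou requirements over the period. The same grouping logic applies to any of the four sub-indices.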

Citation:
Zhang, J. P., Wang, R. and Lu, C. (2019) 'A quantitative analysis of Hukou reform in Chinese cities: 2000–2016', Growth and Change, 50(1), pp. 201-221.

This description adheres to FAIR principles by prioritizing clarity, transparency, and contextual relevance, enabling researchers to efficiently locate, understand, and reuse the data for diverse analytical purposes.

Please be aware that the Hukou policy documents are written in Chinese (see, e.g., List_of_Hukou_documents_2014-2019.csv). Researchers need to be able to read Chinese to locate and interpret these documents.

This dataset is provided for non-commercial purposes only. Users are strictly prohibited from reposting this data without proper attribution. Any improper use or redistribution is a violation of the author's rights.

Dataset embargoed until 1 April 2026.
Date made available: 23 Mar 2025
Publisher: Queen's University Belfast
Temporal coverage: 2014–2019
Date of data production: 2021–2024
Geographical coverage: 120 cities within China, listed in the file "Annual Hukou index 2014-2019"
