Data
Eurostat is an abundant and reliable source of NUTS-2 level statistics, so it is a reasonable choice of data source.[7] However, human development indicators are rarely available at this level. “The Human Development Index (HDI) is a summary measure of achievements in three key dimensions of human development: a long and healthy life, access to knowledge and a decent standard of living.”[8] The disadvantage of the HDI and its components, as defined by the United Nations Development Programme (UNDP), is that they are officially reported only at the national level. To ensure comparability with studies that used the national-level HDI, I aim to obtain or calculate equivalent regional indicators.
Human Development
A possible source for this purpose is the database provided by the Global Data Lab (Global Data Lab: Innovative Instruments for Turning Data into Knowledge). The website and the data are maintained by the Institute of Management Research at Radboud University. The values it reports at the national level are almost identical to the ones published by UNDP. Table 1 shows the similarity between the two datasets.
This dataset does not fully meet the requirements of this research because its territorial units do not completely match the ones reported by Eurostat, so merging the two data sources leaves a high share of unmatched observations. Although I do not use these data for model building, the matching observations are useful as benchmark points for the index I calculate from the data available in the Eurostat database. Their suitability as a benchmark is supported by the close agreement with the national-level dataset officially reported by UNDP.
Indicator | Value |
---|---|
\(R^2\) | 99.76% |
Spearman \(R^2\) | 99.71% |
Mean absolute deviation | 0.007 |
Mean absolute percentage deviation | 1.20% |
A decent standard of living
To determine the standard of living dimension of human development, the gross national income (GNI) per capita is required in PPP terms (constant 2017 PPP$) according to the technical note of UNDP.[9] To transform it into an index, the income is inserted into the following equation:
\[\begin{align} \text { Income index }=\frac{\ln (\text{GNI})-\ln (100)}{\ln (75,000)-\ln (100)} \end{align}\]
This normalization keeps the value between 0 (at an annual income of 100 dollars) and 1 (at an annual income of 75,000 dollars). Eurostat reports regional GNI in PPP terms (constant 2020 PPP) in euros. To eliminate this difference, I divide the given GNI value by the corresponding annualized value of the EUR/USD exchange rate (also downloaded from the Eurostat database). Table 2 shows the similarity to the values reported by Global Data Lab for the matching observations (where the territorial unit corresponds to the one used by Eurostat). I would like to highlight the high Spearman \(R^2\) between them: it indicates that the regions are ranked very similarly in the two datasets.
Indicator | Value |
---|---|
\(R^2\) | 92.20% |
Spearman \(R^2\) | 94.18% |
Mean absolute deviation | 0.0520 |
Mean absolute percentage deviation | 6.62% |
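The income index formula of this section can be sketched in a few lines of Python. The function name and the example income of 30,000 dollars are illustrative assumptions, and the input is assumed to be already converted to PPP dollars.

```python
import math

GNI_MIN = 100.0      # lower goalpost of the income index (dollars)
GNI_MAX = 75_000.0   # upper goalpost of the income index (dollars)

def income_index(gni_per_capita):
    """Log-scaled income index; input is assumed to be GNI per capita in PPP dollars."""
    if gni_per_capita <= 0:
        raise ValueError("GNI per capita must be positive")
    numerator = math.log(gni_per_capita) - math.log(GNI_MIN)
    denominator = math.log(GNI_MAX) - math.log(GNI_MIN)
    return numerator / denominator

# Hypothetical region with a GNI per capita of 30,000 dollars.
example = income_index(30_000)
print(round(example, 3))
```

Because of the logarithm, each doubling of income raises the index by the same amount, \(\ln 2 / \ln 750 \approx 0.105\), regardless of the income level.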
Long and healthy life
The health dimension of the HDI is based on life expectancy at birth. The logic of the variable transformation is the same as for the income index, except that no logarithm is taken:
\[\begin{align} \text { Health index }=\frac{\text{ life expectancy at birth }-20}{85-20} \end{align}\]
The natural minimum of life expectancy is set at 20 years and the maximum at 85 years, so the formula also keeps the health index between 0 and 1. Since Eurostat reports life expectancy at the NUTS-2 level, this transformation can be performed without any additional calculation.
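As a worked illustration of the health index formula, assume a hypothetical region with a life expectancy at birth of 80 years (an illustrative value, not taken from the data):

\[ \text{Health index} = \frac{80 - 20}{85 - 20} = \frac{60}{65} \approx 0.923 \]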
Knowledge
Determining the education index is more complex than the previous two. The required expected years of schooling are not available at the regional level. However, Eurostat reports several education-related statistics. In this study, I substitute the original formula with an index based on the available variables. For this purpose, I use data on the population by educational attainment level and NUTS-2 region. The available data are also grouped by age. Considering that the discussed decline in fertility rates is mainly related to women under the age of 30,[10] statistics about younger generations seem more relevant. For this reason, I only use the age classes 20-24 and 25-34.
The educational dimension is based on the International Standard Classification of Education (ISCED 2011). It distinguishes the following categories: less than primary, primary and lower secondary education (levels 0-2); upper secondary and post-secondary non-tertiary education (levels 3 and 4); and tertiary education (levels 5-8).
These two dimensions lead to eight possible variables, and the aim is to define one index from their values. To perform this dimensionality reduction, I use principal component analysis (PCA). PCA is a useful tool to identify independent sources of variance, and in many cases an economic interpretation can be found in the loadings.[11] For this step, I impute missing values based on auxiliary regressions using the mice R package, and I normalize the variables. Figure 4 shows the result.
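The PCA step described above can be illustrated with a small, dependency-free Python sketch. It is not the paper's implementation, which is in R: the eigenvectors are obtained here by power iteration with deflation, a technique swapped in for brevity. The imputation step is omitted, and the three share variables for five regions are hypothetical.

```python
import math

def standardize(columns):
    """Z-score each variable; `columns` holds one list of observations per variable."""
    z = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z.append([(v - mean) / sd for v in col])
    return z

def leading_eigenpair(m, iters=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    k = len(m)
    v = [float(i + 1) for i in range(k)]  # arbitrary start vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(v, mv)), v

def pca(columns, n_components):
    """PCA on the correlation matrix; returns variance, loading and scores per component."""
    z = standardize(columns)
    k, n = len(z), len(z[0])
    m = [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]
    components = []
    for _ in range(n_components):
        eigval, loading = leading_eigenpair(m)
        scores = [sum(loading[j] * z[j][i] for j in range(k)) for i in range(n)]
        components.append({"variance": eigval, "loading": loading, "scores": scores})
        # Deflation: remove the extracted component before searching for the next one.
        m = [[m[i][j] - eigval * loading[i] * loading[j] for j in range(k)] for i in range(k)]
    return components

# Hypothetical shares (%) of low, medium and high attainment in five regions.
low = [30, 25, 20, 15, 10]
medium = [40, 45, 40, 45, 40]
high = [30, 30, 40, 40, 50]
components = pca([low, medium, high], n_components=2)
for c in components:
    print(round(c["variance"], 3), [round(x, 3) for x in c["loading"]])
```

In this toy example the first loading vector has opposite signs on the low and high attainment shares, which is the kind of pattern that allows a component to be interpreted as an education gradient.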
The next step is to find the relevant principal component or components. In addition to interpretability, I also focus on how well each component describes the education index reported by Global Data Lab. Table 3 contains the similarity indicators.
Indicator | Comp 1 | Comp 2 | Comp 3 | Comp 4 | Comp 5 | Comp 6 |
---|---|---|---|---|---|---|
\(R^2\) | 5.65% | 30.76% | 18.16% | 10.63% | 0.00% | 0.03% |
Spearman \(R^2\) | 2.66% | 26.09% | 17.66% | 9.38% | 0.57% | 0.78% |
Since the \(R^2\) and Spearman \(R^2\) between the second principal component and the education index from Global Data Lab are the highest among the components, and the component is easy to interpret, I use the second principal component in what follows. Based on the loadings, its value increases as the proportion of people with lower education increases and decreases as the share of highly educated people increases. This is the opposite of what the education index expresses. Therefore, I define the education index as the normalized value of the PCA score multiplied by minus one.
\[\begin{align} \text{Education index} = \frac{ -\text{PCA score} + |\text{min}\left(-\text{PCA score}\right)|}{\text{max}\left(-\text{PCA score} \right )} \end{align}\]
Total fertility rates, family benefit and unemployment statistics
Family benefit data are available in the Eurostat database. They are reported at the national level, and in this paper I express them as a percentage of gross national income. The national-level value is assigned to all regions of a country. A highly unequal distribution of family support within countries could therefore lead to incorrect results; however, better statistics are not available at present. Throughout this paper, I refer to this indicator (expenditure on family/children benefits as a percentage of gross national income) as family benefit or family support.
Youth unemployment rates and fertility statistics are reported at the NUTS-2 level by Eurostat. No transformation of these variables is required for the empirical analysis.
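Assigning the national value to every region of a country can be sketched as below, relying on the fact that a NUTS code starts with the two-letter country code. The function name and the benefit figures are hypothetical and serve only as an illustration.

```python
def assign_national_value(region_codes, national_values):
    """Give each NUTS-2 region the value of its country (first two characters of the code)."""
    return {code: national_values.get(code[:2]) for code in region_codes}

# Hypothetical family benefit values (% of GNI) for two countries.
national_benefit = {"HU": 2.0, "AT": 2.5}
regions = ["HU11", "HU12", "AT13"]
regional_benefit = assign_national_value(regions, national_benefit)
print(regional_benefit)
```

Regions whose country is missing from the national table receive `None`, which makes gaps in the national data easy to spot after the merge.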
[7] Technical note: To ensure easy reproducibility, the data were downloaded with the dedicated eurostat R package.
[8] United Nations Development Programme (2020). Technical notes: Calculating the human development indices.
[9] United Nations Development Programme (2020). Technical notes: Calculating the human development indices.
[10] OECD (2019). Society at a Glance 2019.
[11] Maddala, G. S. and Lahiri, K. (1992). Introduction to econometrics, volume 2. Macmillan, New York.