home

e-mail: dvkazakov @ gmail.com
(remove spaces on both sides of @)

Phone/WhatsApp: +7-916-909-7864

Telegram: @denis_v_kazakov

GitHub

Skype: denis.v.kazakov

photo

Русский

Study project
Clustering nations by several features

Skills:

[1] Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65.

Project notebooks: Data were taken from an earlier study project comparing religiousness with other parameters: life expectancy, corruption, democracy index, level of freedom (Freedom in the World), GDP per capita, human development index (HDI), homicide rate and population size.

All data were collected from Wikipedia into a single file (Raw_data.xlsx), preprocessed and saved as individual csv files (raw_data folder).

Some columns were renamed or deleted for this project (prepared_csvs folder).

Two options were considered, with and without population size, because population size does not directly affect quality of life.
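The two options can be sketched as two feature matrices built from the same table. This is only an illustration: the column names and numbers below are made up, and standardizing each column (z-scores) before the analysis is an assumption, not something stated in the project description.

```python
import numpy as np

# Hypothetical column names and toy values, not the project's real data.
FEATURES = ["religiousness", "life_expectancy", "hdi", "population"]
X = np.array([
    [0.9, 62.0, 0.55, 200.0],
    [0.3, 81.0, 0.92, 10.0],
    [0.6, 74.0, 0.76, 1400.0],
    [0.2, 83.0, 0.94, 5.0],
])

def standardize(data):
    """Scale every column to zero mean and unit variance (z-scores)."""
    return (data - data.mean(axis=0)) / data.std(axis=0)

def drop_feature(data, names, name):
    """Return the data and the list of names without the given column."""
    keep = [i for i, n in enumerate(names) if n != name]
    return data[:, keep], [names[i] for i in keep]

# Option 1: all features, including population size.
X_with = standardize(X)

# Option 2: the same table without the population column.
X_without, names_without = drop_feature(X, FEATURES, "population")
X_without = standardize(X_without)

print(X_with.shape, X_without.shape, names_without)
```

Standardizing matters here because the features are measured on very different scales (years, index values, millions of people), and an unscaled column with a large range would otherwise dominate both the principal components and the distances used for clustering.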

Principal component analysis
With population size


There is clearly a single dominant component.
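One way to check for a dominant component numerically is to look at the share of total variance carried by each principal component. The sketch below uses plain NumPy on synthetic data (three features driven by one latent factor); the function name and the toy data are illustrative assumptions, not the code from the project notebooks.

```python
import numpy as np

def explained_variance_ratio(data):
    """Fraction of total variance carried by each principal component."""
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variances along the principal axes.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

# Synthetic data: three features driven by one latent factor plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
toy = latent @ np.array([[1.0, 0.8, -0.9]]) + 0.2 * rng.normal(size=(200, 3))

ratio = explained_variance_ratio(toy)
print(ratio)  # the first entry is much larger than the others
```

When one entry of this vector is far larger than the rest, a single component explains most of the variation. scikit-learn's PCA class reports the same quantity in its explained_variance_ratio_ attribute.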





The same plot without country labels, for clarity:



Several features lie almost along the first principal axis: HDI, GDP per capita, corruption (a higher index means lower corruption!), life expectancy and, in the opposite direction, religiousness.

Homicide rate and population size are more aligned with the second principal component.
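The alignment of features with principal axes can also be read from the loadings, i.e. the correlation of each standardized feature with each component score. The following is a minimal sketch on synthetic data; the function name and the three toy features are assumptions that only imitate the pattern described above (two opposed features on the first axis, one unrelated feature on the second).

```python
import numpy as np

def pca_loadings(data, n_components=2):
    """Correlation of each standardized feature with the first components."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    # Rows of vt are the principal axes, s holds the singular values.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    # For standardized data, corr(feature j, component k) = vt[k, j] * s[k] / sqrt(n).
    return (vt[:n_components].T * s[:n_components]) / np.sqrt(len(z))

rng = np.random.default_rng(1)
factor = rng.normal(size=300)
toy = np.column_stack([
    factor + 0.1 * rng.normal(size=300),   # follows the common factor
    -factor + 0.1 * rng.normal(size=300),  # points in the opposite direction
    rng.normal(size=300),                  # unrelated to the common factor
])

loadings = pca_loadings(toy)
print(np.round(loadings, 2))  # rows: features, columns: PC1 and PC2
```

A loading close to +1 or -1 in the first column means the feature lies almost on the first principal axis, and opposite signs mean opposite directions. The sign of a whole column is arbitrary, so only relative signs within a column are meaningful.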

The second plot without taking population size into account:



Clustering


The results (average silhouette width) are better when population size is not taken into account, so they are shown below only for this approach (and only for two, three and four clusters).
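For reference, the average silhouette width [1] can be computed as in the sketch below. The text does not say which clustering algorithm was used, so the simple k-means here is an assumption, and the points are synthetic (two artificial blobs), not the country data. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to another cluster, and the silhouette value is (b - a) / max(a, b).

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0):
    """Plain k-means with random initial centers (assumed algorithm)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def average_silhouette_width(data, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    clusters = np.unique(labels)
    widths = np.zeros(len(data))
    for i in range(len(data)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # s(i) is defined as 0 for a single-point cluster
        # a: mean distance to the other points of the same cluster
        a = dists[i, own].sum() / (own.sum() - 1)
        # b: smallest mean distance to the points of any other cluster
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        widths[i] = (b - a) / max(a, b)
    return widths.mean()

# Synthetic points: two blobs, used only to exercise the functions.
rng = np.random.default_rng(42)
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(40, 2)),
    rng.normal(loc=[6.0, 0.0], scale=1.0, size=(40, 2)),
])

scores = {k: average_silhouette_width(points, kmeans(points, k)) for k in (2, 3, 4)}
for k, score in scores.items():
    print(f"k={k}: average silhouette width = {score:.3f}")
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 indicate overlapping ones. scikit-learn provides equivalent building blocks (KMeans and silhouette_score) if a library implementation is preferred.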







The best result is achieved with two clusters. The plots also show that the clusters are separated by the value of the first principal component.

Conclusion: if we look at these features only, the modern world is a continuum without clearly separated groups.


