
e-mail: dvkazakov @ gmail.com
(remove spaces on both sides of @)

Phone/WhatsApp: +7-916-909-7864

Telegram: @denis_v_kazakov

GitHub

Skype: denis.v.kazakov


Russian


Portfolio



Natural language processing

Google Translate detected!

"Google Translate detected!" is what translators exclaim when they see that a text was translated by a computer rather than by a human (implying that the translation is poor and that this is obvious).

The purpose of this study project was to train a neural network to tell the difference between human and machine translation.

Skills:




Image recognition

Image classification: uninfected cells and cells parasitized by malaria

Dataset: 27,558 cell images, equally split between parasitized and uninfected cells.

Project goal: train a neural network to differentiate between parasitized and uninfected cells.

Skills:

  • Keras/TensorFlow
  • Convolutional neural networks
  • Handling images with different sizes
  • Nonlinear network topology: use of residual connections to improve accuracy
  • Automatic optimization of network architecture parameters (number of layers, number of convolution filters in each layer, kernel size) using the Bayesian optimization and Hyperband tuners


Reconstructing functions from scanned plots

Many industrial standards and building codes specify calculation procedures for designing various structures. Some older (but still valid) documents provide plots (nomograms) for manually determining certain parameters. While formulas can easily be converted into code, reading parameters off plots by hand is inaccurate and time-consuming, especially in iterative procedures.

Project purpose: train a neural network to reconstruct function values from plots.

Skills: convolutional neural networks.


Time series

Predicting unconventional oil and gas production

Study project with two parts:

Skills:


Regression

Airbnb price prediction

Skills:


Uplift modeling

Kaggle competition

Uplift modeling: identifying customers who will buy a product if they receive an SMS but will not buy otherwise. Result: 18th place out of 177 contestants.
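A minimal sketch of the idea, assuming a randomized campaign log where each customer has a segment label, a flag for whether an SMS was sent, and a flag for whether they bought. Uplift for a segment is estimated as the purchase rate with SMS minus the purchase rate without it, a very simple form of the two-model approach. The record format, segment names, and data are hypothetical, and this is not necessarily the method used in the competition entry.

```python
from collections import defaultdict


def segment_uplift(records):
    """Estimate uplift per segment as P(buy | SMS) - P(buy | no SMS).

    records: iterable of (segment, got_sms, bought) tuples (hypothetical format).
    """
    # counts[segment][got_sms] = [number of customers, number of purchases]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, got_sms, bought in records:
        cell = counts[segment][bool(got_sms)]
        cell[0] += 1
        cell[1] += int(bought)

    uplift = {}
    for segment, groups in counts.items():
        (n_t, buy_t), (n_c, buy_c) = groups[True], groups[False]
        if n_t == 0 or n_c == 0:
            continue  # uplift is undefined without both SMS and no-SMS customers
        uplift[segment] = buy_t / n_t - buy_c / n_c
    return uplift


# Hypothetical campaign log: (segment, received SMS, bought)
log = (
    [("young", True, True)] * 30 + [("young", True, False)] * 70
    + [("young", False, True)] * 10 + [("young", False, False)] * 90
    + [("loyal", True, True)] * 50 + [("loyal", True, False)] * 50
    + [("loyal", False, True)] * 50 + [("loyal", False, False)] * 50
)

# Rank segments so that the highest uplift is targeted first
for segment, u in sorted(segment_uplift(log).items(), key=lambda kv: -kv[1]):
    print(segment, round(u, 2))  # prints "young 0.2", then "loyal 0.0"
```

In this made-up log the "loyal" customers buy equally often with or without an SMS, so messaging them is wasted, while the "young" segment responds to the SMS and would be targeted.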

Skills:



Classification

Predicting bank client churn

Skills:

  • Data preparation: pandas, sklearn
  • Choice of metric (balanced accuracy, recall, ROC AUC)
  • Decision trees, random forest, gradient boosting, AdaBoost (sklearn)
  • Deep learning (Keras)
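Churn data are usually imbalanced (most clients stay), which is why plain accuracy is a poor choice of metric. A minimal sketch with hypothetical labels: a naive model that predicts "no churn" for everyone gets high accuracy but only 0.5 balanced accuracy and zero recall on the churn class. Balanced accuracy is computed here as the mean of per-class recalls, the same definition used by scikit-learn's `balanced_accuracy_score`.

```python
def recall(y_true, y_pred, positive):
    """Share of samples of class `positive` that the model labeled correctly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; not inflated by the majority class."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


def accuracy(y_true, y_pred):
    """Share of all samples labeled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 1 = client left the bank, 0 = client stayed
y_true = [0] * 8 + [1] * 2
always_stay = [0] * 10  # naive model: nobody churns

print(accuracy(y_true, always_stay))            # 0.8
print(balanced_accuracy(y_true, always_stay))   # 0.5
print(recall(y_true, always_stay, positive=1))  # 0.0
```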



Statistics

Checking Zipf's law validity

According to Zipf's law, the most frequent word in a language or a large body of texts will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.

Project purpose: check Zipf's law validity on English and Russian texts as well as the Frequency Dictionary of the Russian language.

Skills:
  • Data analysis with pandas
  • Feature transformation to enable linear regression
  • Linear regression (Statsmodels)
  • Python class definition
  • Natural language processing, frequency estimation
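Zipf's law says the frequency of the word of rank r is roughly f(r) ≈ C / r^s with s ≈ 1. Taking logarithms gives log f = log C − s·log r, a straight line in log-log coordinates, which is the kind of feature transformation that makes ordinary linear regression applicable. A minimal sketch in plain Python rather than Statsmodels; the word counts are made up to follow the law exactly, so the fitted slope comes out as −1.

```python
import math
from collections import Counter


def zipf_fit(counts):
    """Fit log(freq) = a + b*log(rank) by ordinary least squares.

    counts: mapping word -> number of occurrences.
    Returns (b, a); under Zipf's law the slope b is close to -1.
    """
    freqs = sorted(counts.values(), reverse=True)  # rank 1 = most frequent word
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    a = y_mean - b * x_mean
    return b, a


# Made-up counts that follow Zipf's law exactly: frequency = 600 / rank
text_counts = Counter({"the": 600, "of": 300, "and": 200, "to": 150, "a": 120, "in": 100})
slope, intercept = zipf_fit(text_counts)
print(round(slope, 3), round(math.exp(intercept)))  # -1.0 600
```

With real text, `counts` would come from something like `Counter(words)`; how far the fitted slope departs from −1 indicates how well the law holds for that corpus.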



Clustering

Clustering nations by several features

Study project: optimal clustering of nations by given features.

Skills:
  • Principal component analysis
  • Biplots
  • K-means clustering
  • Selecting the optimum number of clusters
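K-means and the choice of the number of clusters can be sketched as follows, assuming the features have already been reduced to two principal components. The points are hypothetical; the code runs Lloyd's algorithm with a deterministic farthest-point initialization (a simplification of k-means++) and reports the within-cluster sum of squares (inertia) for several values of k. The "elbow", where adding clusters stops reducing inertia sharply, is one common heuristic for picking k; the project may have used a different criterion.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(farthest)
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j])) for p in points]


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns (labels, centers, inertia)."""
    centers = init_centers(points, k)
    for _ in range(iterations):
        labels = assign(points, centers)
        # Update step: move each center to the mean of its points
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
                continue
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, inertia


# Hypothetical nations projected onto the first two principal components
points = [(1, 1), (1.5, 2), (2, 1.5),
          (8, 8), (8.5, 9), (9, 8),
          (1, 8), (1.5, 9), (2, 8.5)]

for k in range(1, 5):
    print(k, round(kmeans(points, k)[2], 2))  # inertia drops sharply up to k = 3
```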



Python

Dictionary conversion

App for technical translators compiling their own glossaries.

Skills: Python.

