e-mail: dvkazakov @ gmail.com (remove spaces on both sides of @)
Phone/WhatsApp: +7-916-909-7864
Telegram: @denis_v_kazakov
GitHub
Skype: denis.v.kazakov
Portfolio
Natural language processing
Google Translate detected!
"Google Translate detected!" is the battle cry of translators who spot that a translation was produced by a machine rather than a human translator (the implication being that the translation is visibly poor).
The purpose of this study project was to train a neural network to tell the difference between human and machine translation.
Skills:
- Data preparation and analysis with Pandas
- Neural networks: model building, hyperparameter selection, evaluation of results (Keras)
- Statistical hypothesis testing
- Power of statistical tests
- Languages: Python, R
Image recognition
Image classification: uninfected cells and cells parasitized by malaria
Dataset: 27,558 cell images, split equally between parasitized and uninfected cells.
Project goal: train a neural network to differentiate between parasitized and uninfected cells.
Skills:
- Keras/TensorFlow
- Convolutional neural networks
- Handling images with different sizes
- Nonlinear network topology: use of residual connections to improve accuracy
- Automatic optimization of network architecture parameters (number of layers, number of convolution filters in each layer, kernel size) using the Bayesian and Hyperband tuners
Reconstructing functions from scanned plots
Many industrial standards and building codes specify calculation procedures for designing various structures. Some older (but still valid) documents provide plots (nomograms) from which parameters must be read manually. While formulas can easily be converted into code, reading parameters off a plot by hand is inaccurate and time-consuming, especially in iterative calculations.
Project purpose: train a neural network to reconstruct function values from plots.
Skills: convolutional neural networks.
Time series
Predicting unconventional oil and gas production
Study project with two parts:
- Selecting an equation to describe production decline
- Predicting production
Skills:
- Exploratory data analysis
- Visualization
- Data preparation (Pandas)
- Feature transformation
- Linear regression
- Curve fitting (SciPy)
- Neural networks (Keras, functional API)
- Time series
- Metric selection in line with industry requirements
- Domain knowledge (shale reserve development)
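As an illustration of how feature transformation, linear regression, and curve fitting fit together in this kind of task, here is a minimal sketch in plain Python. It assumes an exponential decline equation, q(t) = q_i * exp(-D * t), chosen only as the simplest candidate; the page does not say which equations the project compared. The function names and the synthetic data are hypothetical.

```python
import math


def fit_exponential_decline(times, rates):
    """Fit q(t) = q_i * exp(-D * t) by least squares on log(q).

    Taking the logarithm turns the model into a straight line:
    log(q) = log(q_i) - D * t, so ordinary linear regression applies.
    """
    ys = [math.log(q) for q in rates]
    mean_t = sum(times) / len(times)
    mean_y = sum(ys) / len(ys)
    stt = sum((t - mean_t) ** 2 for t in times)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sty / stt
    q_i = math.exp(mean_y - slope * mean_t)
    return q_i, -slope  # initial rate q_i, decline rate D


def predict(q_i, decline, t):
    """Production rate at time t under the fitted exponential model."""
    return q_i * math.exp(-decline * t)


# Synthetic monthly rates generated from q_i = 1000, D = 0.1 (made-up values).
months = list(range(12))
rates = [1000 * math.exp(-0.1 * t) for t in months]

q_i, decline = fit_exponential_decline(months, rates)
forecast = predict(q_i, decline, 24)  # extrapolate to month 24
```

On real production data the log transform also changes how errors are weighted, so a direct nonlinear fit (for example with SciPy) may give different parameters.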
Regression
Airbnb price prediction
Skills:
- Data preparation and analysis with Pandas
- Principal component analysis
- Gradient boosting
- Sklearn pipelines
Uplift modeling
Kaggle competition
Uplift modeling means predicting which customers will buy a product if and only if they receive an SMS, i.e. those who would not buy without one. Rank: 18th place out of 177 contestants.
Skills:
- Data preparation and analysis (pandas)
- Feature selection
- Uplift modeling (sklift library)
- Gradient boosting (xgboost)
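The idea of uplift can be sketched without any modeling library: for a group of customers, uplift is the purchase rate among those who received the SMS minus the purchase rate among those who did not. The example below is only an illustration in plain Python; the record layout, segment names, and data are made up, and it does not reproduce the sklift/xgboost solution used in the competition.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Estimate uplift per segment as P(buy | SMS) - P(buy | no SMS).

    Each record is (segment, received_sms, bought); this layout is a
    hypothetical example, not the competition's data format.
    """
    # counts[segment][received_sms] = [number of customers, number of buyers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, received_sms, bought in records:
        cell = counts[segment][received_sms]
        cell[0] += 1
        cell[1] += int(bought)

    uplift = {}
    for segment, groups in counts.items():
        (n_t, buy_t), (n_c, buy_c) = groups[True], groups[False]
        if n_t == 0 or n_c == 0:
            continue  # uplift is undefined without both groups
        uplift[segment] = buy_t / n_t - buy_c / n_c
    return uplift


# Made-up data: "new" customers respond to the SMS, "loyal" ones buy anyway.
records = (
    [("new", True, True)] * 6 + [("new", True, False)] * 4
    + [("new", False, True)] * 2 + [("new", False, False)] * 8
    + [("loyal", True, True)] * 8 + [("loyal", True, False)] * 2
    + [("loyal", False, True)] * 8 + [("loyal", False, False)] * 2
)

scores = uplift_by_segment(records)
# Rank segments by estimated uplift, highest first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy data the "loyal" segment has the higher purchase rate but zero uplift, which is why uplift models rank customers by the difference between the two rates rather than by purchase probability.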
Classification
Predicting bank client churn
Skills:
- Data preparation: pandas, sklearn
- Choice of metric (balanced accuracy, recall, ROC AUC)
- Decision trees, random forest, gradient boosting, AdaBoost (sklearn)
- Deep learning (Keras)
Statistics
Checking Zipf's law validity
According to Zipf's law, the most frequent word in a language or a large body of texts will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.
Project purpose: check Zipf's law validity on English and Russian texts as well as the Frequency Dictionary of the Russian language.
Skills:
- Data analysis with Pandas
- Feature transformation to enable linear regression
- Linear regression (Statsmodels)
- Python class definition
- Natural language processing, frequency estimation
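The transformation that makes linear regression applicable here is the logarithm: if frequency is proportional to 1/rank, then log(frequency) is a linear function of log(rank) with a slope close to -1. Below is a minimal sketch in plain Python, assuming a simple regex tokenizer and ordinary least squares. The project itself used Pandas and Statsmodels; the function names here are illustrative only.

```python
import math
import re
from collections import Counter


def rank_frequencies(text):
    """Return word counts sorted from most to least frequent."""
    # Letters only; the pattern also matches Cyrillic words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [count for _, count in Counter(words).most_common()]


def zipf_slope(frequencies):
    """Fit log(frequency) = intercept + slope * log(rank) by least squares."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic check: frequencies that follow Zipf's law exactly (f = 1200 / rank).
ideal = [1200 / rank for rank in range(1, 7)]
slope, intercept = zipf_slope(ideal)
print(round(slope, 3))  # -1.0
```

For a real corpus, pass the output of rank_frequencies(text) to zipf_slope and compare the fitted slope with -1.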
Clustering
Clustering nations by several features
Study project: optimum clustering of nations by given features.
Skills:
- Principal component analysis
- Biplots
- K-means clustering
- Selecting the optimum number of clusters
Python
Dictionary conversion
App for technical translators compiling their own glossaries.
Skills: Python.