e-mail: dvkazakov @ gmail.com (remove spaces on both sides of @)
Phone/WhatsApp: +7-916-909-7864
Telegram: @denis_v_kazakov
GitHub
Skype: denis.v.kazakov
Russian version
Study Projects
Pet projects completed before I got my first job in data science.
Contents:
Natural language processing
Machine translation with transformers
Set phrase extraction from corpora
I proposed my own method based on normalized pointwise mutual information.
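For readers unfamiliar with the measure, here is a minimal sketch of standard normalized pointwise mutual information (NPMI) for adjacent word pairs. It is not the method proposed in the project, which is not described on this page. The function name `npmi_scores`, the `min_count` threshold, and the toy sentence are assumptions made for illustration.

```python
import math
from collections import Counter


def npmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by NPMI (illustrative helper, not the project's code)."""
    pairs = list(zip(tokens, tokens[1:]))
    n_pairs = len(pairs)
    pair_counts = Counter(pairs)
    # Marginals are taken from the pairs themselves so that all three
    # probabilities share one sample space and NPMI stays within [-1, 1].
    left_counts = Counter(w1 for w1, _ in pairs)
    right_counts = Counter(w2 for _, w2 in pairs)

    scores = {}
    for (w1, w2), count in pair_counts.items():
        # Skip rare pairs; also skip the degenerate case p_xy == 1,
        # where the normalizer -log(p_xy) would be zero.
        if count < min_count or count == n_pairs:
            continue
        p_xy = count / n_pairs
        p_x = left_counts[w1] / n_pairs
        p_y = right_counts[w2] / n_pairs
        pmi = math.log(p_xy / (p_x * p_y))
        scores[(w1, w2)] = pmi / -math.log(p_xy)
    return scores


# Toy corpus (assumed): "new york" always occurs together.
tokens = "new york is big and new york is old and the city is big".split()
scores = npmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A score near 1 means the two words almost always appear together, which is what makes the measure a candidate signal for set phrases.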
Google Translate detected!
"Google Translate detected!" is the battle cry of translators who spot that a translation was produced by a computer rather than a human translator (implying that this is obvious and the translation is poor).
The purpose of this study project was to train a neural network to tell the difference between human and machine translation.
Skills:
- Data preparation and analysis with Pandas
- Neural networks: model building, hyperparameter selection, evaluation of results (Keras)
- Statistical hypothesis testing
- Power of statistical tests
- Languages: Python, R
Time series
Predicting unconventional oil and gas production
Study project with two parts:
- Selecting an equation to describe production decline
- Predicting production
Skills:
- Exploratory data analysis
- Visualization
- Data preparation (Pandas)
- Feature transformation
- Linear regression
- Curve fitting (SciPy)
- Neural networks (Keras, functional API)
- Time series
- Metric selection in line with industry requirements
- Domain knowledge (shale reserve development)
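To illustrate how feature transformation and linear regression can combine when describing production decline, here is a minimal sketch. It assumes a simple exponential decline, q(t) = q_i * exp(-d * t), chosen only for illustration; the equation actually selected in the project is not stated here, and the project used SciPy for curve fitting, whereas this sketch uses only the standard library. The function names and the synthetic data are hypothetical.

```python
import math


def fit_exponential_decline(months, rates):
    """Fit q(t) = q_i * exp(-d * t) by ordinary least squares on log(rate)."""
    # Taking logs turns the exponential into a straight line:
    # log q = log q_i - d * t
    logs = [math.log(q) for q in rates]
    n = len(months)
    mean_t = sum(months) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in months)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(months, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope  # initial rate q_i, decline rate d


def predict_rate(q_i, d, t):
    """Production rate predicted by the fitted curve at time t."""
    return q_i * math.exp(-d * t)


# Synthetic, noise-free data (assumed): 1000 units/month declining 8% per month.
months = list(range(12))
rates = [1000.0 * math.exp(-0.08 * t) for t in months]

q_i, d = fit_exponential_decline(months, rates)
forecast = predict_rate(q_i, d, 24)
print(round(q_i, 2), round(d, 4), round(forecast, 2))
```

Forms that do not become linear after a transform, such as hyperbolic decline, would call for nonlinear least squares instead.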
Multivariate regression
Multivariate regression when there are more targets than predictors
Skills:
- Gaining domain knowledge (laboratory analysis of rock samples)
- Exploratory data analysis
- Visualization
- Data transformation
- Industry-specific metric selection
- Object-oriented programming
- Regression analysis:
  - Linear regression
  - Regularization
  - K nearest neighbors
  - Random forest
  - Boosting (AdaBoost)
- Hyperparameter optimization (Optuna)
Uplift modeling
Uplift modeling means predicting which customers will buy a product if and only if they receive an SMS, i.e. those who would not buy without one.
Kaggle competition. Rank: 18th place out of 177 contestants.
Skills:
- Data preparation and analysis (pandas)
- Feature selection
- Uplift modeling (sklift library)
- Gradient boosting (xgboost)
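The quantity that uplift modeling targets can be sketched with plain Python: the purchase rate among customers who received the SMS minus the rate among those who did not. This is a hand-rolled illustration of the definition, not the sklift API or the competition solution; the function `uplift_by_segment`, the segment labels, and the toy records are all assumptions.

```python
def uplift_by_segment(records):
    """Estimate uplift per segment from (segment, treated, bought) records.

    Uplift = P(buy | SMS sent) - P(buy | no SMS), estimated per segment.
    """
    stats = {}
    for segment, treated, bought in records:
        s = stats.setdefault(segment, {"t_n": 0, "t_buy": 0, "c_n": 0, "c_buy": 0})
        if treated:
            s["t_n"] += 1
            s["t_buy"] += int(bought)
        else:
            s["c_n"] += 1
            s["c_buy"] += int(bought)

    uplift = {}
    for segment, s in stats.items():
        # A segment needs both treated and control customers to be comparable.
        if s["t_n"] == 0 or s["c_n"] == 0:
            continue
        uplift[segment] = s["t_buy"] / s["t_n"] - s["c_buy"] / s["c_n"]
    return uplift


# Toy data (assumed): "new" customers respond to the SMS, "loyal" ones buy anyway.
records = (
    [("new", True, True)] * 3 + [("new", True, False)]
    + [("new", False, True)] + [("new", False, False)] * 3
    + [("loyal", True, True)] * 3 + [("loyal", True, False)]
    + [("loyal", False, True)] * 3 + [("loyal", False, False)]
)
uplift = uplift_by_segment(records)
print(uplift)
```

In this toy data the "loyal" segment has zero uplift even though it buys often, which is why a plain purchase-probability model is not a substitute for an uplift model.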
Statistics
Checking Zipf's law validity
According to Zipf's law, the most frequent word in a language or a large body of texts will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.
Project purpose: test whether Zipf's law holds for English and Russian texts as well as for the Frequency Dictionary of the Russian language.
Skills:
- Data analysis with Pandas
- Feature transformation to enable linear regression
- Linear regression (Statsmodels)
- Python class definition
- Natural language processing, frequency estimation
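The feature transformation listed above can be illustrated as follows. Zipf's law says frequency is roughly proportional to 1/rank, so taking logarithms of both gives a straight line with a slope near -1 that linear regression can estimate. This is a minimal sketch using only the standard library rather than Statsmodels; the function `zipf_slope` and the synthetic word counts are assumptions, not the project's code or data.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Regress log(frequency) on log(rank); return (slope, intercept)."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic corpus (assumed) that follows Zipf's law exactly:
# the word of rank r occurs 120 / r times.
tokens = []
for rank in range(1, 7):
    tokens += [f"w{rank}"] * (120 // rank)

slope, intercept = zipf_slope(tokens)
print(round(slope, 3), round(math.exp(intercept), 1))
```

On real texts the fitted slope deviates from -1, and the size of that deviation is one way to judge how well the law holds.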
Python
Dictionary conversion
App for technical translators compiling their own glossaries.
Skills: Python.