Data Scientist. Denis V. Kazakov
e-mail: dvkazakov @ gmail.com
(remove spaces on both sides of @)
- Language Researcher at AWATERA (June 2023 – present)
- Natural language processing
- Research on large language models (LLMs), such as GPT and its analogs
- LLM study and comparative analysis
- Prompt engineering
- Using LLMs for language translation and editing
- Experiment design
- Data analysis and machine learning
- Patent search and patent applications
- Pet projects (2020 – May 2023):
- Training a neural network to distinguish between human translation and machine translation
- Predicting oil and gas production
- Recovering a function from a scanned plot
- Kaggle competition: uplift modeling, i.e. predicting which customers will buy a product only if they receive an SMS (those who won't buy without one).
- Statistical analysis of phrase usage in Google Books
- Prototype of a recommender system
- Forecasting bank client churn
- Airbnb price prediction
- Image classification: uninfected cells vs. cells parasitized by malaria
- Testing the validity of Zipf's law
- Previous experience: translator and interpreter
- Python. Libraries: Pandas, NumPy, sklearn, hyperparameter optimization (Optuna, keras_tuner), xgboost, Keras/TensorFlow, Matplotlib, SciPy, StatsModels, Jupyter Notebooks, etc.
- R. Packages: tidyverse, ggplot2.
- Machine learning:
- Regression: linear, nonlinear (polynomial, spline), regularization (lasso, ridge)
- Classification: logistic regression, KNN, SVM
- Tree-based methods: random forest, boosting
- Deep learning (CNN, RNN)
- Cross-validation, bootstrap
- Data transformation (PCA, SVD)
- Ensembles, pipelines
- Hypothesis testing
- Parametric and non-parametric methods
- Multiple testing
- Time series
- SQL (complex queries, window functions, common table expressions)
- Linux (bash)
- Other: REST API
Higher education: Faculty of Physics, Moscow State University. Degree: diploma of higher education (equivalent of Master of Physics).
Data Science reskilling course. Tomsk State University.
- I was the first of ~160 students to complete the training
- Faster-learning students presented their projects to other students to help them progress through the course. I presented four projects out of 16
- As a top student, I also checked other students' graduation projects (normally done by staff)
May 2023. Recommender Systems in Practice bootcamp. Higher School of Economics/Magnit.
December 2022. Uplift Modeling course at Open Data Science.
Stepic courses. All courses were completed with distinction, ranking in the top 1 to 6% of students. Certificates.
An Introduction to Statistical Learning with Applications in R.
- Python – Basics and Application
- Programming in Python
- Data Analysis in R
- Basic Programming in R
- Basics of Statistics, parts 1, 2 and 3
- Intro to Data Science and Machine Learning
- Interactive SQL Simulator
- Intro to Linux
R for Data Science.
- Introduction to Machine Learning.
Khan Academy courses
- Intro to SQL: Querying and managing data.