e-mail: dvkazakov @ gmail.com
(remove spaces on both sides of @)

Phone/WhatsApp: +7-916-909-7864

Telegram: @denis_v_kazakov

GitHub

Skype: denis.v.kazakov


Multivariate regression when there are more targets than predictors

Mercury injection capillary pressure (MICP) experiment

This project on GitHub.

Skills:

  • Gaining domain knowledge (laboratory analysis of rock samples)
  • Exploratory data analysis (EDA)
  • Visualization
  • Data transformation
  • Industry-specific metric selection
  • Object-oriented programming
  • Regression analysis:
    • Linear regression
    • Regularization
    • K nearest neighbors
    • Random forest
    • Boosting (AdaBoost)
  • Hyperparameter optimization (Optuna)



The dataset covered 455 mercury injection capillary pressure experiments, with about 20 features (oil well and geological data) and 200 target variables. A literature review showed that the target variables are actually curves of mercury injection volume vs. pressure, i.e. 100 data points with two coordinates each: volume and pressure.
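To make the structure of the targets concrete, here is a minimal sketch of how 200 flat target columns can be viewed as 100-point curves. The data are synthetic, and the column layout (100 pressures followed by 100 volumes) is only an assumption for illustration; the real dataset may order its columns differently.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experiments, n_points = 455, 100

# Synthetic stand-in for the 200 target columns.
# ASSUMPTION: the first 100 columns are pressures and the last 100 are volumes.
pressure = np.sort(rng.uniform(0.1, 1000.0, size=(n_experiments, n_points)), axis=1)
volume = np.sort(rng.uniform(0.0, 1.0, size=(n_experiments, n_points)), axis=1)
targets = np.hstack([pressure, volume])  # shape (455, 200)

# View the flat target matrix as curves: (experiment, point, coordinate),
# where coordinate 0 is pressure and coordinate 1 is volume.
curves = np.stack([targets[:, :n_points], targets[:, n_points:]], axis=-1)

print(curves.shape)
```

Each `curves[i]` is then one injection curve that can be plotted or inspected as a whole, which is usually easier to reason about than 200 unrelated columns.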

I tried several regression methods: linear regression, KNN, random forest and boosting (AdaBoost). AdaBoost proved to be the best performer.
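A comparison of this kind can be sketched with scikit-learn as follows. This is not the code from the notebooks: the data are synthetic, the hyperparameters are arbitrary, and plain RMSE stands in for the industry-specific metric. Note that `AdaBoostRegressor` predicts a single target, so the sketch wraps it in `MultiOutputRegressor` (one model per target); whether the project did the same is an assumption.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neighbors import KNeighborsRegressor

# Small synthetic multi-target problem (10 targets instead of 200, for speed).
X, Y = make_regression(n_samples=120, n_features=20, n_targets=10,
                       noise=5.0, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),  # regularized linear regression
    "knn": KNeighborsRegressor(n_neighbors=5),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    # AdaBoost is single-output, so fit one booster per target column.
    "adaboost": MultiOutputRegressor(
        AdaBoostRegressor(n_estimators=20, random_state=0)),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    Y_pred = model.predict(X_test)
    # RMSE averaged uniformly over all targets (placeholder metric).
    scores[name] = float(np.sqrt(mean_squared_error(Y_test, Y_pred)))

for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} RMSE = {rmse:.2f}")
```

The ranking on synthetic data says nothing about the ranking on the MICP data; the sketch only shows how the candidates can be put behind one common interface.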

I also proposed an alternative formulation in which the pressure data are treated as predictors, which, in my opinion, is closer to real laboratory studies.
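One way to read this alternative is as a "long format" problem: every (experiment, pressure point) pair becomes one row, pressure joins the predictors, and the single target is the injected volume. The sketch below illustrates that reading under several assumptions: synthetic sigmoid-shaped curves, a shared pressure grid, a log10 transform of pressure, and a random forest chosen only for convenience.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_experiments, n_features, n_points = 60, 20, 100

# Synthetic sample features and a shared pressure grid (both assumptions).
X = rng.normal(size=(n_experiments, n_features))
pressure = np.tile(np.logspace(-1, 3, n_points), (n_experiments, 1))

# Synthetic curves: sigmoid in log-pressure whose midpoint depends on feature 0.
midpoint = 1.0 + 0.3 * X[:, [0]]
volume = 1.0 / (1.0 + np.exp(-3.0 * (np.log10(pressure) - midpoint)))

# Long format: one row per (experiment, pressure point).
X_long = np.column_stack([
    np.repeat(X, n_points, axis=0),   # sample features, repeated per point
    np.log10(pressure).ravel(),       # pressure becomes a predictor
])
y_long = volume.ravel()               # single target: injected volume
groups = np.repeat(np.arange(n_experiments), n_points)

# Split by experiment so that no curve is shared between train and test.
train = groups < 45
model = RandomForestRegressor(n_estimators=30, random_state=0)
model.fit(X_long[train], y_long[train])

# Predict the whole curve of one held-out experiment.
curve_pred = model.predict(X_long[groups == 50])
print(curve_pred.shape)
```

With this layout, 200 targets collapse into one, and a curve can be predicted at any pressure of interest rather than only at the 100 tabulated points.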

For details, please see the following notebooks: