Pump It Up: Data Mining the Water Table
Introduction
This is an end-to-end data science project covering exploratory data analysis (EDA), machine learning models, and data visualization. It is based on the DrivenData competition of the same name.
Method
A colleague and I worked together on this project. The objective was to predict the condition of water pumps in Tanzania from a range of features. We compared a variety of preprocessing and ML techniques to find the best approach. Our best-performing model was a VotingClassifier that ensembled a BaggingClassifier, an XGBoost model, a HistGradientBoosting model, and a CatBoost model. It achieved 81% accuracy on the test set.
To read more about our process, see the Submission Notebook in the GitHub repository.
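To illustrate what a voting ensemble does with the outputs of its member models, here is a minimal, dependency-free sketch of hard (majority) voting followed by an accuracy check. It is not the code from the Submission Notebook. The function names `hard_vote` and `accuracy`, the label strings, the example predictions, and the alphabetical tie-break are all assumptions made for illustration, and whether the project used hard or soft voting is not stated here.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label predictions by majority vote, row by row.

    `predictions_per_model` is a list with one list of labels per model;
    all inner lists must have the same length.
    """
    combined = []
    for row_votes in zip(*predictions_per_model):
        counts = Counter(row_votes)
        top = max(counts.values())
        # Assumed tie-break: pick the alphabetically first of the tied labels.
        combined.append(min(label for label, n in counts.items() if n == top))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of rows where the predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up predictions for four pumps (illustrative only, not project results).
bagging = ["functional", "non functional", "functional", "functional needs repair"]
xgboost = ["functional", "non functional", "non functional", "functional"]
hist_gb = ["functional", "functional", "non functional", "functional needs repair"]
catboost = ["non functional", "non functional", "functional", "functional needs repair"]
y_true = ["functional", "non functional", "non functional", "functional needs repair"]

ensemble = hard_vote([bagging, xgboost, hist_gb, catboost])
print(ensemble)
print(accuracy(y_true, ensemble))
```

With an even number of models, ties like the one on the third pump can occur, so the tie-break rule matters. Soft voting, which averages predicted class probabilities instead of counting labels, avoids most such ties.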
Key Technologies
- Python
- Pandas
- Weights and Biases