We mined all unresolved questions, without accepeted answers,
that include 18 million questions. Instances of z1d.12xlarge
and ml.p3.16xlarge
(notebook) were used to extract features and build predictive models, respectively. We also developed our code to build predictive models in Jupyter notebooks
. Finally, we prepared a demo tool learned with XGBoost
algorithm that predicts whether a questions will receive an accepted answer or not.
Data Collection
We started with Stack Overflow data dump. Next, we imported this data into an MSSQL
database and developed our code in T-SQL
and python
to extract the proposed features.
Building Predictive Models
We trained the mentioned models on ml.p3.16xlarge
building notebook. Developed code are provided below:
- Training models on all features:
- Training models on prior features:
- Training models on new features only:
- Predictive model API: