Reproducibility Page for the Submitted Paper

We mined all unresolved questions, i.e., those without an accepted answer, which amount to 18 million questions. We used a z1d.12xlarge instance to extract features and an ml.p3.16xlarge notebook instance to build the predictive models. The model-building code was developed in Jupyter notebooks. Finally, we prepared a demo tool, trained with the XGBoost algorithm, that predicts whether a question will receive an accepted answer.

Data Collection

We started with the Stack Overflow data dump. We then imported the data into an MSSQL database and wrote T-SQL and Python code to extract the proposed features.

Features (3.7GB)
A sample of the features (10K records)
Tags
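
Before downloading the full 3.7GB feature file, it may help to inspect the 10K-record sample. The following is a minimal sketch, assuming the sample is a CSV file with a header row; the column names (`question_id`, `title_length`, `label`) and the in-memory rows are hypothetical stand-ins, since the actual file layout is not described on this page.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file, label_column="label"):
    """Count rows per label value in a CSV file object with a header row.

    `label_column` is a hypothetical column name; replace it with the
    name used in the real feature file.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Tiny in-memory stand-in for the sample file (values are illustrative only).
demo = io.StringIO(
    "question_id,title_length,label\n"
    "1,42,1\n"
    "2,17,0\n"
    "3,60,0\n"
)
print(dict(label_counts(demo)))
```

With a real file, the same function could be called as `label_counts(open(path, newline=""))`, where `path` points to the downloaded sample.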

Building Predictive Models

We trained the models in a notebook on an ml.p3.16xlarge instance. The code is provided below:

Training models on all features:

Training models on prior features:

Training models on new features only:

Predictive model API:
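
The three training configurations listed above can be read as the same learner applied to different column subsets. The sketch below illustrates that reading; it is an assumption, not the notebooks' actual code. The feature names in `PRIOR` and `NEW`, the helper names, and the hyperparameters are all hypothetical. `train_xgboost` uses the scikit-learn style `XGBClassifier` from the `xgboost` package and imports it lazily, so the column-selection part runs without it.

```python
# Hypothetical feature names; the real ones are defined in the feature files.
PRIOR = ["title_length", "body_length"]
NEW = ["tag_popularity"]

# One entry per training configuration: all, prior only, new only.
FEATURE_SETS = {"all": PRIOR + NEW, "prior": PRIOR, "new": NEW}


def select_features(rows, names):
    """Project each row (a dict) onto the given feature names, in order."""
    return [[float(row[name]) for name in names] for row in rows]


def train_xgboost(X, y):
    """Fit an XGBoost classifier (requires the numpy and xgboost packages).

    The hyperparameters are placeholders, not the values used in the paper.
    """
    import numpy as np
    from xgboost import XGBClassifier

    model = XGBClassifier(n_estimators=100, max_depth=6)
    model.fit(np.asarray(X), np.asarray(y))
    return model


# Illustrative rows only; real rows come from the extracted features.
rows = [
    {"title_length": 42, "body_length": 310, "tag_popularity": 0.8},
    {"title_length": 17, "body_length": 95, "tag_popularity": 0.1},
]
for config, names in FEATURE_SETS.items():
    X = select_features(rows, names)
    print(config, "->", len(X[0]), "features per question")
```

Given a label vector `y`, each configuration could then be trained with `train_xgboost(X, y)`, and `model.predict_proba(...)` would return the estimated probability that a question receives an accepted answer.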