Reproducibility Page for the Submitted Paper

Code and Dataset to Reproduce the Results

We mined all unresolved questions (those without an accepted answer), 18 million questions in total. Instances of z1d.12xlarge and ml.p3.16xlarge (notebook) were used to extract features and to build predictive models, respectively. The code that builds the predictive models was developed in Jupyter notebooks. Finally, we prepared a demo tool, trained with the XGBoost algorithm, that predicts whether a question will receive an accepted answer.

Data Collection

We started with the Stack Overflow data dump. Next, we imported this data into an MSSQL database and developed T-SQL and Python code to extract the proposed features.
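
To illustrate the selection step, the following is a minimal Python sketch of how unresolved questions could be picked out of the dump's Posts.xml file. It is not the pipeline described above, which runs in T-SQL against MSSQL. It assumes the public dump layout, in which each post is a `row` element, questions have `PostTypeId="1"`, and an `AcceptedAnswerId` attribute is present only when an answer was accepted. The function name `iter_unresolved_questions` and the inline sample are hypothetical and exist only for this example.

```python
import io
import xml.etree.ElementTree as ET


def iter_unresolved_questions(source):
    """Yield the Id of every question row that has no AcceptedAnswerId.

    `source` is a path or a binary file object holding Posts.xml-style data.
    (Hypothetical helper; attribute names follow the public dump layout.)
    """
    # iterparse streams the file, so the full dump never sits in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            is_question = elem.get("PostTypeId") == "1"
            if is_question and elem.get("AcceptedAnswerId") is None:
                yield int(elem.get("Id"))
            elem.clear()  # drop the row's attributes once processed


# Tiny made-up sample standing in for Posts.xml (illustration only).
sample = io.BytesIO(
    b"""<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="3" />
  <row Id="2" PostTypeId="1" />
  <row Id="3" PostTypeId="2" ParentId="1" />
</posts>"""
)

unresolved = list(iter_unresolved_questions(sample))
print(unresolved)
```

In the sample, row 1 is a question with an accepted answer and row 3 is an answer, so only row 2 is selected. The equivalent filter in a database would be a `WHERE` clause on the same two columns.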

Building Predictive Models

We trained the mentioned models on an ml.p3.16xlarge notebook instance. The code we developed is provided below: