PREDICTIVE MODEL FOR PROJECT COMPLETION
- Sanjit_3282
- Nov 1, 2018
This was an individual project I completed over the summer as a Data Analyst intern at the NYC Department of Design and Construction. The goal was to predict which construction projects would overrun their schedule estimates. I built a predictive model using R and SQL to check whether a construction project would finish within its contract schedule estimate, and the model attained 86.5% accuracy.

I took a comprehensive look at everything that might affect a construction project's schedule, so all the data related to Contract, Budget, Schedule and Project Info details was collected from SQL server databases. The data covered projects from 1994 through 2022, including projects scheduled for the future. The data for 153 variables was cleaned, merged and prepared using R and SQL. Only projects from 2013 to 2017 were used to build the model, because of their data integrity.
We used ON and OFF terminology for the project. ON projects are those completed within their scheduled duration or within a 10% allowance on top of it. OFF projects are those that overran their scheduled duration by more than 10%. Data exploration was done using R, SQL and Tableau to see which variables affect the schedule the most.
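The ON/OFF rule can be sketched in a few lines. The project itself used R and SQL; this is only a minimal Python illustration. It assumes the 10% allowance is measured against the scheduled duration, and the function and parameter names (`schedule_status`, `planned_days`, `actual_days`) are hypothetical rather than taken from the project.

```python
def schedule_status(planned_days, actual_days, allowance=0.10):
    """Label a project ON or OFF.

    Assumption: a project is ON when its actual duration is at most the
    planned duration plus the allowance (10% by default), and OFF otherwise.
    """
    if planned_days <= 0:
        raise ValueError("planned_days must be positive")
    limit = planned_days * (1 + allowance)  # longest duration still counted as ON
    return "ON" if actual_days <= limit else "OFF"


# Illustrative values only, not project data.
print(schedule_status(100, 105))  # 5% over the plan, inside the allowance
print(schedule_status(100, 111))  # 11% over the plan, outside the allowance
```

Under this reading, a project planned for 100 days stays ON up to 110 days and becomes OFF beyond that.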
I started with Logistic Regression to predict a project's ON/OFF status, and also built predictive models using Random Forest and Decision Trees. Eleven variables were significant in predicting project status, and the Random Forest model provided the highest accuracy (86.5%).
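For reference, accuracy here is taken to mean the share of projects whose predicted ON/OFF status matches the actual status. The sketch below shows that calculation and how competing models could be ranked by it. It is written in Python rather than the R used for the project, and the labels and the names `model_a` and `model_b` are invented for illustration, not results from the project.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels for eight projects (not project data).
actual = ["ON", "OFF", "ON", "ON", "OFF", "ON", "OFF", "ON"]

# Hypothetical predictions from two candidate models.
predictions = {
    "model_a": ["ON", "ON", "ON", "ON", "OFF", "OFF", "OFF", "ON"],
    "model_b": ["ON", "OFF", "ON", "ON", "OFF", "ON", "ON", "ON"],
}

# Score every model on the same labels and keep the most accurate one.
scores = {name: accuracy(actual, preds) for name, preds in predictions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Comparing models on the same held-out projects keeps the accuracy figures directly comparable.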
Below is the video of the project presentation for reference.
