
DATA ENTHUSIAST

"Maybe stories are just data with a soul."

I am Sanjit Sisodiya, a data enthusiast. I am passionate about deriving insights from data and telling stories through visualizations and dashboards. Over the past four years of work experience, I have come to believe in a simple process: working with data > extracting information > taking action. This blog collects some of my work in machine learning and predictive analytics.

TECHNICAL SKILLS:

  • SQL
  • R
  • Python
  • Tableau
  • SPSS
  • Google Analytics


PREDICTIVE MODEL FOR PROJECT COMPLETION

  • Writer: Sanjit_3282
  • Nov 1, 2018

This was an individual project I did over the summer at the NYC Department of Design and Construction (DDC) as a Data Analyst intern. The goal was to predict which construction projects would breach their schedule estimate. I designed a predictive model using R and SQL to check whether a construction project would be completed within its contract schedule estimate, and the model attained 86.5% accuracy.


I took a comprehensive look at everything that might affect the schedule of a construction project, so all the data related to Contract, Budget, Schedule and Project Info details was collected from SQL server databases. The data covered projects from 1994 through 2022, including projects scheduled for the future. The data for 153 variables was cleaned, merged and prepared using R and SQL. Only projects from 2013 to 2017 were used to build the model, owing to their data integrity.


We used ON and OFF terminology for the project. ON projects are those completed within their scheduled duration or within a 10% allowance on that duration; OFF projects are those that took more than 10% longer than their scheduled duration. Data exploration was done using R, SQL and Tableau to see which variables affect the schedule the most.
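The ON/OFF rule can be written down as a small labeling function. This is only a minimal Python sketch, not the R/SQL code used in the project: the `Project` fields (`planned_days`, `actual_days`) are hypothetical names, and treating a project that lands exactly on the 10% limit as ON is my assumption.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical fields; the actual DDC schema is not shown in this post.
    project_id: str
    planned_days: int  # scheduled duration from the contract estimate
    actual_days: int   # duration the project actually took


def schedule_status(project: Project, allowance_pct: int = 10) -> str:
    """Label a project ON if it finished within planned duration plus allowance.

    Integer arithmetic is used so the boundary case (exactly 10% over)
    is not affected by floating-point rounding. Assumption: the boundary
    itself counts as ON.
    """
    within_limit = (
        project.actual_days * 100
        <= project.planned_days * (100 + allowance_pct)
    )
    return "ON" if within_limit else "OFF"
```

For example, a project planned for 100 days is labeled ON if it takes up to 110 days and OFF from 111 days onward.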


I started with Logistic Regression to predict the ON/OFF status of a project, and also built predictive models using Random Forest and Decision Trees. Eleven variables were significant in predicting the project status, and the Random Forest model provided the highest accuracy (86.5%).
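Accuracy here means the share of projects whose predicted ON/OFF status matches the actual status. The sketch below shows, in plain Python, how that number can be computed and how several models can be compared on it. The function names and the tiny label lists are made up for illustration; they are not the project's data or results.

```python
def accuracy(actual, predicted):
    """Fraction of projects whose predicted status equals the actual status."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one project is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def compare_models(actual, predictions_by_model):
    """Return the best-scoring model name and the accuracy of every model."""
    scores = {
        name: accuracy(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy labels for illustration only (not real DDC projects).
actual = ["ON", "OFF", "ON", "ON"]
predictions = {
    "model_a": ["ON", "ON", "ON", "OFF"],   # 2 of 4 correct
    "model_b": ["ON", "OFF", "OFF", "ON"],  # 3 of 4 correct
}
best, scores = compare_models(actual, predictions)
```

In practice the predictions would come from each fitted model on held-out projects, so that the comparison reflects how well the model generalizes rather than how well it memorized the training data.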


Below is the video of the project presentation for reference.



With my DDC Mentors and Paul Wanjoon Cho, who was busy taking this pic



 
 
 


©2018 by Sanjit Sisodiya. Proudly created with Wix.com
