Vehicle crashes happen daily around the globe. For example, they cost the New York City economy $4 billion per year [link]. We therefore set out to investigate this phenomenon and analyse the core reasons and contributing factors behind these accidents. Our goal in this project is to let the end user investigate the data interactively, learn from it, and form their own hypotheses about the phenomenon based on our statistical analysis and visualizations.
The project focuses on the Response variable, which indicates whether a crash involved an injury or a fatality, in other words whether it was a serious accident. Response=1 means a serious accident (involving at least one injury or death) and Response=0 means the accident was less serious.
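The definition above can be sketched in a few lines of Python. This is only an illustration: the field names `persons_injured` and `persons_killed` and the sample rows are hypothetical, and the project's actual column names may differ.

```python
# Hypothetical crash records; the field names are assumptions made for
# illustration and are not taken from the project's data.
crashes = [
    {"persons_injured": 0, "persons_killed": 0},
    {"persons_injured": 2, "persons_killed": 0},
    {"persons_injured": 0, "persons_killed": 1},
    {"persons_injured": 1, "persons_killed": 0},
]


def response(row):
    """Return 1 if the crash involved any injury or death, else 0."""
    casualties = row["persons_injured"] + row["persons_killed"]
    return 1 if casualties > 0 else 0


labels = [response(row) for row in crashes]
print(labels)
```

Only the first record has no casualties, so it is the only one labelled as not serious.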
The NYC Motor Vehicle Collisions dataset is freely available through NYC Open Data and includes well-defined spatio-temporal information alongside casualty and damage features.
The NYC Street Speed Limit dataset is freely available to the public through NYC Open Data and has been available since 2018.
In these sections, some basic statistics are presented in tables.
A multi-feature analysis is performed and the cross-correlation matrix is extracted.
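As a rough illustration of what such a matrix contains, the sketch below computes pairwise Pearson correlations for a few made-up features. Both the choice of Pearson correlation and the feature names (`hour`, `speed_limit`) are assumptions; the project may use a different measure or different columns.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical feature columns, invented for illustration only.
features = {
    "hour": [8, 12, 17, 23],
    "speed_limit": [25, 25, 30, 50],
    "Response": [0, 0, 1, 1],
}

names = list(features)
# Nested dict: matrix[a][b] is the correlation between features a and b.
matrix = {
    a: {b: pearson(features[a], features[b]) for b in names}
    for a in names
}

for a in names:
    print(a, [round(matrix[a][b], 2) for b in names])
```

The matrix is symmetric with ones on the diagonal, so in practice only the upper or lower triangle needs to be inspected.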
In this section the focus is the Response variable, which is analyzed and visualized against the remaining features. A spatio-temporal map visualization is also shown.
In this section the user can interactively investigate how the different vehicle types and contributing factors are distributed across the 24 hours of the day.
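The hourly distribution behind such a view could be computed roughly as follows. This is a minimal sketch: the records, the `"HH:MM"` time format and the vehicle type labels are hypothetical and stand in for whatever the project actually uses.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (crash time as "HH:MM", vehicle type).
records = [
    ("08:15", "Sedan"),
    ("08:40", "Taxi"),
    ("17:05", "Sedan"),
    ("17:30", "Sedan"),
    ("23:50", "Bike"),
]


def hourly_counts(rows, vehicle_type):
    """Count crashes per hour of day (0-23) for one vehicle type."""
    counts = Counter(
        datetime.strptime(time_str, "%H:%M").hour
        for time_str, vtype in rows
        if vtype == vehicle_type
    )
    # Return a dense 24-slot list so hours without crashes appear as zero.
    return [counts.get(hour, 0) for hour in range(24)]


sedan_by_hour = hourly_counts(records, "Sedan")
print(sedan_by_hour[8], sedan_by_hour[17])
```

The same function can be reused for contributing factors by swapping the second element of each record.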
Take a tour of how we prepare the data for our machine learning task.
Three models are compared: Decision Tree, Random Forest and Logistic Regression. Hyperparameter tuning is then performed on the Decision Tree, which has the best accuracy.
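A comparison of this kind might look like the scikit-learn sketch below. It runs on synthetic data from `make_classification`, not on the collision data, and the cross-validation setup and parameter grid are assumptions chosen for illustration rather than the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared crash features and Response labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Mean 5-fold cross-validated accuracy for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Hypothetical grid for tuning the Decision Tree.
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(scores)
print(search.best_params_, round(search.best_score_, 3))
```

On synthetic data the ranking of the three models need not match the one reported for the real dataset.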
In this section we present the final tuned model (Decision Tree) and the contribution of each feature to the Response variable.
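One common way to read feature contributions off a fitted decision tree is scikit-learn's impurity-based `feature_importances_` attribute, sketched below. Whether the project uses this measure is an assumption; the feature names and the synthetic data are likewise hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature names, used only to label the synthetic columns.
feature_names = ["hour", "speed_limit", "vehicle_type", "borough"]

# Synthetic stand-in for the prepared data.
X, y = make_classification(
    n_samples=300,
    n_features=4,
    n_informative=2,
    n_redundant=0,
    random_state=0,
)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across features.
ranked = sorted(
    zip(feature_names, tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favour features with many distinct values, so they are best treated as a first indication rather than a definitive ranking.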
All models are incorrect but some of them are useful...