UK Traffic Accident Analysis (2015-2018)
Road safety remains one of the most critical public health challenges in the United Kingdom, affecting thousands of lives each year through accidents, injuries, and fatalities. This analysis examines traffic accident data from 2015 to 2018 to understand the factors that contribute to road incidents and their severity. Drawing on more than 529,000 recorded accidents, the study identifies patterns, risk factors, and potential intervention points for improving road safety.
The significance of this analysis extends beyond statistics to public safety, urban planning, and social responsibility. Through detailed examination of temporal patterns, environmental conditions, and demographic factors, we show that traffic accidents rarely have a single cause. The findings reveal how weather conditions, road types, driver demographics, and vehicle characteristics interact to influence both the likelihood and the severity of accidents. This understanding supports targeted interventions and evidence-based policies to enhance road safety.
This study employs advanced statistical analysis and visualization techniques to decode complex patterns within the data, focusing particularly on identifying high-risk scenarios and vulnerable populations. By examining the relationship between factors such as time of day, weather conditions, road types, and accident severity, we aim to provide actionable insights for policymakers, urban planners, and road safety professionals. The analysis pays special attention to age-related patterns, environmental impacts, and vehicular factors, offering a comprehensive view of road safety challenges in the UK.
Research Objectives:
- Identify temporal and seasonal patterns in accident occurrence
- Analyze the impact of environmental and road conditions
- Evaluate demographic patterns in casualties
- Assess vehicle-specific risk factors
- Develop evidence-based safety recommendations
ETL Process
A comprehensive ETL process was performed on road safety data spanning multiple years, focusing on three main datasets: Accidents, Casualties, and Vehicles. The process included data validation, standardization, and consolidation, with particular attention to date/time format standardization from UK to US formats.
Detailed ETL Process
1. Data Validation and Structure Assessment
- Compared the redundant 2015 datasets to confirm data integrity
- Validated that the data in Data\RoadSafetyData_2015 matched the data in the organized folder structure
- Confirmed a 100% match for the Accidents, Casualties, and Vehicles datasets
- Decided to remove the redundant 2015 folder after validation
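
The comparison of the two 2015 copies could be sketched roughly as follows. The original validation script is not shown, so this is only a minimal illustration: the helper name `csv_rows_match`, the order-insensitive comparison, and the sample rows are all assumptions.

```python
import csv
import io


def csv_rows_match(text_a: str, text_b: str) -> bool:
    """Return True when two CSV texts hold the same header and the same rows.

    Row order is ignored, so a re-sorted copy still counts as a match.
    (Hypothetical helper; not the project's actual validation code.)
    """
    rows_a = list(csv.reader(io.StringIO(text_a)))
    rows_b = list(csv.reader(io.StringIO(text_b)))
    if not rows_a or not rows_b:
        return rows_a == rows_b
    header_a, *body_a = rows_a
    header_b, *body_b = rows_b
    return header_a == header_b and sorted(body_a) == sorted(body_b)


# Illustrative data only; the real inputs would be the two 2015 copies.
original = "Accident_Index,Accident_Severity\nA1,3\nA2,2\n"
duplicate = "Accident_Index,Accident_Severity\nA2,2\nA1,3\n"
print(csv_rows_match(original, duplicate))  # True
```

Comparing sorted rows rather than raw file bytes tolerates differences in row order, which may or may not be the tolerance the original check applied.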
2. File Structure Standardization
Implemented consistent naming convention across all datasets:
- Renamed 2017 accident files from "ACC" to "Accidents_2017"
- Standardized the naming convention of Casualties files across years
- Renamed Vehicles files from "Veh.csv" to the standardized naming format
- Organized files into appropriate report type directories
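
A renaming step of this kind might look like the sketch below. Only the target name "Accidents_2017" comes from the text above. The `<ReportType>_<Year>.csv` pattern, the `.csv` extension on the source file, and both helper names are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def standard_name(report_type: str, year: int) -> str:
    """Build a file name such as 'Accidents_2017.csv' (assumed pattern)."""
    return f"{report_type}_{year}.csv"


def rename_to_standard(path: Path, report_type: str, year: int) -> Path:
    """Rename one file in place and return its new path."""
    target = path.with_name(standard_name(report_type, year))
    return path.rename(target)


# Demonstration in a throwaway directory; 'ACC.csv' is a hypothetical input.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "ACC.csv"
    old.write_text("Accident_Index\n")
    new = rename_to_standard(old, "Accidents", 2017)
    print(new.name)  # Accidents_2017.csv
```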
3. Data Transformation
Date/Time Format Standardization
- Converted all timestamps from UK to US format
- Resolved secondary time format issues related to hours and minutes
- Implemented additional validation checks for datetime consistency
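
As a minimal sketch of the conversion, assuming dates arrive as DD/MM/YYYY and times as HH:MM (the formats listed in the data dictionary), the standard library is enough. The function names are hypothetical, and zero-padding is only one guess at the "secondary time format issues" mentioned above.

```python
from datetime import datetime


def uk_to_us_date(value: str) -> str:
    """Convert 'DD/MM/YYYY' to 'MM/DD/YYYY'; raises ValueError on bad input."""
    return datetime.strptime(value, "%d/%m/%Y").strftime("%m/%d/%Y")


def normalize_time(value: str) -> str:
    """Return a zero-padded 24-hour 'HH:MM' string; raises ValueError if invalid."""
    return datetime.strptime(value.strip(), "%H:%M").strftime("%H:%M")


print(uk_to_us_date("31/12/2017"))  # 12/31/2017
print(normalize_time("7:05"))       # 07:05
```

Parsing with an explicit format, rather than swapping substrings, doubles as a validation check: a value such as "12/31/2017" fails as a UK date because 31 is not a valid month.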
4. Data Cleaning
Executed a custom data cleaning script to handle:
- Data standardization
- Error correction
- Format consistency

Generated a comprehensive data cleaning report using custom Python code.
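
The cleaning rules themselves are not listed here, so the following is purely an illustrative sketch of how a cleaning pass can also produce a report. It assumes two rules: trimming whitespace, and filling blank cells with -1 (the legend's code for missing data). The function name and the `Weather_Conditions` field are hypothetical.

```python
from collections import Counter

MISSING = "-1"  # the legend's code for "Data missing or out of range"


def clean_rows(rows):
    """Trim whitespace and fill blank cells; return (cleaned_rows, report)."""
    report = Counter()
    cleaned = []
    for row in rows:
        new_row = {}
        for field, value in row.items():
            raw = value or ""
            text = raw.strip()
            if text != raw:
                report["whitespace_trimmed"] += 1
            if text == "":
                text = MISSING
                report["blank_filled"] += 1
            new_row[field] = text
        cleaned.append(new_row)
    report["rows_processed"] = len(cleaned)
    return cleaned, dict(report)


# Illustrative rows only.
rows = [
    {"Accident_Index": " A1 ", "Weather_Conditions": ""},
    {"Accident_Index": "A2", "Weather_Conditions": "2"},
]
cleaned, report = clean_rows(rows)
print(cleaned[0])
print(report)
```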
5. Data Consolidation
Combined cleaned datasets into three main categories:
- Accidents master file
- Casualties master file
- Vehicles master file
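
Consolidating yearly files into one master file could be sketched as below. It assumes every yearly file for a given report type shares the same header. The `consolidate` helper and the sample rows are hypothetical.

```python
import csv
import io


def consolidate(csv_texts):
    """Concatenate CSV texts that share one header into a single CSV text."""
    header = None
    writer = None
    out = io.StringIO()
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if reader.fieldnames is None:
            continue  # skip empty inputs
        if header is None:
            header = reader.fieldnames
            writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
            writer.writeheader()
        elif reader.fieldnames != header:
            raise ValueError("column mismatch between yearly files")
        writer.writerows(reader)
    return out.getvalue()


# Illustrative yearly extracts.
y2015 = "Accident_Index,Accident_Severity\nA1,3\n"
y2016 = "Accident_Index,Accident_Severity\nB1,2\n"
print(consolidate([y2015, y2016]))
```

Raising on a header mismatch, instead of silently aligning columns, keeps schema drift between years visible.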
Technical Notes
Primary challenge areas addressed:
- Date/time format standardization
- File organization and naming conventions
- Data validation and verification
Custom Python scripts were developed for:
- Data comparison and validation
- Cleaning operations
- Report generation
- Data consolidation
Data Legend
Accident Severity
Code | Label |
---|---|
1 | Fatal |
2 | Serious |
3 | Slight |
Weather Conditions
Code | Label |
---|---|
1 | Fine no high winds |
2 | Raining no high winds |
3 | Snowing no high winds |
4 | Fine + high winds |
5 | Raining + high winds |
6 | Snowing + high winds |
7 | Fog or mist |
8 | Other |
9 | Unknown |
-1 | Data missing or out of range |
Road Type
Code | Label |
---|---|
1 | Roundabout |
2 | One way street |
3 | Dual carriageway |
6 | Single carriageway |
7 | Slip road |
9 | Unknown |
12 | One way street/Slip road |
-1 | Data missing or out of range |
Casualty Type
Code | Label |
---|---|
0 | Pedestrian |
1 | Cyclist |
2 | Motorcycle 50cc and under rider or passenger |
3 | Motorcycle 125cc and under rider or passenger |
4 | Motorcycle over 125cc and up to 500cc rider or passenger |
5 | Motorcycle over 500cc rider or passenger |
8 | Taxi/Private hire car occupant |
9 | Car occupant |
10 | Minibus (8 - 16 passenger seats) occupant |
11 | Bus or coach occupant (17 or more pass seats) |
16 | Horse rider |
17 | Agricultural vehicle occupant |
18 | Tram occupant |
19 | Van / Goods vehicle (3.5 tonnes mgw or under) occupant |
20 | Goods vehicle (over 3.5t. and under 7.5t.) occupant |
21 | Goods vehicle (7.5 tonnes mgw and over) occupant |
22 | Mobility scooter rider |
23 | Electric motorcycle rider or passenger |
90 | Other vehicle occupant |
97 | Motorcycle - unknown cc rider or passenger |
98 | Goods vehicle (unknown weight) occupant |
Vehicle Type
Code | Label |
---|---|
1 | Pedal cycle |
2 | Motorcycle 50cc and under |
3 | Motorcycle 125cc and under |
4 | Motorcycle over 125cc and up to 500cc |
5 | Motorcycle over 500cc |
8 | Taxi/Private hire car |
9 | Car |
10 | Minibus (8 - 16 passenger seats) |
11 | Bus or coach (17 or more pass seats) |
16 | Ridden horse |
17 | Agricultural vehicle |
18 | Tram |
19 | Van / Goods 3.5 tonnes mgw or under |
20 | Goods over 3.5t. and under 7.5t |
21 | Goods 7.5 tonnes mgw and over |
22 | Mobility scooter |
23 | Electric motorcycle |
90 | Other vehicle |
97 | Motorcycle - unknown cc |
98 | Goods vehicle - unknown weight |
-1 | Data missing or out of range |
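
The legend tables above translate directly into lookup dictionaries. The sketch below encodes two of them, Accident Severity and Road Type. The `decode` helper and its fallback label are assumptions, not part of the dataset.

```python
ACCIDENT_SEVERITY = {1: "Fatal", 2: "Serious", 3: "Slight"}

ROAD_TYPE = {
    1: "Roundabout",
    2: "One way street",
    3: "Dual carriageway",
    6: "Single carriageway",
    7: "Slip road",
    9: "Unknown",
    12: "One way street/Slip road",
    -1: "Data missing or out of range",
}


def decode(code, legend, default="Unrecognized code"):
    """Map a raw code (int or numeric string) to its legend label."""
    return legend.get(int(code), default)


print(decode("3", ACCIDENT_SEVERITY))  # Slight
print(decode(6, ROAD_TYPE))            # Single carriageway
print(decode(-1, ROAD_TYPE))           # Data missing or out of range
```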
Data Dictionary
Accidents Dataset
Field Name | Description | Data Type | Values/Format |
---|---|---|---|
Accident_Index | Unique identifier for each accident | String | Unique value linking to vehicle and casualty data |
Location_Easting_OSGR | Easting location in OSGR format | Numeric | Grid Reference (-1 for missing data) |
Location_Northing_OSGR | Northing location in OSGR format | Numeric | Grid Reference (-1 for missing data) |
Longitude | Longitude in WGS84 format | Decimal | WGS 1984 coordinate system |
Latitude | Latitude in WGS84 format | Decimal | WGS 1984 coordinate system |
Accident_Severity | Severity of the accident | Integer | 1: Fatal, 2: Serious, 3: Slight |
Number_of_Vehicles | Number of vehicles involved | Integer | Count of vehicles |
Number_of_Casualties | Number of casualties | Integer | Count of casualties |
Date | Date of accident | Date | DD/MM/YYYY format |
Time | Time of accident | Time | HH:MM 24-hour format |
Casualties Dataset
Field Name | Description | Data Type | Values/Format |
---|---|---|---|
Accident_Index | Reference to accident record | String | Links to accident data |
Vehicle_Reference | Reference to vehicle involved | Integer | Links to vehicle data |
Casualty_Class | Type of casualty | Integer | 1: Driver/Rider, 2: Passenger, 3: Pedestrian |
Sex_of_Casualty | Gender of casualty | Integer | 1: Male, 2: Female, -1: Unknown |
Age_of_Casualty | Age of casualty | Integer | Age in years (-1 for unknown) |
Vehicles Dataset
Field Name | Description | Data Type | Values/Format |
---|---|---|---|
Accident_Index | Reference to accident record | String | Links to accident data |
Vehicle_Type | Type of vehicle | Integer | Various codes for vehicle types |
Age_of_Driver | Age of driver | Integer | Age in years (-1 for unknown) |
Age_of_Vehicle | Age of vehicle | Integer | Age in years (-1 for unknown) |
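
Because `Accident_Index` appears in all three datasets, records can be joined on it, and the -1 sentinel should be filtered out before any age statistics are computed. A minimal sketch with hypothetical helper names and made-up sample records:

```python
from collections import defaultdict


def casualties_by_severity(accidents, casualties):
    """Count casualties per Accident_Severity by joining on Accident_Index."""
    severity_of = {a["Accident_Index"]: a["Accident_Severity"] for a in accidents}
    counts = defaultdict(int)
    for casualty in casualties:
        severity = severity_of.get(casualty["Accident_Index"])
        if severity is not None:  # ignore casualties with no matching accident
            counts[severity] += 1
    return dict(counts)


def known_ages(records, field="Age_of_Casualty"):
    """Return ages with the -1 'unknown' sentinel removed."""
    return [r[field] for r in records if r[field] != -1]


# Illustrative records only.
accidents = [
    {"Accident_Index": "A1", "Accident_Severity": 3},
    {"Accident_Index": "A2", "Accident_Severity": 1},
]
casualties = [
    {"Accident_Index": "A1", "Age_of_Casualty": 34},
    {"Accident_Index": "A1", "Age_of_Casualty": -1},
    {"Accident_Index": "A2", "Age_of_Casualty": 61},
]
print(casualties_by_severity(accidents, casualties))  # {3: 2, 1: 1}
print(known_ages(casualties))                         # [34, 61]
```

Leaving -1 in place would drag any mean age downward, so it is filtered out before aggregation.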