CARS - Connecting Administrative vehicle data for Research on Sustainable Transport

Passenger cars and vans have a major influence on our environment, society, and health. To generate insights into these impacts, researchers need access to reliable information about vehicle details, usage and trends. This would enable them to build evidence to inform transport policy that benefits the public.

Details about vehicle attributes and location are currently collected by the Driver and Vehicle Standards Agency (DVSA) during a vehicle’s annual MOT test, which checks that a vehicle meets road safety and environmental standards. Vehicle mileages are also recorded during these tests, although calculating annual vehicle mileages from them is computationally complex and requires significant investment of time and resources. Additional vehicle attributes are collected by the Driver and Vehicle Licensing Agency (DVLA) when the vehicle is first registered and when it changes hands.
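
As an illustration of the annual mileage calculation, the following is a minimal sketch and not the project's actual method. It assumes each vehicle's tests are available as (test date, odometer reading in miles) pairs. The function name and the rule of skipping pairs with a falling odometer reading are assumptions made for illustration; a production workflow would need more careful handling of retests and recording errors.

```python
from datetime import date


def annual_mileage_rates(tests):
    """Return (start_date, end_date, miles_per_year) for consecutive test pairs.

    `tests` is a list of (test_date, odometer_miles) tuples for one vehicle.
    This is a hypothetical helper, not part of any DVSA or DVLA tooling.
    """
    ordered = sorted(tests)
    rates = []
    for (d0, m0), (d1, m1) in zip(ordered, ordered[1:]):
        days = (d1 - d0).days
        miles = m1 - m0
        # Assumption: skip same-day retests and falling odometer readings
        # rather than trying to correct them.
        if days <= 0 or miles < 0:
            continue
        # Scale the observed distance to a 365.25-day year.
        rates.append((d0, d1, miles / days * 365.25))
    return rates


tests = [
    (date(2021, 3, 1), 30000),
    (date(2022, 3, 1), 38000),
    (date(2023, 3, 1), 45300),
]
rates = annual_mileage_rates(tests)
for start, end, per_year in rates:
    print(start, end, round(per_year))
```

Scaling by the number of days between tests matters because MOT tests are rarely exactly twelve months apart.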

This project aims to link these data sources to create de-identified datasets that researchers can use to understand vehicle ownership and usage patterns. They will contain information on all light-duty vehicles (under 3.5 tonnes) in Great Britain, including data on vehicle type, mileage, and location. We will generate data at postcode area resolution (e.g. LS, BS, M), which will give a sub-regional spatial dimension to the data. This new resource will have the potential to generate insights that inform sustainable transport policy design and implementation, at national and various sub-national levels.
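
To make the postcode area resolution concrete: the postcode area is the leading one or two letters of a UK postcode. The sketch below is illustrative only and assumes a full postcode string as input; the function name is hypothetical. It does not validate that the result is a real postcode area.

```python
import re

# One or two leading letters followed by a digit, e.g. "LS2", "M1".
_AREA_PATTERN = re.compile(r"^([A-Z]{1,2})\d")


def postcode_area(postcode):
    """Return the postcode area (e.g. 'LS') or None if the input does not fit."""
    cleaned = postcode.strip().upper()
    match = _AREA_PATTERN.match(cleaned)
    return match.group(1) if match else None


for example in ["LS2 9JT", "bs8 1tq", "M1 1AE", "not a postcode"]:
    print(example, "->", postcode_area(example))
```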

This project is a partnership between the University of Leeds, the University of Bristol and the RAC Foundation. Further input will come from the DVLA, the DVSA and the Office for National Statistics (ONS). The value of the data will be explored through work with the Department for Transport (DfT) and various regional and local authorities.

The data

This project aims to restructure and enhance data available through the Open Government Licence. The datasets used will be:

(1)    MOT test data from the DVSA bulk MOT history API https://documentation.history.mot.api.gov.uk/mot-history-api/download-vehicle-mot-history-data/files/

Example vehicle attributes include:

  • make and model
  • date of first use
  • engine size
  • mileage
  • test class
  • vehicle registration mark (VRM)

(2)    The Anonymised MOT tests and results dataset https://www.data.gov.uk/dataset/e3939ef8-30c7-4ca8-9c7c-ad9475cc9b2f/anonymised_mot_test

Example vehicle attributes include:

  • make and model
  • date of first use
  • engine size
  • mileage
  • test result categories
  • the postcode area of the test location

(3)    DVLA Vehicle Enquiry Service (VES) API https://developer-portal.driver-vehicle-licensing.api.gov.uk/apis/vehicle-enquiry-service/vehicle-enquiry-service-description.html#vehicle-enquiry-service-ves-api-guide

Example vehicle attributes include:

  • make and model
  • date of first use
  • date of last V5 issued
  • tax status
  • engine size

At present, these datasets are too large to be opened on non-specialist computer systems and are in a format not readily accessible to most researchers. The existing data is also poorly documented: the documentation omits important information on the data's limitations and on how it should be used in a research context.

The primary goal of the project is to create a clean, updatable and widely usable ‘vehicle centric’ dataset spanning the available years by:

  • creating a table of unique vehicles in which a consensus value for each vehicle property is recorded once;
  • linking a table of MOT history to each vehicle including the location recorded for each test.
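
The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the field names (`vehicle_id`, `test_date`, `postcode_area` and so on) are hypothetical, "consensus" is taken to mean the most frequently recorded value, and test dates are assumed to be ISO-format strings so that they sort chronologically.

```python
from collections import Counter, defaultdict


def build_vehicle_tables(test_records, attributes=("make", "model", "engine_size")):
    """Split raw test records into a vehicle table and a per-vehicle test history."""
    by_vehicle = defaultdict(list)
    for rec in test_records:
        by_vehicle[rec["vehicle_id"]].append(rec)

    vehicles, history = {}, {}
    for vid, recs in by_vehicle.items():
        consensus = {}
        for attr in attributes:
            values = [r[attr] for r in recs if r.get(attr) is not None]
            # Assumption: the consensus is the most common recorded value.
            consensus[attr] = Counter(values).most_common(1)[0][0] if values else None
        vehicles[vid] = consensus
        # Keep only the per-test fields, ordered by test date.
        history[vid] = sorted(
            (
                {
                    "test_date": r["test_date"],
                    "mileage": r["mileage"],
                    "postcode_area": r["postcode_area"],
                }
                for r in recs
            ),
            key=lambda t: t["test_date"],
        )
    return vehicles, history


records = [
    {"vehicle_id": 1, "test_date": "2021-05-01", "make": "FORD", "model": "FIESTA",
     "engine_size": 1242, "mileage": 41000, "postcode_area": "LS"},
    {"vehicle_id": 1, "test_date": "2020-05-03", "make": "FORD", "model": "FIESTA",
     "engine_size": 1242, "mileage": 35000, "postcode_area": "LS"},
    {"vehicle_id": 1, "test_date": "2022-04-28", "make": "FORD", "model": "FIESTA ZETEC",
     "engine_size": 1242, "mileage": 46500, "postcode_area": "BS"},
    {"vehicle_id": 2, "test_date": "2022-01-10", "make": "VAUXHALL", "model": "CORSA",
     "engine_size": None, "mileage": 12000, "postcode_area": "M"},
]
vehicles, history = build_vehicle_tables(records)
print(vehicles[1])
print([t["postcode_area"] for t in history[1]])
```

Recording vehicle properties once, separately from the test history, avoids repeating them on every test row and gives a single place to resolve conflicting values.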

We will then provide a means for researchers to access derived datasets in a research-ready format. The secondary goal of the project is to share a computational workflow that enables these sets of data to be routinely combined, including standardised and automated mileage calculations. The project will also consider how its datasets could be enhanced in the future. For example, we will explore linking them to de-identified area-based statistics (including Census 2021 data), accident statistics or other vehicle-level data, such as automatic number plate recognition data.

This will result in de-identified datasets being made available for further use by accredited researchers, including:

  • vehicle-level data
  • aggregated vehicle data, for specific geographical areas.
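
As a rough illustration of how vehicle-level data might be summarised for geographical areas, the sketch below counts vehicles and averages annual mileage per area. The field names and the choice of summary statistics are assumptions for illustration, not the project's specification.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_area(vehicle_rows):
    """Summarise vehicle-level rows into one record per area."""
    grouped = defaultdict(list)
    for row in vehicle_rows:
        grouped[row["postcode_area"]].append(row["annual_miles"])
    return {
        area: {"vehicles": len(miles), "mean_annual_miles": mean(miles)}
        for area, miles in sorted(grouped.items())
    }


rows = [
    {"postcode_area": "LS", "annual_miles": 8000},
    {"postcode_area": "LS", "annual_miles": 6000},
    {"postcode_area": "BS", "annual_miles": 7000},
    {"postcode_area": "M", "annual_miles": 5000},
]
summary = aggregate_by_area(rows)
for area, stats in summary.items():
    print(area, stats)
```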

Potential of the newly linked data

By linking the datasets, this project aims to provide an ongoing resource to inform urgent local and national transport, environmental, and social objectives. Research using the linked datasets will have the potential to generate insights relevant to the following policy areas:

  • climate change, for example, tracking the uptake of electric vehicles over space and time, and comparing their mileage profiles to fossil-fuelled equivalents
  • air quality and health, for example, using information on local vehicles to design the most efficient and fair geographical boundaries for location-based vehicle charging regimes
  • road safety, for example, analysing the relationship between different segments of the vehicle market and road collisions
  • taxation, for example, supporting the design of fair motoring taxation and forecasting revenue
  • transport evaluation, for example, assessing the effect of local transport initiatives on residents’ car mileages.

We will provide exemplar “vignette” analyses. Candidate analyses include validating mileage rates against National Road Traffic Forecasts; examining how vehicle ownership levels, car mileage rates and vehicle ages vary across the country and over time; and examining vehicle longevity and usage profiles by age to inform estimates of how long it may take to fully decarbonise the fleet.

Project funders and other specialists

This project is funded via the ADR UK research-ready data and access fund, a dedicated fund for commissioning research using newly linked administrative data. Funding decisions were based on advice from an independent expert panel, and in consultation with the Office for National Statistics. This project is part of the ADR England portfolio.
Details of the funding grant awarded by ADR UK for this project can also be found on the UK Research and Innovation (UKRI) Gateway to Research platform.
 

Data Science Specialist: Will Chapman 
Project Officer: Dr. Theresa Nelson 
 

Impact

The goal is for these two sets of data generated by the motoring public to be routinely combined, for mileage calculations to be standardised and automated, and for a single anonymised set of data to be made available. We expect to produce multiple research-ready datasets from this project. Our core 'minimum viable product' (MVP) will consist of:

1. 'Vehicle-level Data', which is the most disclosive and will be stored inside a Trusted Research Environment (TRE), the ONS Secure Research Environment

2. 'Aggregated Vehicle Data', which is aggregated to an appropriate geography, such as Lower Layer Super Output Area (LSOA), and also stored within the TRE for wider release

The fusing of the two datasets has the potential to provide an ongoing resource to inform urgent local and national transport, environmental and social objectives including:

Climate change:

• Understanding the spatial variations in car and van ownership and use and the relationship with local socio-demographic, infrastructure and policy characteristics

• Analysing and tracking the uptake of electric vehicles over space and time to inform the design of national or local policies to accelerate their uptake

• Comparing the mileage profiles of electric vehicles over their lifetime to their fossil-fuelled counterparts to inform traffic predictions and carbon pathways

• Analysing and predicting the location and usage patterns of electric vehicles in order to better target investment in charging infrastructure and reinforcements to the electricity grid

• Tracking changes in the distances travelled by different parts of the vehicle fleet in different places in response to local or national policies or fuel price rises.

Air Quality and health outcomes:

• Using information on the local vehicle fleet to design the most efficient and fair geographical boundaries of location-based vehicle charging regimes (e.g. in low emission or clean air zones)

• Identifying the most polluting components of the local vehicle fleet and targeting policies accordingly

• Monitoring the progress of policy interventions over time. The dataset will provide valuable benchmarking data against which to evaluate progress and share best practice.

Road safety:

• Analysis of the relationship between the composition and usage profiles of different segments of the vehicle market and road collisions.

Efficient and fair taxation:

• The design of effective and efficient vehicle and motoring taxation, and the forecasting of its revenue and its social and distributional impacts.

Publications and outputs

This project follows on from the EPSRC grant “Motoring and vehicle Ownership Trends in the UK” (EP/K000438/1; PI: Anable; www.motproject.net).