Trip Data White Paper

Overview of Trip Data

Origin-destination (O-D) data (also called ‘trips’ data) is a measure of the number of individuals making a trip across the transportation network between given sets of origins and destinations. Trips data has historically been used to represent how populations move through a geographic area for all trip purposes, including work, school, shopping, recreation, and more, and is commonly aggregated into geographic zones of similar land uses. Since the most common use of trips data is for planning and travel demand modeling, zones are typically aligned with census or other planning area boundaries so that they integrate cleanly with socio-economic data. This enables trips from and to each zone to be estimated from factors such as population, income, number of households, and household size.

Traditionally, integrating socio-economic and O-D data required lengthy and costly household surveys to develop the assumptions behind travel demand forecasting models. Advances in technology have significantly reduced these costs: mobile location-based services (LBS) data can now be used to estimate and calibrate O-D data more efficiently than traditional surveys.

Methodology

Urban SDK leverages LBS data to estimate trips at aggregated geographic levels, such as the census tract or geohash level. Raw LBS data consists of large datasets of ‘observations’ of unique, anonymized mobile device IDs (abbreviated as ‘MAIDs’), each containing the timestamp and geographic coordinates (in latitude and longitude) at which the MAID was observed. As the owners of mobile devices make trips across a geographic area throughout the course of their day, mobile observations of their MAIDs are repeatedly generated.

Urban SDK obtains datasets of MAID observations over a large geographic area and applies an algorithm to this data to trace trips made at a population level across a typical day, thus producing O-D data. The process begins with selecting a sufficiently large boundary for mobile data observations, as trips can be made into a limited study area from far outside of the area’s boundaries in the course of a typical day. Duplicate MAID observations, as well as those not representing trips on the transportation network (such as observations generated as a mobile device owner walks around a shopping mall), are discarded.
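The preparation steps described above (selecting a wide boundary, then discarding duplicate observations) can be illustrated with a minimal Python sketch. This is not Urban SDK's implementation: the `Observation` fields, the bounding-box representation, and the function names are hypothetical, and the filtering of observations that are not on the transportation network is left out because the criteria are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One LBS observation (field names and units are assumptions)."""
    maid: str         # anonymized mobile device ID
    timestamp: float  # seconds since the Unix epoch
    lat: float        # latitude in decimal degrees
    lon: float        # longitude in decimal degrees


def within_bbox(obs, min_lat, min_lon, max_lat, max_lon):
    """Return True if the observation lies inside the rectangular boundary."""
    return min_lat <= obs.lat <= max_lat and min_lon <= obs.lon <= max_lon


def prepare_observations(observations, bbox):
    """Keep observations inside bbox, drop exact duplicates,
    and order the result by device and then by time."""
    seen = set()
    kept = []
    for obs in observations:
        if not within_bbox(obs, *bbox):
            continue  # outside the (deliberately large) collection boundary
        if obs in seen:
            continue  # exact duplicate of an earlier observation
        seen.add(obs)
        kept.append(obs)
    kept.sort(key=lambda o: (o.maid, o.timestamp))
    return kept
```

In practice the boundary would be drawn well beyond the study area so that trips entering from outside are still captured, as the paragraph above notes.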

The algorithm then examines each series of time-ordered MAID observations and identifies those separated by sufficiently large distances and time intervals to be reasonably likely to represent trips on the transportation network. It picks out the true origin and destination coordinates of each series and classifies the journey from origin to destination as a trip. Using these coordinates, together with zone boundaries, trips are aggregated geospatially and the number of trips to and from each zone is calculated hourly, producing hourly O-D flows that can be further aggregated to daily values. The zones used for aggregation can vary as desired, including different zoning systems inside and outside a set boundary (for instance, a custom Traffic Analysis Zone (TAZ) system within an urban boundary, and Census Tracts beyond the urban boundary).
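As a rough illustration of trip detection and hourly aggregation, the sketch below treats any two consecutive observations of the same MAID as a trip when they are far enough apart in both distance and time. This pairwise rule is a simplification chosen for brevity, not the algorithm used by Urban SDK. The thresholds (`min_distance_m`, `min_gap_s`) are hypothetical values, and `grid_zone` is a stand-in for a real lookup against census tract, geohash, or TAZ boundaries.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def grid_zone(lat, lon, cell_deg=0.01):
    """Hypothetical zone lookup: a square grid cell standing in for a tract or TAZ."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hourly_od_flows(observations, min_distance_m=500.0, min_gap_s=300.0, zone_of=grid_zone):
    """Count trips per (origin zone, destination zone, departure hour in UTC).

    observations: iterable of (maid, unix_timestamp, lat, lon) tuples.
    """
    by_maid = {}
    for maid, ts, lat, lon in observations:
        by_maid.setdefault(maid, []).append((ts, lat, lon))

    flows = Counter()
    for points in by_maid.values():
        points.sort()  # time order within each device
        for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
            if t1 - t0 < min_gap_s:
                continue  # too close in time to count as a trip
            if haversine_m(la0, lo0, la1, lo1) < min_distance_m:
                continue  # too close in space to count as a trip
            hour = datetime.fromtimestamp(t0, tz=timezone.utc).hour
            flows[(zone_of(la0, lo0), zone_of(la1, lo1), hour)] += 1
    return flows


def daily_od_flows(hourly):
    """Collapse hourly O-D flows into daily totals per O-D pair."""
    daily = Counter()
    for (origin, destination, _hour), count in hourly.items():
        daily[(origin, destination)] += count
    return daily
```

Passing a different `zone_of` function is one way to mix zoning systems, for example a TAZ lookup inside the urban boundary and a census tract lookup outside it.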

Validation

Unlike other forms of mobility data, such as traffic volumes at intersections or mid-block along roadways, there is generally little to no real-world ‘ground truth’ data collected about the origin-destination travel trends of trips made across different zones. Typically, collection of ‘ground truth’ for origin-destination data to allow quantitative assessment of the validity of estimated trips data would require a large-scale household travel survey across a wide geographic region, achieving a sufficient response rate for statistical purposes. Such an effort is very labour-intensive and costly, and as a result real-world ‘ground truth’ for trips data is rarely available.

Consequently, assessing the ‘goodness of fit’ of estimated trips derived from LBS data is typically done by validation against estimated O-D flows from an MPO-developed travel demand model. However, as travel demand models also involve assumptions and their outputs involve inherent modelling uncertainty, this process is essentially a comparison of one estimate against another. Deviations between the data derived from LBS sources and from travel demand models can be attributed to the uncertainty inherent in either method.

Figure 1 presents the log-transformed distribution of trips from an MPO travel demand model (left) and from the Urban SDK method using LBS data (right). It gives an overall comparison of the number of trips each source generates, regardless of where the trips start and end.

**Figure 1:** Trip distribution (log form) from an MPO travel demand model and Urban SDK method

In a more granular comparison, Figure 2 presents data specific to the start and end points of trips. Each point on the graph represents the number of trips assigned to a specific origin-destination pair, i.e. from one specific census tract to another. The y-value of each point is the number of trips assigned to this origin-destination pair by the MPO’s travel demand model, while the x-value is the number of trips assigned by Urban SDK’s LBS data.

The green line running diagonally across the plot represents an ideal fit; points falling on this line represent O-D pairs for which the MPO and Urban SDK data match perfectly. If the MPO and Urban SDK models generated identical outputs, all data points on this plot would fall on the green line. As this is not the case, a linear regression was performed to determine the ‘goodness-of-fit’ between the two data sources. The red line represents the ‘line-of-best-fit’ passing through the point data, as determined by the linear regression. The regression yielded an R-squared of 0.8444 which, although below the theoretical “perfect fit” represented by an R-squared of 1.00, indicates a high degree of correlation between MPO and Urban SDK trips data.
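For readers who want to reproduce this kind of comparison on their own data, the following sketch fits an ordinary least-squares line through paired O-D counts and computes R-squared as one minus the ratio of residual to total sum of squares. The function name and argument layout are illustrative assumptions; `x` would hold the LBS-derived trips per O-D pair and `y` the travel demand model trips for the same pairs.

```python
def linear_fit_r2(x, y):
    """Fit y = intercept + slope * x by least squares and return
    (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot
```

A slope near 1 and an intercept near 0 would place the fitted line close to the ideal diagonal, while R-squared alone only describes how tightly the points cluster around the fitted line.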

Visualization

The following figures show how O-D data can be visualized and leveraged to identify significant origin-destination pair trends and valuable metrics.
