::include_graphics("https://i.imgur.com/7txj1aA.png") knitr
I came across this graphic from an APTA Factbook(p. 20) which shows transit with an enormous safety advantage over automobiles. I was curious if I could update the data using information from the Bureau of Transportation Statistics.
Passenger Miles
First, we want to get passenger miles traveled by different modes. The BTS has this data in an Excel spreadsheet at https://www.bts.gov/content/us-passenger-miles. The data in the spreadsheet is somewhat spotty, and there are plenty of footnotes describing the fact that the sources of the different elements have changed over the years. But let’s go with just data from 2000 onward.
I wanted to read the Excel files directly from the BTS website, but there was just too much cleaning that needed to be done to make them machine-readable in a nice way. You can find the lightly modified spreadsheet I used, including a separate analysis from this one, here
# get Excel file from BTS
<- read_excel(here("posts/safety/passmiles.xlsx"),
passmiles skip = 1, sheet = "passenger_miles") %>%
filter(!is.na(Mode)) %>%
select(Group, Mode, `2000`:`2018`) %>%
pivot_longer(cols = `2000`:`2018`, names_to = "year", values_to = "passmiles",
names_transform = list(year = as.integer)) %>%
# convert from million miles to miles
mutate( passmiles = passmiles * 1e6 ) %>%
distinct()
The numbers in the data table are given in Millions. To keep our units straight, we are going to use straight miles for now.
ggplot(passmiles, aes(x = year, y = passmiles, label = Mode, color = Mode,
group = str_c(Group, Mode))) +
geom_path() + facet_wrap(~Group, scales = "free_y") +
geom_dl(method = list(dl.trans(x = x + 0.2), "last.points", cex = 0.6)) +
geom_dl(method = list(dl.trans(x = x - 0.2), "first.points", cex = 0.6)) +
scale_x_continuous(limits = c(1995, 2022)) +
scale_color_discrete(guide = "none") +
ylab("Passenger Miles") + xlab("Year") +
theme_bw()
Fatalities by Mode
The BTS also publishes statistics on fatality broken up by mode at: https://www.bts.gov/content/transportation-fatalities-modea. However, there are a number of potential issues arising from these data. First, transit fatalities are grouped by workers and vehicle occupants, but are not separated by mode. Additionally, there are a number of redundant categories.
<- read_excel(here("posts/safety/passmiles.xlsx"),
fatalities sheet = "fatalities") %>%
filter(!is.na(Mode)) %>%
select(Group, Mode, `2000`:`2018`) %>%
pivot_longer(cols = `2000`:`2018`, names_to = "year", values_to = "fatalities",
names_transform = list(year = as.integer))
ggplot(fatalities %>% filter(Group %in% c("Air", "Highway", "Rail", "Transit")),
aes(x = year, y = fatalities, label = Mode, color = Mode,
group = str_c(Group, Mode))) +
geom_path() + facet_wrap(~Group, scales = "free_y") +
geom_dl(method = list(dl.trans(x = x + 0.2), "last.points", cex = 0.6)) +
geom_dl(method = list(dl.trans(x = x - 0.2), "first.points", cex = 0.6)) +
scale_x_continuous(limits = c(1995, 2022)) +
scale_color_discrete(guide = "none") +
ylab("Fatalities") + xlab("Year") +
theme_bw()
Warning: Removed 6 rows containing missing values (`geom_path()`).
Warning: Removed 6 rows containing missing values (`geom_dl()`).
Removed 6 rows containing missing values (`geom_dl()`).
Fatalities per 100 Million Miles
Because the two report tables use different modal definitions, we will consider only group-levels with the following pairs:
- Air: all fatalities and passenger miles
- Highway: all fatalities and passenger miles, minus bus fatalities and miles
- Transit: fatalities and passenger miles by rail and non-rail
- Rail: Only Intercity rail miles; passenger rail fatalities minus transit rail fatalities.
<- passmiles %>%
miles_summary # remove bus miles from highway group
filter(!(Group == "Highway" & Mode == "Bus")) %>%
# only keep intercity rail
filter(!(Group == "Rail" %in% c("Light Rail", "Heavy Rail", "Commuter Rail", "All"))) %>%
# remove summary miles from all other groups
filter(!(Group == "Transit" & Mode == "All")) %>%
filter(!(Group == "Highway" & Mode == "All")) %>%
# simplify transit modes into rail and non-rail
mutate(
Mode = case_when(
== "Transit" & Mode %in% c("Light Rail", "Heavy Rail", "Commuter Rail") ~ "Rail",
Group == "Transit" ~ "Non-Rail",
Group == "Highway" ~ "All",
Group == "Rail" ~ "All",
Group == "Air" ~ "All",
Group TRUE ~ Mode
)%>%
) group_by(Group, Mode, year) %>%
summarise(passmiles = sum(passmiles))
`summarise()` has grouped output by 'Group', 'Mode'. You can override using the
`.groups` argument.
<- fatalities %>%
fatalities_summary mutate(
keep = case_when(
# all air
== "Air" & Mode == "All" ~ T,
Group # all highway but bus
== "Highway" & !(Mode %in% c("All", "Bus")) ~ T,
Group # transit rail and non-rail
== "Transit" & Mode %in% c("Non-Rail", "Rail") ~ T,
Group # passenger rail
== "Rail" & Mode == "All" ~ T,
Group TRUE ~ F
),Mode = ifelse(Group == "Highway", "All", Mode),
%>% filter(keep) %>%
) group_by(Group, Mode, year) %>%
summarise(fatalities = sum(fatalities))
`summarise()` has grouped output by 'Group', 'Mode'. You can override using the
`.groups` argument.
# remove transit rail fatalities from rail
<- fatalities_summary %>% filter(Mode == "Rail") %>% pull(fatalities)
transit_rail_fatalities
<- fatalities_summary %>%
fatalities_summary mutate(
transit_rail_fatalities = transit_rail_fatalities,
fatalities = ifelse(Group == "Rail", fatalities - transit_rail_fatalities, fatalities)
)
Finally, we can join the two datasets together, and compute the fatalities per 100 million miles.
<- left_join(miles_summary, fatalities_summary,
fatal_rate by = c("Group", "Mode", "year")) %>%
mutate(
fatal_rate = fatalities / (passmiles / 1e8) # fatalities per 100 million miles
)
ggplot(fatal_rate, aes(
x = year, y = fatal_rate,
color = str_c(Group, Mode, sep = " "),
label = str_c(Group, Mode, sep = " "),
+
)) geom_path() +
geom_dl(method = list(dl.trans(x = x + 0.2), "last.points", cex = 0.6)) +
geom_dl(method = list(dl.trans(x = x - 0.2), "first.points", cex = 0.6)) +
scale_x_continuous(limits = c(1999, 2019)) +
scale_color_discrete(guide = "none") +
ylab("Fatalities per 100 Million Passenger Miles") +
xlab("Year") +
theme_bw()
From my understanding, if a light rail train hits a pedestrian, it counts as a highway fatality and a transit fatality. Given that a large number of the rail fatalities are due to trespassers and at-grade crossings, this may overstate the safety to rail passengers. Also, it double-counts fatalities resulting from a commuter rail train colliding with a private automobile.
My thoughts on the APTA chart:
- I believe they used Vehicle Miles Traveled (VMT) instead of passenger miles traveled. Because the average occupancy of a vehicle is somewhat over 1.0, this overstates the relative danger of highway modes.
- I believe they considered only fatalities among transit passengers resulting from train crashes, and did not include fatalities among transit workers or other users killed by trains. I believe they did, however, include all highway fatalities, and not simply vehicle occupants. You could make the argument that this is the correct comparison, but I don’t believe that it is fair.