December 2020 vaccine distribution and administration began. I started trapping the daily spreadsheet from the state health department that tracked progress. This blog entry is really for prototyping some of the data cleanup and displays that I will incorporate into my shiny app.

Let’s take a look at data issues.

df %>% 
  filter(!is.na(Pct_given)) %>% 
  ggplot(aes(x=Pct_given)) +
  geom_histogram()+
  labs(x="Percent Distributed Administered",
       title="Distribution of Administered Vaccine")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Well, that’s irritating. Some folks are giving shots they don’t have.

#   Negative doses
df %>% 
  group_by(County) %>% 
    mutate(min_admin=min(Daily_adm)) %>% 
  ungroup() %>% 
  arrange(County) %>% 
  filter(min_admin<0) %>% 
  ggplot(aes(x=Date, y=Doses_Admin, color=County)) +
  geom_line()+
  labs(x="Date", y="Cumulative Doses",
       title="Administered Vaccine")

So we have a few counties where the cumulative numbers have decreased. That is disappointing.

# how fair is the distribution?

df %>% 
  group_by(County) %>% 
    mutate(fairness=10000*Doses_Distrib/Pop_Adult) %>% 
    summarise(fairness=max(fairness)) %>% 
  ggplot(aes(x=fairness)) +
  geom_histogram()+
  labs(x="Per Capita Doses Distributed",
       title="Distribution of Administered Vaccine")
## `summarise()` ungrouping output (override with `.groups` argument)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1 rows containing non-finite values (stat_bin).

df %>% 
  filter(County!="Texas") %>% 
  group_by(County) %>% 
    mutate(fairness=10000*Doses_Distrib/Pop_Adult) %>% 
    summarise(fairness=max(fairness), Pop_Adult=max(Pop_Adult)) %>% 
  ggplot(aes(x=fairness, y=Pop_Adult)) + 
  geom_point() + 
  labs(x="Per Capita Doses Distributed (Doses per 10,000)",
       y="County Population",
       title="Distribution of Administered Vaccine")
## `summarise()` ungrouping output (override with `.groups` argument)
## Warning: Removed 1 rows containing missing values (geom_point).

df %>% 
  filter(County!="Texas") %>% 
  group_by(County) %>% 
    summarise(Doses_Distrib=max(Doses_Distrib), Pop_Adult=max(Pop_Adult)) %>% 
  ggplot(aes(x=Doses_Distrib, y=Pop_Adult)) + 
  geom_point() + 
  scale_x_log10() + 
  scale_y_log10() +
  labs(x="Log Doses Distributed",
       y="Log County Population",
       title="Distribution of Administered Vaccine")
## `summarise()` ungrouping output (override with `.groups` argument)
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Removed 1 rows containing missing values (geom_point).

Looks pretty fair. At low numbers can see the effect of minimum vaccine lot size.

#   % over time

df %>% 
  group_by(County) %>% 
    mutate(Pct_Coverage=100*Doses_Admin/Pop_Adult) %>% 
  ungroup() %>% 
  filter(Pct_Coverage<20) %>% 
  ggplot(aes(x=Date, y=Pct_Coverage, color=County)) + 
  geom_line() + 
  theme(legend.position = "none") + 
  labs(x="Date",
       y="Percent Coverage",
       title="Percent of each County Vaccinated")  

Cottle county jumps to almost 60%, the rest are below 6%.

#   How to best calculate the date when everyone is vaccinated?

#   Calculate an average inoculation rate over the last 7 days

this_day <- lubridate::today()

df %>% 
  filter(Date>last(Date)-7) %>% 
  group_by(County) %>% 
    summarize(Inoculation_rate=mean(Daily_adm, na.rm=TRUE),
              Pop_Adult=last(Pop_Adult),
              People_one_dose=last(People_one_dose)) %>% 
  mutate(Finish_date=as_date(this_day+((Pop_Adult-People_one_dose)/Inoculation_rate+28))) %>% 
  mutate(DaysFromNow=((Pop_Adult-People_one_dose)/Inoculation_rate+28)) %>% 
  ggplot(aes(x=DaysFromNow/7)) +
  geom_histogram() +
  labs(x="Weeks from Today",
       title="Weeks from present to reach full vaccination by county",
       subtitle="Rate derived from last 7 days")
## `summarise()` ungrouping output (override with `.groups` argument)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 3 rows containing non-finite values (stat_bin).