Vaccine Reluctance

Let’s look at Texas counties and test various factors for correlation with the vaccination rate. We’ll primarily look at the rate of the first vaccination, since there are a variety of reasons why someone might not get the second dose. Let’s start with the raw rates of vaccination by county.

    Vaccine %>%
      mutate(Pct_one_dose=People_one_dose/Pop_total) %>%
      ggplot(aes(x=Date, y=Pct_one_dose, color=County)) +
      geom_line(show.legend = FALSE) +
      labs(x="Date", y="First Dose Percentage", title="Texas counties Vaccine Progress")

Hmmm… let’s do a little cleanup.
In December 2020, vaccine distribution and administration began, and I started capturing the daily spreadsheet from the state health department that tracks progress. This blog entry is really for prototyping some of the data cleanup and displays that I will incorporate into my shiny app. Let’s take a look at data issues.

    df %>%
      filter(!is.na(Pct_given)) %>%
      ggplot(aes(x=Pct_given)) +
      geom_histogram() +
      labs(x="Percent Distributed Administered", title="Distribution of Administered Vaccine")

    ## `stat_bin()` using `bins = 30`.
Let’s take a look at the early voting data for Harris County

Since I already have a bunch of data for Harris County precincts and zipcodes, why not make some use of it?

Setup

    path <- "/home/ajackson/Dropbox/Rprojects/Voting/"

    BBM <- read_csv(paste0(path, "Cumulative_BBM_1120.csv"),
                    col_types = "ccccccccccccccccccccccccccccccccccccccccc")

    BBM <- BBM %>%
      mutate(ActivityDate=mdy_hms(ActivityDate)) %>%
      mutate(ActivityDate=force_tz(ActivityDate, tzone = "US/Central")) %>%
      select(ElectionCode:ActivityDate) %>%
      mutate(Ballot_Type="Mail")

    EV <- list.files(path=path, pattern="Cumulative_EV_1120_1*", full.names=TRUE) %>%
      map_df(~read_csv(., col_types = "ccccccccccccccccccccccccccccccccccccccccc"))

    EV <- EV %>%
      mutate(ActivityDate=mdy_hms(ActivityDate)) %>%
      mutate(ActivityDate=force_tz(ActivityDate, tzone = "US/Central")) %>%
      select(ElectionCode:ActivityDate) %>%
      mutate(Ballot_Type="Early")

    Votes <- rbind(BBM, EV)

    VotesByZipDate <- Votes %>%
      mutate(Date=floor_date(ActivityDate, unit="day")) %>%
      group_by(Date, Ballot_Type, VoterZIP) %>%
      summarise(Votes=n()) %>%
      ungroup() %>%
      rename(Zip=VoterZIP) %>%
      drop_na()

    ########### registered voters
    path <- paste0(path, "HarrisRegisteredVoters/")
    files <- dir(path=path, pattern = "*.
Harris County COVID-19 data

I have a very nice (I hope) dataset consisting of the number of positive COVID-19 cases per day in Harris County by zipcode. In this blog entry I would like to study this dataset and compare it with various other data.

Initial look

First off, let’s explore the data for issues, and for ideas about what might be interesting.

    # How is the data distributed? Let's look at the most recent day
    Harris %>%
      group_by(Zip) %>%
      summarize(Cases_today=last(Cases)) %>%
      ggplot(aes(x=Cases_today)) +
      geom_histogram()

So over 20 zipcodes have no cases (but they may also have no people), and it looks like most zipcodes are in the 250-750 range.
Covid tests in Texas

This is the second entry looking at testing in Texas. This time there is much more data to examine, but also a new and irritating problem. At some point, for some counties, antibody testing got mixed in with the PCR tests, so the numbers are not nearly as good as they should be. Last I read, about 10% of the tests are the wrong test, and I suspect these are not evenly distributed by county but rather concentrated in a few.
Introduction

Towards the end of May the state of Texas suddenly began adding the number of COVID-19 cases detected amongst prison inmates to the totals for the counties in which the prisons are located. However, they have not indicated whether they made this change on a single day, or whether it took place over several days for different prisons. In this bit of work, I will try to ferret out what they did as best I can, so that I may best correct my own data.
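One way to start hunting for the day a batch of cases was added is to look for daily increases that are far out of line with a county's usual increase. The following is only a sketch of that idea, not the method used in the post: the `cases` vector, the dates, the `find_jumps` helper, and the threshold of 5 are all made up for illustration.

```r
# Hypothetical cumulative case counts for one county (made-up numbers,
# not real data): steady growth with one abrupt addition on day 8.
cases <- c(100, 104, 109, 113, 118, 122, 127, 310, 315, 321, 326)
dates <- as.Date("2020-05-20") + seq_along(cases) - 1

# Flag days whose daily increase is far above the typical increase.
# The median and the median absolute deviation (MAD) are used because
# a single large jump barely moves them.
find_jumps <- function(cumulative, threshold = 5) {
  daily  <- diff(cumulative)          # new cases per day
  center <- median(daily)
  spread <- mad(daily)
  if (spread == 0) spread <- 1        # guard against a perfectly flat series
  # +1 because diff() drops the first day
  which((daily - center) / spread > threshold) + 1
}

jump_days <- find_jumps(cases)
dates[jump_days]
```

Running this per county and comparing the flagged dates would show whether the additions cluster on one day or are spread over several, which is the question raised above.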
Piecewise data fitting

As the COVID-19 pandemic progresses, the simple exponential and logistic models no longer fit the data very well. As waves of infection and retrenchment occur, it seems likely that the best fits will be done piecewise. For this blog entry I will experiment with various schemes to see if I can get a reasonably good strategy for constrained fitting to the data. As I have a well-structured dataset for all the counties in Texas, that is what I will use for the experiments.
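To make the idea of a constrained piecewise fit concrete, here is a minimal sketch of one possible scheme, not necessarily any of the ones tried in the post: two straight-line segments on the log scale, forced to meet at a breakpoint that is chosen by a simple grid search. The data are synthetic, and the names `fit_with_knot`, `candidates`, and `best_knot` are assumptions introduced for this example.

```r
# Synthetic log-scale case curve (made-up and noise-free for clarity):
# slope 0.20 per day up to day 20, then slope 0.05 afterwards.
day <- 1:40
log_cases <- ifelse(day <= 20,
                    0.20 * day,
                    0.20 * 20 + 0.05 * (day - 20))

# Fit two line segments joined at `knot`. The hinge term
# pmax(day - knot, 0) is zero before the knot and linear after it,
# so the fitted curve is continuous at the breakpoint. That
# continuity is the constraint in this sketch.
fit_with_knot <- function(knot) {
  lm(log_cases ~ day + pmax(day - knot, 0))
}

# Try each interior day as the breakpoint and keep the one with the
# smallest residual sum of squares.
candidates <- 5:35
rss <- sapply(candidates, function(k) sum(resid(fit_with_knot(k))^2))
best_knot <- candidates[which.min(rss)]
best_fit  <- fit_with_knot(best_knot)

best_knot        # estimated breakpoint (day)
coef(best_fit)   # intercept, initial slope, change in slope at the knot
```

The third coefficient is the change in slope at the breakpoint, so the growth rate after the knot is the sum of the second and third coefficients. Real county data would be noisy and would likely need more than one breakpoint, which this sketch does not attempt.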