Harris County Appraisal District data

Let's start exploring the data. We'll look at all these exempt properties.
```r
library(tidyverse)

# Keep only the exempt state classes (codes starting with "X").
# This takes us from 1.4 million to 74,000 records
Dx <- df %>% filter(str_detect(state_class, "^X"))

# Count of properties in each exempt class (geom_bar counts rows directly)
Dx %>%
  ggplot(aes(x=state_class)) +
  geom_bar() +
  labs(x="Exempt code", y="Number of Properties",
       title="Number of properties in each exempt class")

# Same plot but for total square miles
Dx %>%
  group_by(state_class) %>%
  summarize(area=sum(land_ar, na.rm=TRUE)*3.58701e-8) %>%
  ggplot(aes(x=state_class)) +
  geom_col(aes(y=area)) +
  labs(x="Exempt code", y="Square Miles",
       title="Area of properties in each exempt class")

# Same plot but for total Market Value
Dx %>%
  group_by(state_class) %>%
  summarize(area=sum(tot_mkt_val, na.
```
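The hard-coded factor 3.58701e-8 in the area plot converts square feet to square miles, because one square mile is 5280^2 = 27,878,400 square feet. That implies `land_ar` is recorded in square feet, which is an assumption worth confirming against the appraisal district's field documentation. A small base-R check of the same arithmetic (the helper name `sqft_to_sqmi` is hypothetical):

```r
# Square feet in one square mile: 5280 ft by 5280 ft = 27,878,400
sqft_per_sqmi <- 5280^2

# Hypothetical helper: convert an area in square feet to square miles
# (assumes land_ar is in square feet, as the factor above implies)
sqft_to_sqmi <- function(sqft) sqft / sqft_per_sqmi

# The factor used in the plot is this reciprocal, rounded to 6 digits
factor <- 1 / sqft_per_sqmi
print(signif(factor, 6))
```

Writing the factor as `1 / 5280^2` makes the unit assumption visible in the code instead of hiding it in a magic number.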

Vaccine Reluctance

Let's look at Texas counties and test various factors for correlations with the vaccination rate.
We’ll primarily look at the rate of the first vaccination, since there are a variety of reasons why someone might not get the second dose.
Let’s start with the raw rates of vaccination by county.
```r
# Fraction of each county's population with at least one dose, over time
Vaccine %>%
  mutate(Pct_one_dose=People_one_dose/Pop_total) %>%
  ggplot(aes(x=Date, y=Pct_one_dose, color=County)) +
  geom_line(show.legend = FALSE) +
  scale_y_continuous(labels = scales::percent) +
  labs(x="Date", y="First Dose Percentage",
       title="Texas Counties Vaccine Progress")
```

Hmmm… let's do a little cleanup.
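To test a factor for correlation with the first-dose rate, one option is to take each county's latest rate and correlate it with a county-level covariate. The sketch below uses made-up numbers and a hypothetical `Median_income` column purely to show the mechanics in base R. Spearman's rank correlation is included alongside Pearson's because a few unusual counties can dominate Pearson's coefficient.

```r
# Hypothetical county-level snapshot; values are invented for illustration.
# Median_income stands in for whatever factor is being tested.
counties <- data.frame(
  County          = c("A", "B", "C", "D", "E"),
  People_one_dose = c(30000, 52000, 9000, 61000, 15000),
  Pop_total       = c(100000, 120000, 40000, 110000, 50000),
  Median_income   = c(45000, 62000, 38000, 70000, 41000)
)

# First-dose rate, computed the same way as in the plot above
counties$Pct_one_dose <- counties$People_one_dose / counties$Pop_total

# Pearson correlation between the rate and the candidate factor
r <- cor(counties$Pct_one_dose, counties$Median_income)

# Spearman rank correlation, less sensitive to outlying counties
rho <- cor(counties$Pct_one_dose, counties$Median_income, method = "spearman")

# cor.test adds a confidence interval and p-value for the Pearson estimate
ct <- cor.test(counties$Pct_one_dose, counties$Median_income)
print(c(pearson = r, spearman = rho))
```

With real data there are 254 Texas counties, so the p-value from `cor.test` means far more than it does on five invented rows.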

In December 2020, vaccine distribution and administration began. I started capturing the daily spreadsheet the state health department publishes to track progress. This blog entry is really for prototyping some of the data cleanup and displays that I will incorporate into my Shiny app.
Let’s take a look at data issues.
```r
df %>%
  filter(!is.na(Pct_given)) %>%
  ggplot(aes(x=Pct_given)) +
  geom_histogram() +
  labs(x="Percent Distributed Administered",
       title="Distribution of Administered Vaccine")
## `stat_bin()` using `bins = 30`.
```
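One kind of data issue a histogram like this can surface is a percentage that cannot be right, such as more doses administered than distributed. The sketch below assumes `Pct_given` is 100 * administered / distributed and uses hypothetical `Distributed` and `Administered` columns with invented values. It labels each row rather than dropping it, so the problem rows can be inspected later.

```r
# Hypothetical rows; column names other than Pct_given are assumptions
vax <- data.frame(
  County       = c("A", "B", "C", "D"),
  Distributed  = c(1000, 500, 0, 800),
  Administered = c(900, 650, 20, NA)
)

# Percent of distributed doses administered; NA when nothing was distributed
vax$Pct_given <- ifelse(vax$Distributed > 0,
                        100 * vax$Administered / vax$Distributed,
                        NA)

# Label each row instead of silently filtering it out
vax$Issue <- ifelse(is.na(vax$Pct_given), "missing or no doses distributed",
             ifelse(vax$Pct_given > 100, "administered exceeds distributed",
                    "ok"))

print(table(vax$Issue))
```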

Let's take a look at the early voting data for Harris County. Since I already have a bunch of data for Harris County precincts and zip codes, why not make some use of it?
Setup

```r
library(tidyverse)
library(lubridate)

path <- "/home/ajackson/Dropbox/Rprojects/Voting/"

# Mail ballots: read every column as character, then parse the timestamp
BBM <- read_csv(paste0(path, "Cumulative_BBM_1120.csv"),
                col_types = cols(.default = col_character()))
BBM <- BBM %>%
  mutate(ActivityDate=mdy_hms(ActivityDate)) %>%
  mutate(ActivityDate=force_tz(ActivityDate, tzone = "US/Central")) %>%
  select(ElectionCode:ActivityDate) %>%
  mutate(Ballot_Type="Mail")

# Early in-person votes: read and stack all matching files
EV <- list.files(path=path, pattern="Cumulative_EV_1120_1*", full.names=TRUE) %>%
  map_df(~read_csv(., col_types = cols(.default = col_character())))
EV <- EV %>%
  mutate(ActivityDate=mdy_hms(ActivityDate)) %>%
  mutate(ActivityDate=force_tz(ActivityDate, tzone = "US/Central")) %>%
  select(ElectionCode:ActivityDate) %>%
  mutate(Ballot_Type="Early")

# Combine both ballot types and count votes per zip code per day
Votes <- rbind(BBM, EV)
VotesByZipDate <- Votes %>%
  mutate(Date=floor_date(ActivityDate, unit="day")) %>%
  group_by(Date, Ballot_Type, VoterZIP) %>%
  summarise(Votes=n()) %>%
  ungroup() %>%
  rename(Zip=VoterZIP) %>%
  drop_na()

########### registered voters
path <- paste0(path, "HarrisRegisteredVoters/")
files <- dir(path=path, pattern = "*.
```
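Once daily votes per zip code are in hand, a natural next step with the registered-voter files is cumulative turnout by zip code. This base-R sketch uses invented counts in the same long format as `VotesByZipDate`, plus a hypothetical `registered` table with a `Registered` column. It is an illustration under those assumptions, not the post's own code.

```r
# Hypothetical daily counts in the VotesByZipDate layout
votes <- data.frame(
  Date        = as.Date(c("2020-10-13", "2020-10-14", "2020-10-13", "2020-10-14")),
  Ballot_Type = c("Early", "Early", "Mail", "Mail"),
  Zip         = c("Z1", "Z1", "Z1", "Z1"),
  Votes       = c(120, 80, 30, 10)
)
# Hypothetical registered-voter count per zip code
registered <- data.frame(Zip = "Z1", Registered = 1000)

# Total votes per zip and day, summed over mail and early ballots
daily <- aggregate(Votes ~ Zip + Date, data = votes, FUN = sum)
daily <- daily[order(daily$Zip, daily$Date), ]

# Running total of votes within each zip code
daily$Cum_votes <- ave(daily$Votes, daily$Zip, FUN = cumsum)

# Attach registration counts and compute turnout to date
daily <- merge(daily, registered, by = "Zip")
daily$Turnout <- daily$Cum_votes / daily$Registered
print(daily)
```

Summing over ballot types before the running total keeps mail and early votes from being double counted when they land on the same day.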

Harris County COVID-19 data

I have a very nice (I hope) dataset consisting of the number of positive COVID-19 cases per day in Harris County by zip code. In this blog entry I would like to study this dataset and compare it with various other data.
Initial look

First off, let's explore the data for issues, and for ideas about what might be interesting.
```r
# How is the data distributed? Let's look at the most recent day.
# last() assumes rows are sorted by date within each zip code.
Harris %>%
  group_by(Zip) %>%
  summarize(Cases_today=last(Cases)) %>%
  ggplot(aes(x=Cases_today)) +
  geom_histogram()
```

So over 20 zip codes have no cases (but they may also have no people), and it looks like most zip codes are in the 250-750 range.
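Some of those zero-case zip codes may simply have no residents, so a per-capita rate helps separate "no people" from "people but no cases". The sketch assumes a hypothetical `Population` column (for example from census data) and invented values; the rate is left as `NA` where population is zero rather than dividing by zero.

```r
# Hypothetical latest-day cases and population by zip code
latest <- data.frame(
  Zip         = c("Z1", "Z2", "Z3", "Z4"),
  Cases_today = c(0, 310, 540, 0),
  Population  = c(0, 10000, 36000, 25000)
)

# Cases per 100,000 residents; undefined where nobody lives
latest$Per_100k <- ifelse(latest$Population > 0,
                          1e5 * latest$Cases_today / latest$Population,
                          NA_real_)

# Split the zero-case zip codes by whether anyone lives there
no_cases <- latest[latest$Cases_today == 0, ]
no_cases$Reason <- ifelse(no_cases$Population == 0,
                          "no residents", "residents, no cases")
print(no_cases)
```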

Covid tests in Texas

This is my second entry looking at testing in Texas. This time there is much more data to examine, but also a new and irritating problem. At some point, for some counties, antibody tests got mixed in with the PCR tests, so the numbers are not nearly as good as they should be. Last I read, about 10% of the reported tests were the wrong kind, but I suspect these are not evenly distributed by county but rather concentrated in a few.
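One rough way to look for counties where antibody tests may have inflated the totals is to compare each county's share of statewide tests with its share of population. The sketch below uses invented counts and a hypothetical `Population` column. The cutoff of 2 is arbitrary, and a high ratio only flags a county for a closer look, since some counties may legitimately test more.

```r
# Hypothetical cumulative test counts and populations by county
tests <- data.frame(
  County     = c("A", "B", "C", "D"),
  Tests      = c(5000, 12000, 120000, 8000),
  Population = c(50000, 120000, 150000, 80000),
  stringsAsFactors = FALSE
)

# Each county's share of statewide tests and of statewide population
tests$Test_share <- tests$Tests / sum(tests$Tests)
tests$Pop_share  <- tests$Population / sum(tests$Population)

# A ratio near 1 means testing in proportion to population
tests$Ratio <- tests$Test_share / tests$Pop_share

# Counties testing far above their population share (arbitrary cutoff of 2)
suspect <- tests$County[tests$Ratio > 2]
print(suspect)
```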

Introduction

Towards the end of May, the state of Texas suddenly began adding the number of COVID-19 cases detected among prison inmates to the totals for the counties in which the prisons are located. However, they have not indicated whether they made this change on a single day or over several days for different prisons. In this bit of work, I will try to ferret out what they did as best I can, so that I can correct my own data.
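If the prison cases were added in one step, a county's cumulative case count should show a single day with an increase far above its usual daily change; if prisons were added on different days, several such days would appear. A minimal base-R sketch on an invented series, using the median daily increase as the baseline so the jump itself does not inflate it (the 5x cutoff is an arbitrary assumption):

```r
# Hypothetical cumulative case counts for one county containing a prison
county <- data.frame(
  Date  = as.Date("2020-05-20") + 0:7,
  Cases = c(100, 104, 109, 113, 480, 486, 491, 497)
)

# Daily new cases from the cumulative series (first day has no prior value)
county$New <- c(NA, diff(county$Cases))

# Typical daily increase; the median resists the outlier we are hunting for
typical <- median(county$New, na.rm = TRUE)

# Days whose increase is far above typical
jumps <- county$Date[!is.na(county$New) & county$New > 5 * typical]
print(jumps)
```

Running the same check on every county with a prison and comparing the flagged dates would show whether the change happened on one day or was spread out.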