Covid and Politics

Let’s take a look at Texas politics and Covid deaths. The CDC Wonder data has preliminary deaths and preliminary Covid deaths by county through April of 2022. We can combine this data with the votes from the presidential race and look for correlations. But first, I will calculate the “excess” deaths using a bog-simple approach: I will assume that for each county the deaths in 2018 and 2019 represent a roughly constant background, and any increase in 2020 and 2021 will be considered excess deaths, likely due to Covid.
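As a rough illustration of the excess-death idea described above, here is a minimal sketch in base R. The county names and death counts are made up for the example (they are not CDC Wonder values), and the column names `County`, `Year`, and `Deaths` are assumptions. The sketch also assumes the baseline is the mean of the 2018 and 2019 counts, which is one plausible reading of "somewhat constant background", and it leaves negative results as they are rather than clipping them to zero.

```r
# Hypothetical yearly death counts per county (made-up numbers, not CDC data)
deaths <- data.frame(
  County = rep(c("A", "B"), each = 4),
  Year   = rep(2018:2021, times = 2),
  Deaths = c(100, 110, 150, 160,
             50, 50, 55, 45)
)

# Baseline: mean annual deaths over 2018-2019, per county
baseline <- aggregate(Deaths ~ County,
                      data = deaths[deaths$Year %in% 2018:2019, ],
                      FUN = mean)
names(baseline)[2] <- "Baseline"

# Pandemic period: total deaths over 2020-2021, per county
pandemic <- aggregate(Deaths ~ County,
                      data = deaths[deaths$Year %in% 2020:2021, ],
                      FUN = sum)
names(pandemic)[2] <- "Pandemic"

# Two pandemic years, so compare against twice the annual baseline
excess <- merge(baseline, pandemic, by = "County")
excess$Excess <- excess$Pandemic - 2 * excess$Baseline

print(excess)
```

In this toy data, county A shows an excess while county B does not, which is the kind of county-level contrast that could then be compared against vote share.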


Geocoding Part 2

Let’s take the address for the Art Car Museum and use that as our example address. The first address is correct; each of the next 5 has a flaw in one of the fields.

    # Address for the Art Car Museum
    Test_data <- tribble(
      ~ID, ~Street_num, ~Prefix, ~Street_name, ~Street_type, ~Zipcode,
      "1", "140", "" , "Heights", "BLVD", "77007",
      "2", "138", "" , "Heights", "BLVD", "77007",
      "3", "140", "W", "Heights", "BLVD", "77007",
      "4", "140", "" , "Hieghts", "BLVD", "77007",
      "5", "140", "" , "Heights", "LN",   "77007",
      "6", "140", "" , "Heights", "BLVD", "77070"
    )

Exact Matches

The expected way to run the code is to first find all exact matches, and then use the additional tools to try to repair any failures that occurred.
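The exact-match-first workflow can be sketched as a plain join against a reference table. This is only an illustration in base R, not the matching code from the post: the `Reference` table, its `Lat` and `Lon` columns, and the coordinate values are hypothetical placeholders, and the test addresses are rebuilt with `data.frame()` so the block runs without extra packages.

```r
# Same six test addresses as above, rebuilt in base R
Test_data <- data.frame(
  ID          = as.character(1:6),
  Street_num  = c("140", "138", "140", "140", "140", "140"),
  Prefix      = c("", "", "W", "", "", ""),
  Street_name = c("Heights", "Heights", "Heights", "Hieghts", "Heights", "Heights"),
  Street_type = c("BLVD", "BLVD", "BLVD", "BLVD", "LN", "BLVD"),
  Zipcode     = c("77007", "77007", "77007", "77007", "77007", "77070"),
  stringsAsFactors = FALSE
)

# Hypothetical one-row reference table; Lat/Lon are placeholder values
Reference <- data.frame(
  Street_num = "140", Prefix = "", Street_name = "Heights",
  Street_type = "BLVD", Zipcode = "77007",
  Lat = 29.77, Lon = -95.40,
  stringsAsFactors = FALSE
)

# Exact match: every address field must agree
key <- c("Street_num", "Prefix", "Street_name", "Street_type", "Zipcode")
matched <- merge(Test_data, Reference, by = key, all.x = TRUE)

exact  <- matched[!is.na(matched$Lat), ]  # rows that found coordinates
failed <- matched[is.na(matched$Lat), ]   # rows left for the repair tools

print(exact$ID)
print(sort(failed$ID))
```

Only the first address matches on every field; the five flawed addresses fall into `failed`, which is the set a repair step would then work on.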


Geocoding

Attaching a Lat-Long to a street address is not an easy task. I have tried a variety of freely available geocoders, and have found all of them to be lacking for various reasons. See one of my earliest posts on this blog for more details. Finally, I discovered that the city of Houston has made available a file from their GIS group that has most of the addresses and associated Lat-Longs for the city (a total of 1,480,215 records when I downloaded it).


Having bought solar panels myself a couple of years ago, and realizing that the city permit database could be used to find most installations, I decided that it would be interesting to look at the recent history and a few other facets of residential solar panel installations. The first step is to download the structural permit data as a CSV file from the city open data website.


Harris County Appraisal District data

Let’s start exploring the data. We’ll look at all these exempt properties.

    # This takes us from 1.4 million to 74,000 records
    Dx <- df %>% filter(str_detect(state_class, "^X"))

    Dx %>%
      ggplot(aes(x=state_class)) +
      geom_histogram(stat="count") +
      labs(x="Exempt code",
           y="Number of Properties",
           title="Number of properties in each exempt class")

    # Same plot but for total square miles
    Dx %>%
      group_by(state_class) %>%
      summarize(area=sum(land_ar, na.rm=TRUE)*3.58701e-8) %>%
      ggplot(aes(x=state_class)) +
      geom_col(aes(y=area)) +
      labs(x="Exempt code",
           y="Square Miles",
           title="Area of properties in each exempt class")

    # Same plot but for total Market Value
    Dx %>%
      group_by(state_class) %>%
      summarize(area=sum(tot_mkt_val, na.


Vaccine Reluctance

Let’s look at Texas counties and test various factors for correlations to the vaccination rate. We’ll primarily look at the rate of the first vaccination, since there are a variety of reasons why someone might not get the second dose. Let’s start with the raw rates of vaccination by county.

    Vaccine %>%
      mutate(Pct_one_dose=People_one_dose/Pop_total) %>%
      ggplot(aes(x=Date, y=Pct_one_dose, color=County)) +
      geom_line(show.legend = FALSE) +
      labs(x="Date",
           y="First Dose Percentage",
           title="Texas counties Vaccine Progress")

Hmmm… let’s do a little cleanup.


In December 2020, vaccine distribution and administration began, and I started capturing the daily spreadsheet from the state health department that tracked progress. This blog entry is really for prototyping some of the data cleanup and displays that I will incorporate into my shiny app. Let’s take a look at data issues.

    df %>%
      filter(!is.na(Pct_given)) %>%
      ggplot(aes(x=Pct_given)) +
      geom_histogram() +
      labs(x="Percent Distributed Administered",
           title="Distribution of Administered Vaccine")

    ## `stat_bin()` using `bins = 30`.



Alan Jackson

Retired Geophysicist, geophysical consultant, budding data scientist, trouble-maker.

Consultant and Chief Bottle Washer

Houston and Seattle