Harris County COVID-19 data

I have a very nice (I hope) dataset of the number of positive COVID-19 cases per day in Harris County, by zipcode. In this blog entry I would like to study this dataset and compare it with various other data.
Initial look

First off, let’s explore the data for issues, and for ideas about what might be interesting.
# How is the data distributed? Let's look at the most recent day.
# Note: last() takes each zipcode's final row, so this assumes rows are in date order.
library(dplyr)
library(ggplot2)

Harris %>%
  group_by(Zip) %>%
  summarize(Cases_today = last(Cases)) %>%
  ggplot(aes(x = Cases_today)) +
  geom_histogram()

So more than 20 zipcodes have no cases (though they may also have no people), and most zipcodes appear to fall in the 250-750 range.
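The two claims read off the histogram can also be checked numerically. This is a minimal base-R sketch, assuming a data frame shaped like `Harris` with columns `Zip`, `Cases`, and a date column here called `Date` (a hypothetical name). It picks each zipcode's latest row by sorting on the date explicitly instead of relying on row order, and the toy data is made up for illustration.

```r
# Sketch: pick each zipcode's most recent row explicitly by date.
# Assumes columns Zip, Date (hypothetical name) and Cases.
latest_cases <- function(df) {
  df <- df[order(df$Zip, df$Date), ]
  # within each Zip, the last row after sorting is the most recent day
  df[!duplicated(df$Zip, fromLast = TRUE), c("Zip", "Cases")]
}

# Count zipcodes with zero cases and those in the 250-750 range
summarize_latest <- function(latest) {
  c(zero_cases = sum(latest$Cases == 0),
    in_250_750 = sum(latest$Cases >= 250 & latest$Cases <= 750),
    total_zips = nrow(latest))
}

# Toy data for illustration only, not real Harris County numbers
toy <- data.frame(
  Zip   = c("77001", "77001", "77002", "77002", "77003"),
  Date  = as.Date(c("2020-05-01", "2020-05-02", "2020-05-02",
                    "2020-05-01", "2020-05-02")),
  Cases = c(300, 310, 600, 550, 0)
)
latest <- latest_cases(toy)
print(summarize_latest(latest))
```

Note that the toy rows for `77002` are deliberately out of date order; sorting first means the 600 from the later day is kept.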
Miscellaneous analyses related to the Covid-19 pandemic

After reading the paper this morning about a nearby county (Houston County) with zero reported cases, I got curious. What does the distribution of test coverage look like, i.e., the number of tests per capita? And what fraction of tests come back positive?
So let’s look at the data.
We can now grab an Excel spreadsheet from the state that gives the number of tests per county.
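Once the test counts (read from the spreadsheet, for example with `readxl::read_excel()`) are joined to county populations, both questions reduce to simple ratios. A minimal base-R sketch, assuming hypothetical column names `County`, `Tests`, `Positives`, and `Population`; the real spreadsheet will have its own headers, and the numbers below are made up.

```r
# Sketch: tests per 1,000 residents and test positivity by county.
# Column names (County, Tests, Positives, Population) are hypothetical.
county_rates <- function(tests, population) {
  df <- merge(tests, population, by = "County")
  df$tests_per_1000 <- 1000 * df$Tests / df$Population
  # a county with zero tests has undefined positivity, so report NA
  df$positivity <- ifelse(df$Tests > 0, df$Positives / df$Tests, NA_real_)
  df[order(-df$tests_per_1000), ]  # best-covered counties first
}

# Toy numbers for illustration only
tests <- data.frame(County = c("A", "B", "C"),
                    Tests = c(5000, 200, 0),
                    Positives = c(400, 10, 0))
population <- data.frame(County = c("A", "B", "C"),
                         Population = c(100000, 20000, 5000))
rates <- county_rates(tests, population)
print(rates)
```

County C illustrates the "zero reported cases" situation: with no tests, zero cases says nothing about prevalence, which is why its positivity comes out as `NA` rather than 0.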
The city of Houston makes a file available on the web every week containing a summary of the past week’s building permits. I found this file a bit difficult to digest: it needed a map, and it needed search and filtering. So I wrote some code to automatically read the file in each week, merge it with the previous weeks’ files, and then upload the result to the web, where I have an application to display it.
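The weekly merge step might look something like the following base-R sketch. It assumes each weekly file has the same columns and a unique permit identifier, here called `Permit` (a hypothetical name), and that when a permit shows up in more than one week the newest copy should win. The data is made up for illustration.

```r
# Sketch: fold one week's permit file into the running history.
# Assumes a hypothetical unique key column "Permit" and identical columns
# in every weekly file; the newest copy of a repeated permit is kept.
merge_week <- function(history, new_week, key = "Permit") {
  combined <- rbind(new_week, history)      # newest rows first
  combined[!duplicated(combined[[key]]), ]  # keep the first (newest) copy
}

history <- data.frame(Permit = c("P1", "P2"), Status = c("Issued", "Issued"))
week    <- data.frame(Permit = c("P2", "P3"), Status = c("Final", "Issued"))
merged  <- merge_week(history, week)
print(merged)
```

Putting the new week first and dropping later duplicates is one simple way to let updated records replace stale ones; if the city ever changes the file's columns, `rbind()` will stop with an error rather than silently misalign them.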
We like to walk. When the weather cooperates, we can easily get in 5 or more miles a day just walking around the neighborhood: to the bank, the grocery store, the hardware store, or just around the ’hood. There are two huge irritants on our walks: terrible drivers who refuse to yield the right-of-way to pedestrians, and the abysmal quality of the sidewalks. This report will look at the sidewalks.
Having bought solar panels myself a couple of years ago, and realizing that the city permit database could be used to find most installations, I decided that it would be interesting to look at the recent history and a few other facets of residential solar panel installations.
The first step is to download the structural permit data as a CSV file from the city open data website.
Grabbing the correct records

As far as I can tell, solar panels are designated as such in the Description field, and nowhere else.
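Since the Description field is the only marker, selecting the records comes down to a text match. This is a minimal base-R sketch with a made-up data frame standing in for the output of `read.csv()` on the downloaded permit file. The pattern `solar\s*panel` is an assumption; it would miss descriptions that say only "PV" or "photovoltaic", so spot-checking the unmatched records is worthwhile.

```r
# Sketch: keep only permits whose Description mentions solar panels.
# The pattern is an assumption, case-insensitive, and allows optional
# whitespace between "solar" and "panel".
select_solar <- function(permits) {
  hit <- grepl("solar\\s*panel", permits$Description, ignore.case = TRUE)
  permits[hit, , drop = FALSE]
}

# Made-up records for illustration only
permits <- data.frame(
  Permit = c("1", "2", "3"),
  Description = c("INSTALL ROOF MOUNTED SOLAR PANELS",
                  "REROOF",
                  "Solar Panel addition")
)
solar <- select_solar(permits)
print(solar)
```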
Introduction

TxDOT makes detailed data on traffic collisions throughout the state available online. The data must be queried and downloaded manually as CSV files, but that is not too bad. I downloaded the data for Harris County from 2010 to 2018.
Database is documented at https://www.txdot.gov/inside-txdot/division/traffic/data-access.html
Access is from https://cris.txdot.gov/secure/Share
Log on and download one year at a time. The zip files will require the login password to open them.
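After downloading one year at a time, the yearly files need to be stacked into one table. A minimal base-R sketch, assuming the password-protected zips have already been extracted by hand into a single directory, and that every year's CSV has the same columns; the `\.csv$` file pattern is an assumption about the naming.

```r
# Sketch: combine one-CSV-per-year downloads into a single data frame.
# Assumes the zips were extracted by hand into `dir` and that all years
# share the same columns (rbind() stops with an error otherwise).
read_crash_years <- function(dir, pattern = "\\.csv$") {
  files <- sort(list.files(dir, pattern = pattern, full.names = TRUE))
  yearly <- lapply(files, function(f) {
    df <- read.csv(f, stringsAsFactors = FALSE)
    df$source_file <- basename(f)  # remember which download each row came from
    df
  })
  do.call(rbind, yearly)
}
```

Usage would be something like `crashes <- read_crash_years("extracted_cris")`, where the directory name is hypothetical. Keeping `source_file` makes it easy to trace an odd record back to the year it was downloaded with.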
Introduction

Houston is one of the worst places in the country for allergies. Since reasonably good data is available, I thought I should analyze the pollen and mold data with an eye toward prediction on both short and mid-range time scales.
As with any project like this, step one is reading in and cleaning up the raw data.
The data is available online as artisanal spreadsheets at https://www.