A data project

So I have a job that runs every hour and scrapes the HFD incident page so that I can capture all the HFD 911 calls. The job has been running since May 6, 2022. To be honest, I haven’t seen any real obvious patterns, at least geographically. Part of the problem is the resolution: the data is located on the Keymap grid, a set of 1x1-mile squares covering the whole area.
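For anyone curious what an hourly scrape like that might look like, here is a minimal sketch using rvest. The URL and table layout below are placeholders of my own, not the actual HFD page or the production job.

library(rvest)
library(dplyr)

# Placeholder URL, not the real HFD incident page
url <- "https://example.com/hfd-active-incidents"

incidents <- read_html(url) %>%
  html_table() %>%                     # all HTML tables on the page, as a list of tibbles
  bind_rows() %>%                      # assumes the tables share a common layout
  mutate(Scrape_time = Sys.time())     # stamp each pull so hourly runs can be appended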
Notes on how to make contour maps in R

As a retired geophysicist, I spent a career making contour maps. I have found it challenging to make good contour maps in R, so as part of my own learning process I have documented the necessary steps, in hopes that this may help others engaged in the same struggle.
Ultimately I want to create filled contours that represent a surface, based on some random collection of input points, and display those contours on top of a detailed basemap, such as OpenStreetMap.
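As a concrete starting point, here is a minimal sketch of the gridding-and-contouring step, assuming a data frame of scattered points with x, y, and z columns; the sample data and column names are my own, not from the post. The filled contours are drawn semi-transparent so they could later be layered over a basemap such as OpenStreetMap.

library(interp)      # scattered-point interpolation
library(ggplot2)

# Fake scattered input points standing in for the real data
set.seed(42)
pts <- data.frame(x = runif(200), y = runif(200))
pts$z <- with(pts, sin(4 * x) + cos(4 * y))

# Interpolate the scattered points onto a regular grid
grd <- interp(pts$x, pts$y, pts$z, nx = 100, ny = 100)

# Reshape the grid matrix into a long data frame for ggplot
grd_df <- expand.grid(x = grd$x, y = grd$y)
grd_df$z <- as.vector(grd$z)

# Filled contours, semi-transparent so a basemap layer could show through
ggplot(grd_df, aes(x, y, z = z)) +
  geom_contour_filled(alpha = 0.6) +
  labs(fill = "Surface value")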
Covid and Politics

Let’s take a look at Texas politics and Covid deaths. The CDC Wonder data has preliminary deaths and preliminary Covid deaths, by county, through April of 2022. We can combine this data with the votes from the presidential race and look for correlations.
But first I will calculate the “excess” deaths using a bog-simple approach: I will assume that for each county the deaths in 2018 and 2019 represent a somewhat constant background, and any increase in 2020 and 2021 will be considered excess deaths, likely due to Covid.
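A minimal sketch of that arithmetic, assuming a long data frame Deaths with county, year, and deaths columns and a Votes table carrying a vote-share column; all of these names are placeholders rather than the actual CDC Wonder or election fields.

library(dplyr)

Excess <- Deaths %>%
  group_by(county) %>%
  summarize(
    baseline = mean(deaths[year %in% 2018:2019]),   # pre-pandemic background
    pandemic = mean(deaths[year %in% 2020:2021]),   # pandemic-era average
    excess   = pandemic - baseline                  # "excess" deaths per year
  )

# Join to vote share and check for a simple correlation
Excess %>%
  left_join(Votes, by = "county") %>%
  summarize(r = cor(excess, pct_trump, use = "complete.obs"))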
Geocoding Part 2

Let’s take the address of the Art Car Museum and use that as our example address. The first address is correct; the next five each have a flaw in one of the fields.
# Address for the Art Car Museum
Test_data <- tribble(
  ~ID, ~Street_num, ~Prefix, ~Street_name, ~Street_type, ~Zipcode,
  "1", "140", "",  "Heights", "BLVD", "77007",
  "2", "138", "",  "Heights", "BLVD", "77007",
  "3", "140", "W", "Heights", "BLVD", "77007",
  "4", "140", "",  "Hieghts", "BLVD", "77007",
  "5", "140", "",  "Heights", "LN",   "77007",
  "6", "140", "",  "Heights", "BLVD", "77070"
)

Exact Matches

The basic expected way to run the code is to first find all exact matches, and then use the additional tools to try to repair any failures that occurred, as sketched below.
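A hedged sketch of what the exact-match step might look like: join the test addresses to a reference table of addresses and Lat-Longs on every field, then set aside the failures for repair. The table Geocode_table and its Lat/Lon columns are assumptions of mine, not the tool's actual internals.

library(dplyr)

# Exact matches: every address field must agree with the reference table
Exact <- Test_data %>%
  inner_join(Geocode_table,
             by = c("Street_num", "Prefix", "Street_name",
                    "Street_type", "Zipcode")) %>%
  select(ID, Lat, Lon)

# Everything that failed the exact match goes on to the repair tools
Failed <- anti_join(Test_data, Exact, by = "ID")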
Geocoding

Attaching a Lat-Long to a street address is not an easy task. I have tried a variety of freely available geocoders, and have found all of them to be lacking for various reasons. See one of my earliest posts on this blog for more details.
Finally, I discovered that the city of Houston has made available a file from their GIS group that has most of the addresses and associated Lat-Longs for the city (a total of 1,480,215 records when I downloaded it).
Having bought solar panels myself a couple of years ago, and realizing that the city permit database could be used to find most installations, I decided that it would be interesting to look at the recent history and a few other facets of residential solar panel installations.
The first step is to download the structural permit data as a CSV file from the city open data website. This file is no longer available, so I now download the data from the new site and clean it up.
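A minimal sketch of that download-and-clean step; the file name and column names below are placeholders, not the city's actual layout.

library(tidyverse)
library(janitor)

# Hypothetical file name for the permit extract saved from the open data site
permits <- read_csv("Structural_Permits.csv") %>%
  clean_names() %>%                  # consistent snake_case column names
  distinct() %>%                     # drop duplicated rows
  filter(!is.na(permit_number))      # hypothetical column: drop incomplete records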
Harris County Appraisal District data

Let’s start exploring the data. We’ll look at all these exempt properties.
# This takes us from 1.4 million to 74,000 records
Dx <- df %>%
  filter(str_detect(state_class, "^X"))

Dx %>%
  ggplot(aes(x=state_class)) +
  geom_histogram(stat="count") +
  labs(x="Exempt code", y="Number of Properties",
       title="Number of properties in each exempt class")

# Same plot but for total square miles
Dx %>%
  group_by(state_class) %>%
  summarize(area=sum(land_ar, na.rm=TRUE)*3.58701e-8) %>%
  ggplot(aes(x=state_class)) +
  geom_col(aes(y=area)) +
  labs(x="Exempt code", y="Square Miles",
       title="Area of properties in each exempt class")

# Same plot but for total Market Value
Dx %>%
  group_by(state_class) %>%
  summarize(value=sum(tot_mkt_val, na.rm=TRUE)) %>%
  ggplot(aes(x=state_class)) +
  geom_col(aes(y=value)) +
  labs(x="Exempt code", y="Total Market Value",
       title="Market value of properties in each exempt class")