Geocoding Part 2

Let’s take the address for the Art Car Museum and use that as our example address. The first address is correct, the next 5 have a flaw in one of the fields.

    # Address for the Art Car Museum
    Test_data <- tribble(
      ~ID, ~Street_num, ~Prefix, ~Street_name, ~Street_type, ~Zipcode,
      "1", "140", "" , "Heights", "BLVD", "77007",
      "2", "138", "" , "Heights", "BLVD", "77007",
      "3", "140", "W", "Heights", "BLVD", "77007",
      "4", "140", "" , "Hieghts", "BLVD", "77007",
      "5", "140", "" , "Heights", "LN",   "77007",
      "6", "140", "" , "Heights", "BLVD", "77070"
    )

Exact Matches

The basic expected way to run the code is to first find all exact matches, and then use the additional tools to try to repair any failures that occurred.
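As a rough illustration of the "exact matches first, repairs later" workflow, here is a minimal sketch using plain dplyr joins. It is not the geocoding code from the post itself: the `Reference` table, the `keys` vector, and the coordinates are hypothetical stand-ins for the real address file, with placeholder values chosen only so the example runs.

```r
library(dplyr)
library(tibble)

# Test addresses from the excerpt above (ID 1 is correct, IDs 2-6 are flawed)
Test_data <- tribble(
  ~ID, ~Street_num, ~Prefix, ~Street_name, ~Street_type, ~Zipcode,
  "1", "140", "" , "Heights", "BLVD", "77007",
  "2", "138", "" , "Heights", "BLVD", "77007",
  "3", "140", "W", "Heights", "BLVD", "77007",
  "4", "140", "" , "Hieghts", "BLVD", "77007",
  "5", "140", "" , "Heights", "LN",   "77007",
  "6", "140", "" , "Heights", "BLVD", "77070"
)

# Hypothetical one-row reference table standing in for the real address file.
# Lat and Lon are illustrative placeholders, not values from the city data.
Reference <- tribble(
  ~Street_num, ~Prefix, ~Street_name, ~Street_type, ~Zipcode, ~Lat, ~Lon,
  "140", "", "Heights", "BLVD", "77007", 29.77, -95.40
)

# All address fields must agree for a row to count as an exact match
keys <- c("Street_num", "Prefix", "Street_name", "Street_type", "Zipcode")

# Rows that match exactly pick up a Lat and Lon from the reference table
Exact <- inner_join(Test_data, Reference, by = keys)

# Rows with no exact match are set aside for the repair tools
Failed <- anti_join(Test_data, Reference, by = keys)

print(Exact)
print(Failed)
```

With this toy reference table, only the first test address lands in `Exact`; the five flawed addresses end up in `Failed`, which is the set a repair step would then work through.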
Geocoding

Attaching a Lat-Long to a street address is not an easy task. I have tried a variety of freely available geocoders, and have found all of them to be lacking for various reasons. See one of my earliest posts on this blog for more details. Finally, I discovered that the city of Houston has made available a file from their GIS group that has most of the addresses and associated Lat-Longs for the city (a total of 1,480,215 records when I downloaded it).
Having bought solar panels myself a couple of years ago, and realizing that the city permit database could be used to find most installations, I decided that it would be interesting to look at the recent history and a few other facets of residential solar panel installations. The first step is to download the structural permit data as a CSV file from the city open data website.
Harris County Appraisal District data

Let’s start exploring the data. We’ll look at all these exempt properties.

    # This takes us from 1.4 million to 74,000 records
    Dx <- df %>% filter(str_detect(state_class, "^X"))

    Dx %>%
      ggplot(aes(x=state_class)) +
      geom_histogram(stat="count") +
      labs(x="Exempt code",
           y="Number of Properties",
           title="Number of properties in each exempt class")

    # Same plot but for total square miles
    Dx %>%
      group_by(state_class) %>%
      summarize(area=sum(land_ar, na.rm=TRUE)*3.58701e-8) %>%
      ggplot(aes(x=state_class)) +
      geom_col(aes(y=area)) +
      labs(x="Exempt code",
           y="Square Miles",
           title="Area of properties in each exempt class")

    # Same plot but for total Market Value
    Dx %>%
      group_by(state_class) %>%
      summarize(area=sum(tot_mkt_val, na.
Let’s take a look at the early voting data for Harris County

Since I already have a bunch of data for Harris County precincts and zipcodes, why not make some use of it?

Setup

    path <- "/home/ajackson/Dropbox/Rprojects/Voting/"

    BBM <- read_csv(paste0(path, "Cumulative_BBM_1120.csv"),
                    col_types = "ccccccccccccccccccccccccccccccccccccccccc")

    BBM <- BBM %>%
      mutate(ActivityDate=mdy_hms(ActivityDate)) %>%
      mutate(ActivityDate=force_tz(ActivityDate, tzone = "US/Central")) %>%
      select(ElectionCode:ActivityDate) %>%
      mutate(Ballot_Type="Mail")

    EV <- list.files(path=path, pattern="Cumulative_EV_1120_1*", full.
Harris County COVID-19 data

I have a very nice (I hope) dataset consisting of the number of positive COVID-19 cases per day in Harris County by zipcode. In this blog entry I would like to study this dataset and compare it with various other data.

Initial look

First off, let’s explore the data for issues, and for ideas about what might be interesting.
Miscellaneous analyses related to the Covid-19 pandemic

After reading the paper this morning about a nearby county (Houston County) with zero reported cases, I got curious. What does the distribution of test coverage look like, i.e., the number of tests per capita? And what fraction of tests come back positive? So let’s look at the data. We can now grab an Excel spreadsheet from the state that gives the number of tests per county.
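The two quantities asked about above are simple ratios, and a small sketch may make them concrete. The county names, column names, and counts below are all hypothetical, invented only so the arithmetic can be run; they are not taken from the state spreadsheet.

```r
library(dplyr)
library(tibble)

# Hypothetical counties with made-up numbers, for illustration only
Counties <- tribble(
  ~County,    ~Population, ~Tests, ~Positives,
  "County A", 10000,       200,    10,
  "County B", 50000,       500,    50
)

Rates <- Counties %>%
  mutate(
    # Test coverage: tests per 1,000 residents
    Tests_per_1000 = Tests / Population * 1000,
    # Positivity: percent of tests that came back positive
    Pct_positive = Positives / Tests * 100
  )

print(Rates)
```

Scaling coverage to tests per 1,000 residents is just one convenient choice; a plain per-capita fraction would carry the same information.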