Geocoding Part 2

Let’s take the address for the Art Car Museum and use that as our example address. The first address is correct; each of the next five has a flaw in one of the fields.

# Address for the Art Car Museum
Test_data <- tribble(
  ~ID, ~Street_num, ~Prefix, ~Street_name, ~Street_type, ~Zipcode,
  "1", "140", "",  "Heights", "BLVD", "77007",
  "2", "138", "",  "Heights", "BLVD", "77007",
  "3", "140", "W", "Heights", "BLVD", "77007",
  "4", "140", "",  "Hieghts", "BLVD", "77007",
  "5", "140", "",  "Heights", "LN",   "77007",
  "6", "140", "",  "Heights", "BLVD", "77070"
)

Exact Matches

The basic expected way to run the code is to first find all exact matches, and then use the additional tools to try to repair any failures that occurred.
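The exact-match-first workflow can be sketched with plain dplyr joins. This is only an illustration, not the matching code from the post: the one-row `Reference` table, its `Ref_id` column, and the `key_cols` vector are hypothetical stand-ins for the city address file and its fields.

```r
library(dplyr)
library(tibble)

# The six test addresses from the post: row 1 is correct, rows 2-6 each have one flaw
Test_data <- tribble(
  ~ID, ~Street_num, ~Prefix, ~Street_name, ~Street_type, ~Zipcode,
  "1", "140", "",  "Heights", "BLVD", "77007",
  "2", "138", "",  "Heights", "BLVD", "77007",
  "3", "140", "W", "Heights", "BLVD", "77007",
  "4", "140", "",  "Hieghts", "BLVD", "77007",
  "5", "140", "",  "Heights", "LN",   "77007",
  "6", "140", "",  "Heights", "BLVD", "77070"
)

# Hypothetical one-row reference table standing in for the real address file
Reference <- tribble(
  ~Ref_id, ~Street_num, ~Prefix, ~Street_name, ~Street_type, ~Zipcode,
  "R1", "140", "", "Heights", "BLVD", "77007"
)

# Every one of these fields must agree for an exact match
key_cols <- c("Street_num", "Prefix", "Street_name", "Street_type", "Zipcode")

# Step 1: keep only the addresses that match the reference exactly
Matched <- inner_join(Test_data, Reference, by = key_cols)

# Step 2: collect the failures so they can be passed to the repair tools
Unmatched <- anti_join(Test_data, Reference, by = key_cols)
```

With this toy reference table, only ID 1 ends up in `Matched`, and IDs 2 through 6 land in `Unmatched`, each waiting for a repair step suited to its particular flaw.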


Geocoding

Attaching a Lat-Long to a street address is not an easy task. I have tried a variety of freely available geocoders, and have found all of them to be lacking for various reasons. See one of my earliest posts on this blog for more details. Finally, I discovered that the city of Houston has made available a file from their GIS group that has most of the addresses and associated Lat-Longs for the city (a total of 1,480,215 records when I downloaded it).


Harris County Appraisal District data

Let’s start exploring the data. We’ll look at all these exempt properties.

# This takes us from 1.4 million to 74,000 records
Dx <- df %>% filter(str_detect(state_class, "^X"))

Dx %>%
  ggplot(aes(x=state_class)) +
  geom_histogram(stat="count") +
  labs(x="Exempt code", y="Number of Properties", title="Number of properties in each exempt class")

# Same plot but for total square miles
Dx %>%
  group_by(state_class) %>%
  summarize(area=sum(land_ar, na.rm=TRUE)*3.58701e-8) %>%
  ggplot(aes(x=state_class)) +
  geom_col(aes(y=area)) +
  labs(x="Exempt code", y="Square Miles", title="Area of properties in each exempt class")

# Same plot but for total Market Value
Dx %>%
  group_by(state_class) %>%
  summarize(area=sum(tot_mkt_val, na.


We like to walk. When the weather cooperates, we can easily get in 5 or more miles in a day just walking around the neighborhood. We walk to the bank, to the grocery store, the hardware store, or just around the ’hood. There are two huge irritants on our walks: the terrible drivers who refuse to yield the right-of-way to pedestrians, and the abysmal quality of the sidewalks. This report will look at the sidewalks.


Introduction

In late 2017 I did an analysis of crime data in my neighborhood (The Heights) using the online Houston Police Department data. This was so interesting that I foolishly decided to expand the effort to cover the whole city. After all, how hard could it be to go from analyzing one police beat with about 13,000 records, to analyzing 109 beats, with a corresponding increase in volume? This effort is still ongoing in fits and starts today, but I thought it would be useful to start documenting the journey now before the pain fades away.


Introduction

I have been struggling with geocoding for about a year now, and have begun to learn far more than I wanted about the ugly details of the tools available for free. In particular I have been using Google and the US Census Bureau for geocoding. They each have their own strengths and weaknesses, so I thought it would be appropriate to share what I have learned. I would call Google promiscuous - they will try very hard to return a location to you, even if it is all wrong.



Alan Jackson

Retired Geophysicist, geophysical consultant, budding data scientist, trouble-maker.

Consultant and Chief Bottle Washer

Houston and Seattle