A visualization of census tracts in PA using geopandas

This is a bare bones tutorial on how to generate a Geopandas DataFrame from a SHP file in python. There are some good examples out there, but not really one that gave it the minimal approach I was looking for. I definitely learned a lot from the examples in Andrew Gaidus’ site; for a more in-depth approach (which includes a pretty nifty way to create a Geopandas DataFrame directly form a ZIP file) head over to his site!

Visualizing Census Tracts in Pennsylvania

I’m going to show how to create a Geopandas DataFrame from SHP files provided by the US Census Bureau, specifically their cartographic boundary shapefiles. There’s a wide variety of SHP files you can download here, such as regions, states and counties; definitely worth checking out, if you’re looking to get more experience working with geographic data in python!

Getting the shapefiles

Head on over to the Census Bureau site and download some boundary files of your choice. After downloading and unzipping the files, be sure to keep them all in the same directory! For those unfamiliar with *shp or cartographic files in general, the additional file types are required and contain additional data referenced by the *.shp file.

Import the Required Libraries

import shapefile
import geopandas as gpd
from shapely.geometry import shape
import osr

For those unfamiliar, here’s a brief description of what each will be used for:

  • shapefile: used to read in the *.shp file
  • geopandas: the geographic version of pandas used to create the DataFrame
  • shapely.geometry: used for manipulating geometric elements
  • osr: used for generating the proj4 information, which contains data such as the coordinate system

Reading in the Shapefile

Using the shapefile library we can read *.shp files directly into the python environment.

tracts = shapefile.Reader('data/cb_2018_42_tract_500k.shp')

Use the fields attribute to view the features contained in the shapefile.

tracts.fields
[('DeletionFlag', 'C', 1, 0),
 ['STATEFP', 'C', 2, 0],
 ['COUNTYFP', 'C', 3, 0],
 ['TRACTCE', 'C', 6, 0],
 ['AFFGEOID', 'C', 20, 0],
 ['GEOID', 'C', 11, 0],
 ['NAME', 'C', 100, 0],
 ['LSAD', 'C', 2, 0],
 ['ALAND', 'N', 14, 0],
 ['AWATER', 'N', 14, 0]]

Using python list comprehension, extract the fields into a list we can later join in a dictionary with the other attribute data

fields = [field[0] for field in tracts.fields[1:]]

Extract the attributes and geometry

First create some empty lists to hold the geometry and attributes extracted from the shapefile.

attributes = []
geometry = []

A simple for loop will work for extracting the attribute and geometry data.

Note that we are appending a shapely geometry object.

for row in tracts.shapeRecords():
    geometry.append(shape(row.shape.__geo_interface__))
    attributes.append(dict(zip(fields, row.record)))

Generate prj info using GDAL

Geopandas takes a specific file type known as proj4 to define its coordinate system. You can generate a proj4 file fairly easily using GDAL.

with open('data/cb_2018_42_tract_500k.prj') as p:
    proj4 = osr.SpatialReference(p.read()).ExportToProj4()
print(proj4)
+proj=longlat +datum=NAD83 +no_defs

Assembling the Geopandas DataFrame

Now that we have all the necessary components, it’s a simple matter to instantiate them as a Geopandas DaataFrame object.

gdf = gpd.GeoDataFrame(data=attributes, geometry=geometry, crs=proj4)
gdf.head()
AFFGEOIDALANDAWATERCOUNTYFPGEOIDLSADNAMESTATEFPTRACTCEgeometry
01400000US420199105001425881001942019910500CT910542910500POLYGON ((-80.04897 41.06461, -80.04753 41.069…
11400000US420199122004381377001942019912200CT912242912200POLYGON ((-80.14905 40.68102, -80.13932 40.681…
21400000US420210001005810054061102142021000100CT142000100POLYGON ((-78.92583 40.32872, -78.92481 40.331…
31400000US4202101260042900218426802142021012600CT12642012600POLYGON ((-78.73584 40.47753, -78.73509 40.481…
41400000US4202502070043780635142502802542025020700CT20742020700POLYGON ((-75.71378 40.85813, -75.69006 40.869…

Plot the gpd Geometry

It’s fairly easy to make quick plots of a DataFrame in Geopandas; appending the plot() method returns a motplotlib rendering. The plot() method also takes arguments typical to the same Matplotlib function.

gdf.plot(figsize=(16,8), cmap='Blues', edgecolor='black', linewidth=0.4)
A visualization of a geopandas from an SHP file

There it is, the quick and (very) simple way to get SHP data into a Geopandas DataFrame. You can find more Tutorials here, and the Gist for the all the code below. If you found this tutorial helpful, do something nice for your neighbor, and course please share and comment!