This is a bare bones tutorial on how to generate a Geopandas DataFrame from a SHP file in python. There are some good examples out there, but not really one that gave it the minimal approach I was looking for. I definitely learned a lot from the examples in Andrew Gaidus’ site; for a more in-depth approach (which includes a pretty nifty way to create a Geopandas DataFrame directly form a ZIP file) head over to his site!
Visualizing Census Tracts in Pennsylvania
I’m going to show how to create a Geopandas DataFrame from SHP files provided by the US Census Bureau, specifically their cartographic boundary shapefiles. There’s a wide variety of SHP files you can download here, such as regions, states and counties; definitely worth checking out, if you’re looking to get more experience working with geographic data in python!
Getting the shapefiles
Head on over to the Census Bureau site and download some boundary files of your choice. After downloading and unzipping the files, be sure to keep them all in the same directory! For those unfamiliar with *shp or cartographic files in general, the additional file types are required and contain additional data referenced by the *.shp file.
Import the Required Libraries
import shapefile import geopandas as gpd from shapely.geometry import shape import osr
For those unfamiliar, here’s a brief description of what each will be used for:
shapefile
: used to read in the *.shp filegeopandas
: the geographic version of pandas used to create the DataFrameshapely.geometry
: used for manipulating geometric elementsosr
: used for generating theproj4
information, which contains data such as the coordinate system
Reading in the Shapefile
Using the shapefile
library we can read *.shp files directly into the python environment.
tracts = shapefile.Reader('data/cb_2018_42_tract_500k.shp')
Use the fields
attribute to view the features contained in the shapefile.
tracts.fields
[('DeletionFlag', 'C', 1, 0), ['STATEFP', 'C', 2, 0], ['COUNTYFP', 'C', 3, 0], ['TRACTCE', 'C', 6, 0], ['AFFGEOID', 'C', 20, 0], ['GEOID', 'C', 11, 0], ['NAME', 'C', 100, 0], ['LSAD', 'C', 2, 0], ['ALAND', 'N', 14, 0], ['AWATER', 'N', 14, 0]]
Using python list comprehension, extract the fields into a list we can later join in a dictionary with the other attribute data
fields = [field[0] for field in tracts.fields[1:]]
Extract the attributes and geometry
First create some empty lists to hold the geometry and attributes extracted from the shapefile.
attributes = [] geometry = []
A simple for
loop will work for extracting the attribute and geometry data.
Note that we are appending a shapely
geometry object.
for row in tracts.shapeRecords(): geometry.append(shape(row.shape.__geo_interface__)) attributes.append(dict(zip(fields, row.record)))
Generate prj info using GDAL
Geopandas takes a specific file type known as proj4
to define its coordinate system. You can generate a proj4
file fairly easily using GDAL
.
with open('data/cb_2018_42_tract_500k.prj') as p: proj4 = osr.SpatialReference(p.read()).ExportToProj4() print(proj4)
+proj=longlat +datum=NAD83 +no_defs
Assembling the Geopandas DataFrame
Now that we have all the necessary components, it’s a simple matter to instantiate them as a Geopandas DaataFrame object.
gdf = gpd.GeoDataFrame(data=attributes, geometry=geometry, crs=proj4) gdf.head()
AFFGEOID | ALAND | AWATER | COUNTYFP | GEOID | LSAD | NAME | STATEFP | TRACTCE | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1400000US42019910500 | 1425881 | 0 | 019 | 42019910500 | CT | 9105 | 42 | 910500 | POLYGON ((-80.04897 41.06461, -80.04753 41.069… |
1 | 1400000US42019912200 | 4381377 | 0 | 019 | 42019912200 | CT | 9122 | 42 | 912200 | POLYGON ((-80.14905 40.68102, -80.13932 40.681… |
2 | 1400000US42021000100 | 581005 | 40611 | 021 | 42021000100 | CT | 1 | 42 | 000100 | POLYGON ((-78.92583 40.32872, -78.92481 40.331… |
3 | 1400000US42021012600 | 4290021 | 84268 | 021 | 42021012600 | CT | 126 | 42 | 012600 | POLYGON ((-78.73584 40.47753, -78.73509 40.481… |
4 | 1400000US42025020700 | 43780635 | 1425028 | 025 | 42025020700 | CT | 207 | 42 | 020700 | POLYGON ((-75.71378 40.85813, -75.69006 40.869… |
Plot the gpd
Geometry
It’s fairly easy to make quick plots of a DataFrame in Geopandas; appending the plot()
method returns a motplotlib rendering. The plot()
method also takes arguments typical to the same Matplotlib function.
gdf.plot(figsize=(16,8), cmap='Blues', edgecolor='black', linewidth=0.4)

There it is, the quick and (very) simple way to get SHP data into a Geopandas DataFrame. You can find more Tutorials here, and the Gist for the all the code below. If you found this tutorial helpful, do something nice for your neighbor, and course please share and comment!