Geojson may be catching up, but by and far the most popular data format for sharing geospatial data is (and has been since the early 1990’s) the shapefile. Although a shapefile is a simple way to share data that can be easily consumed by almost any GIS system, it comes with many limitations and is not suitable for your enterprise data needs.

My personal least favorite limitation is that shapefiles truncate column names greater than 10 characters – ouch!

Luckily, the PostGIS community has bundled with PostGIS distributions a great tool that simplifies the task of importing shapefiles into your PostgreSQL database: shp2pgsql.

Shapefile Definitions:

Vector Data: Points, lines, or polygons that are used to represent real-world features on a map. Used for data that has discrete boundaries like administrative boundaries (e.g. country borders), physical features (e.g. lakes or streams) or infrastructure (e.g. streets). Attributes in a table are associated with each vector feature. vector_small

Shapefile: The most popular geospatial vector data format. First developed by ESRI. Because of it’s open specification it is now predominantly used to share GIS data between open source and proprietary software. A shapefile actually consists of 3 mandatory files:

  • .shp — shape format; the feature geometry itself
  • .shx — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
  • .dbf — attribute format; columnar attributes for each shape, in dBase IV format

Exercise: Shapefiles to PostGIS with shp2pgsql

Prerequisite: Before you begin, create a spatial database by following the instructions laid out in our previous post, PostGIS – How to create a Spatial Database with PGC Command Line.

Bundled with your BigSQL PostGIS distribution is a tool called shp2pgsql. In this exercise, we will download a shapefile, find the correct SRID code, and use the command line tool to upload it to our postgis_sample database.

Download the data

Over the past 10 years, most U.S. municipalities have enacted open data policies. This means you can download lots of geospatial data from government web sites. Most geospatial data is provided as a shapefile or geojson file.

For this exercise, I downloaded NYC borough boundaries shapefile available at the New York City Open Data site.

To download this data click here: nybb_17a

After downloading, unzip the files and put it on your drive somewhere you will remember.

Navigate to shp2pgsql

Before you begin, make sure shp2pgsql is installed in your bin directory:

cd <directory of installation>/pg96/bin 

shp2pgsql syntax

Syntax for running command line:

shp2pgsql -I -s <SRID> <PATH/TO/SHAPEFILE> <SCHEMA>.<DBTABLE> | psql -U postgres -d <DBNAME>

Parameters

< SRID > – The Spatial Reference System Identifier (SRID or ESPG) is a unique value that represents the data’s coordinate system.

Findng the SRID/ESPG code:

All geospatial data has a coordinate system it is projected in. The topic of projections is beyond the scope of this posting, but you can learn more here.

Navigate to the shapefile’s .prj file which contains information about the shapefile’s coordinate system.

  cd <location of unzipped nybb_17a directory> 

Option 1:
Upload the .prj file to prj2epsg.org

Option 2:
Open the .prj file in your favorite editor and put the values in the first parameter, PROJCS, into the following sql statement that you will run against the postgis_example database:

nybb.prj parameter:

  PROJCS["NAD_1983_StatePlane_New_York_Long_Island_FIPS_3104_Feet"

SQL query:

    SELECT srid, srtext,proj4text FROM spatial_ref_sys 
    WHERE srtext ILIKE '%New_York_Long_Island%' 
    AND proj4text ILIKE '%nad83%' AND proj4text ILIKE '%ft%';  

srid

The SRID for the nybb data is 2263.

<PATH/TO/SHAPEFILE> – The full path to the shapefile:

<location of unzipped nybb_17a directory>  

< SCHEMA > — Target schema where the new table will be created:

This is optional. 

< DBTABLE > — New database table to be created:

nybb

< DATABASE > —Target database where the table will be created:

postgis_sample

For a complete listing of the shp2pgslq command line parameters go here.

Putting it all together

So, putting this all together for this exercise you would run the following command:

Note: before running the command make sure psql has been started.

cd <location of unzipped nybb_17a directory>
shp2pgsql -I -s 2263 nybb.shp nybb | psql -U postgres -d postgis_sample

Now, run the following sql query on your postgis_sample database:

select * from public.nybb;

And you will see the nybb table’s attributes:

nybb_table