
Monday, December 15, 2014

GPX2SHP

Here's a nice Garmin GPX-to-shapefile converter using PyShp, written by Matt Rantala and posted on his blog, MapRantala.com.


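The general idea behind a converter like this is simple: parse the GPX XML for point coordinates and write them out with pyshp.  Here's a minimal sketch of that approach (not Matt's code; the input file name is made up and only GPX 1.1 waypoints are handled):

# Sketch: convert GPX 1.1 waypoints to a point shapefile (not Matt's converter)
import shapefile
import xml.etree.ElementTree as ET

GPX_NS = "{http://www.topografix.com/GPX/1/1}"   # GPX 1.1 namespace

tree = ET.parse("tracks.gpx")                     # hypothetical input file
w = shapefile.Writer(shapefile.POINT)
w.field("NAME", "C", "40")

for wpt in tree.iter(GPX_NS + "wpt"):
    lat = float(wpt.get("lat"))
    lon = float(wpt.get("lon"))
    name = wpt.findtext(GPX_NS + "name", default="")
    w.point(lon, lat)                             # shapefiles store x (lon), y (lat)
    w.record(name)

w.save("waypoints")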

Sunday, June 23, 2013

PyShp Version 1.1.7 Release

PyShp 1.1.7 is out after several months of virtually no updates.  This release fixes a number of minor issues and adds a couple of important features.  You can get it through setuptools or as source from the CheeseShop: https://pypi.python.org/pypi/pyshp/1.1.7.  The Google Code page is here: https://code.google.com/p/pyshp/

And as usual there are no dependencies other than Python itself.  Updates include:
  • Added the Python __geo_interface__ convention to export shapefiles as GeoJSON.
  • Used the is_string() method to detect file names passed as unicode strings (previously failed on unicode strings).
  • Added the Reader.iterShapes() method to iterate through geometry records efficiently when parsing large files.
  • Added the Reader.iterRecords() method to iterate through dbf records efficiently in large files.
  • Modified the shape() and record() methods to fall back on iterShapes() and iterRecords() when the shx file is not available.
  • Fixed a bug which prevented writing the number 0 to dbf fields.
  • Updated the shape() method to calculate and seek the start of the next record. The shapefile spec does not require the content of a geometry record to be as long as the content length defined in the header. The result is you can delete features without modifying the record header, allowing for empty space in records.
  • Added enforcement of closed polygons in the Writer.poly() method.
  • Added a unique file name generator used when no file name is passed to a writer instance on saving (e.g. w.save()). The unique file name is returned as a string.
  • Updated the "bbox" property documentation to match the Esri specification.
The __geo_interface__ update required a polygon area calculator.  This method is undocumented, but you can feed a list of points representing a polygon to shapefile.signed_area(coords) and get an area calculation back.  If the area is a positive number, the points are clockwise (an outer ring).  If the area is negative, the points are in counter-clockwise order (i.e. an inner polygon ring).
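Here's a quick sketch of calling it (the square below is made-up data):

import shapefile

# Hypothetical closed ring: a 10 x 10 square
square = [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]

area = shapefile.signed_area(square)

# The magnitude is the ring's area; the sign tells you the ring
# orientation using the convention described above.
print abs(area)    # 100.0
print area > 0     # True for one winding direction, False for the other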

Monday, May 20, 2013

New __geo_interface__ for PyShp



Christian Ledermann took the initiative to fork pyshp and add the __geo_interface__ convention.


The __geo_interface__ is a community standard riding the current "less is more" entropy wave to get away from heavy data exchange standards, make software compatible, and get some work done.

This standard is very pythonic and well thought out, which is no surprise because Sean Gillies and Howard Butler are driving forces behind it.  The goal is to make it easier to move data around among libraries with different specialties, like Shapely and PySAL.  It is closely tied to GeoJSON, which is getting a lot of traction and shaking up the industry and community.
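The convention itself is tiny: an object exposes a __geo_interface__ property that returns a GeoJSON-like dictionary, and any library that understands the convention can consume the geometry without caring where it came from.  A rough sketch of the idea (the point class below is a made-up example, not part of any library):

class MyPoint(object):
    """A made-up class that speaks the __geo_interface__ convention."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    @property
    def __geo_interface__(self):
        return {"type": "Point", "coordinates": (self.x, self.y)}

def describe(obj):
    """Works with anything exposing __geo_interface__: pyshp shapes,
    Shapely geometries, or the toy class above."""
    geo = obj.__geo_interface__
    return geo["type"], geo["coordinates"]

print describe(MyPoint(-89.6, 30.3))   # ('Point', (-89.6, 30.3))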

Christian's  __geo_interface__ implementation for PyShp is here:

https://github.com/cleder/pyshp

He also wrote some ogr2ogr-style conversion samples to show you how to use it here:
https://github.com/cleder/geo_file_conv

I'm 100% behind these ideas and will roll this interface into the main trunk.  But there's nothing stopping you from using Christian's fork today.

Enjoy!

Tuesday, April 9, 2013

Add a Field to an Existing Shapefile

Adding a field where there wasn't one before has limitless possibilities.

The dbf file of a shapefile is a simple file-based database with rows and columns.  The rows are "records" and the columns are "fields".  Sometimes you want to add an additional field to the dbf file to capture some new type of information not originally included.

Today's example shows you how to use pyshp to add a new field to an existing shapefile.  This operation is a two-step process.  You must first update the dbf header to define the new field.  Then you must update each record to account for a new column in the database so everything is balanced.

In the past, I've demonstrated modifying existing shapefiles for other reasons including merging shapefiles and deleting features in shapefiles.  In every case you are actually reading in the existing shapefile, creating a new shapefile in memory and then writing out the new file either separately or on top of the old one.  Even in really high-end GIS packages that's basically all you're doing.  Some packages will use a temporary file in between. 

Here's the example.  We'll create a counter that gives us unique sample data to append to each record just so we can see the changes clearly.  In the real world, you'd probably just insert a blank placeholder.

import shapefile

# Read in our existing shapefile
r = shapefile.Reader("Mississippi")

# Create a new shapefile in memory
w = shapefile.Writer()

# Copy over the existing fields
w.fields = list(r.fields)

# Add our new field using the pyshp API
w.field("KINSELLA", "C", "40")

# We'll create a counter in this example
# to give us sample data to add to the records
# so we know the field is working correctly.
i=1

# Loop through each record, add a column.  We'll
# insert our sample data but you could also just
# insert a blank string or NULL DATA number
# as a place holder
for rec in r.records():
    rec.append(i)
    i += 1
    # Add the modified record to the new shapefile
    w.records.append(rec)

# Copy over the geometry without any changes
w._shapes.extend(r.shapes())

# Save as a new shapefile (or write over the old one)
w.save("Miss") 

So there you have it.  Overall it's a pretty simple process that can be extended to do some sophisticated operations.  The sample Mississippi shapefile can be found here.  It only has one record, so it's not that interesting, but it is lightweight and makes it easy to examine the dbf file in your favorite spreadsheet program.

Tuesday, May 29, 2012

SBN Mystery - Solved!

Last October I asked you all for help in figuring out the spatial indexing algorithm used to create Esri sbn files.  Using Pyshp, I had successfully decoded the file formats, which I published.  However, I could not figure out the algorithm used to create and populate the spatial bins within these files.

Today I'm pleased to announce this challenge has been answered.  The GIS community now has access to both the sbn and sbx file format as well as the algorithm for grouping features in a shapefile into "spatial bins".

I'm glad I asked for help as this challenge turned out to be quite difficult.  The brain behind this operation is Marc Pfister, with some good insights from Si Parker.  Marc worked tirelessly on this problem for months with a cross-country move and complete career change thrown in to make it interesting.  Marc did all the heavy intellectual lifting with me playing an inquisitive, but usually short-sighted, Watson to his Holmes.  I generated an endless series of shapefiles and one-off scripts to help Marc try to flush out the algorithm based only on subtle changes in the number of bins and the features they contained, as well as his past experience with spatial indexing.

When I figured out the file formats I had hoped I was just a Wikipedia search away from recognizing the spatial tree algorithm.  But the solution turned out to be much more complex than that.  Esri uses a sort of balanced tree that exhibits traits of several different algorithms.  The system seems carefully designed but is by no means obvious.  I will publish Marc's findings as soon as I can.

There are still a few shapefile cases which create puzzling but insignificant results. However we are at the 98% mark. The project goal of compatibility has been reached.  There is no longer any reason to hold off on sharing the results.  We are fairly certain that we are able to create sbn and sbx files which sufficiently fool ArcMap as well as other Esri packages so other software can read, use, and generate these indexes alongside the Esri suite.  There is more testing to do but it seems we are out of the woods.

What we haven't done is nicely packaged all of this work up.  But Marc posted a small set of Python scripts on github which demonstrate the algorithm and file handling needed to copy this capability.  Over the coming months I will fold this code into Pyshp, produce better documentation on the algorithm, and provide posts on how to deal with these indexes. But for now here's what you've been waiting for:

https://github.com/drwelby/hasbeen

By the way, Marc does freelance programming.  In my job, I get the opportunity to work with lots of really bright geospatial programmers and mathematicians, and this guy is one of the very best I've ever seen.  If you have a tough geospatial project and need some E=mc² smarts, definitely look him up.

Monday, May 21, 2012

Advanced Shapefile Merger

Italian GIS blogger Toni sent me a message about a sophisticated OGR-based shapefile merger utility he created.  Last year I posted a simple pyshp example that would find all ".shp" files in a directory and merge the geometry and attributes into a single shapefile.  Toni's version takes this concept much further to include wildcards, recursive directory walking, exclusion lists, and some dbf tricks.  You can find this utility at Toni's blog "Furious GIS":

http://furiousgis.blogspot.it/2012/05/python-shapefile-merger-utility.html

Tuesday, May 1, 2012

Pyshp 1.1.6-beta Release for Testing

A pre-release of pyshp 1.1.6 is available for testing.  This release addresses some major issues with reading and writing 3D shapefiles, identified by John Burky.  I am working through several bug reports right now, but this one was a showstopper, so I wanted to get a fix out quickly since there seem to be several people working with z elevation values at the moment.  Also, if you were having trouble with "m" measure values, this release fixes a related issue.  The Editor class, z values, and m values are dark corners that are not as well tested as "regular" shapefile features, so if you're working with these types of data keep a sharp eye out for anything weird.  I'll push this update out as an official release within the next couple of weeks if there are no complaints.
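If you work with elevation data and want to help test, a minimal sketch like the one below exercises the z-value path (it assumes the pyshp 1.x Writer.point() signature, which takes optional z and m arguments):

# Quick z-value test sketch (assumes Writer.point() accepts optional z and m)
import shapefile

w = shapefile.Writer(shapefile.POINTZ)
w.field("NAME", "C", "40")
w.point(-89.5, 30.5, 42.0)      # x, y, z (made-up coordinates)
w.record("elevation test")
w.save("pointz_test")

# Read it back and inspect the shape's z attribute to confirm the
# elevation survived the round trip (attribute details vary slightly
# between pyshp versions).
r = shapefile.Reader("pointz_test")
shape = r.shapes()[0]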

The release is available in the Pyshp Google Code site "Downloads" section here.

In other news we are still working on the sbn/sbx binning example for spatial indexes.  Very close but not there yet.

Friday, November 4, 2011

Deleting Shapefile Features

Sometimes, usually as a server-based operation, you need to delete all of the features in a shapefile. All you want left is the shapefile type, the dbf schema, and maybe the overall bounding box. This shapefile stub can then be updated by other processes. Pyshp currently doesn't have an explicit "delete" method. But because pyshp converts everything to native Python data types (strings, lists, dicts, numbers) you can usually manipulate things fairly easily. The solution is very similar to merging shapefiles, except you write back to the same file instead of a new copy.

There's only one hitch in this operation, resulting from a minor difference between the pyshp Reader and Writer objects. In the Reader the "bbox" property returns a static array of [xmin, ymin, xmax, ymax]. The Writer also has a "bbox" property, but it is a method that is called when you save the shapefile. The Writer calculates the bounding box on the fly by reading all of the shapes just before saving. But in this case there are no shapes, so the method would throw an error. So what we do is just override that method with a lambda function to return whatever bbox we want, whether it be the original bbox or a made-up one.

import shapefile 
# Read the shapefile we want to clear out
r = shapefile.Reader("myshape") 
# Create a Writer of the same type to save out as a blank
w = shapefile.Writer(r.shapeType) 
# This line will give us the same dbf schema 
w.fields = r.fields 
# Use the original bounding box in the header 
w.bbox = lambda: r.bbox 
# Save the featureless, attribute-less shapefile
w.save("myshape") 
Instead of using the original bounding box we could just populate it with 0's and let the update process repopulate it:
w.bbox = lambda: [0.0, 0.0, 0.0, 0.0]
Note that coordinates in a shapefile must always be floating-point numbers. Sometimes you may not want to delete all of the features. You may want to keep certain features, selected by attribute or by a spatial operation.
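Here's a rough sketch of that selective case, assuming a hypothetical field named "STATUS": only the features you want to keep get copied to the Writer, and everything else is effectively deleted.

import shapefile

r = shapefile.Reader("myshape")
w = shapefile.Writer(r.shapeType)
w.fields = list(r.fields)

# Find the column index of the hypothetical STATUS field,
# skipping the dbf deletion flag pyshp includes in r.fields
field_names = [f[0] for f in r.fields if f[0] != "DeletionFlag"]
status = field_names.index("STATUS")

# Keep only the features whose STATUS attribute is "ACTIVE"
for shape, rec in zip(r.shapes(), r.records()):
    if rec[status] == "ACTIVE":
        w._shapes.append(shape)
        w.records.append(rec)

w.save("myshape_active")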

Wednesday, November 2, 2011

Generating Shapefile shx Files

Shapefile shx files help software locate records quickly, but they are not strictly necessary.  The shapefile software can manually browse the records to answer a query.

Lately I've been following traffic and responding to posts on the excellent site GIS StackExchange.  There are several questions about shapefile shx files which also point to even more questions in the ESRI forums on this topic.

If, for some reason, you end up with a shapefile that is missing the shx file, most software is going to complain and refuse to deal with it.  The shapefile spec requires, at a minimum, an shp, shx, and dbf file for a complete shapefile.  However, this is a specification requirement rather than a technical one, and a lot of people seem to be confused about that.

The shx file is a trivial index file that provides fixed-length records pointing to the byte offsets of records in  the shp file only.  It does not connect the shp file and dbf file in any way nor does it contain any sort of record number.  There are no record numbers stored in any of the three standard files which is often a point of confusion.  The software reading a shapefile has to count the number of records read to determine the record id (geometry and attributes).  If you wrote a program to randomly select a record from a shapefile there is no way to tell what the record number is by the record contents.

The purpose of the shx file is to provide faster access to a particular record in a shapefile without storing the entire record set of the shp and dbf files in memory.  The header of the shx file is 100 bytes long.  Each record is 8 bytes long.  So if I want to access record 3, I know that 2*8  = 16 and I can jump to byte 100+16=116 in the shx file, read the 8-byte record to get the offset and record length within the shp file, and then jump straight to that location in the shp file.
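Here's a rough sketch of that lookup in Python (the offsets and content lengths in the shx file are stored as big-endian 32-bit integers counted in 16-bit words, so they are doubled to convert to bytes; "myshape.shx" is a made-up file name):

import struct

def shx_lookup(shx_path, record_number):
    """Return the byte offset of a record's header in the shp file and
    the byte length of its contents, using the shx index.
    record_number is 1-based, as in the shapefile spec."""
    with open(shx_path, "rb") as shx:
        shx.seek(100 + 8 * (record_number - 1))   # skip the 100-byte header
        offset, length = struct.unpack(">ii", shx.read(8))
    # Both values are stored in 16-bit words, so double them to get bytes
    return offset * 2, length * 2

# Where does record 3 live inside myshape.shp?
print shx_lookup("myshape.shx", 3)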

While the shx file is convenient, it isn't necessary.  Most software balks if it is not there, though.  However, pyshp handles it gracefully.  If the shx index is there it is used for record access; if not, pyshp reads the shp records into memory and handles them as a Python list.

Sometimes shx files become corrupted or go missing.  You can build a new shx index using pyshp.  It's kind of a hack but still very simple.  In the following example we build an index file for a point shapefile named "myshape" that has two files: "myshape.shp" and "myshape.dbf".

# Build a new shx index file
import shapefile
# Explicitly name the shp and dbf file objects
# so pyshp ignores the missing/corrupt shx
myshp = open("myshape.shp", "rb")
mydbf = open("myshape.dbf", "rb")
r = shapefile.Reader(shp=myshp, shx=None, dbf=mydbf)
w = shapefile.Writer(r.shapeType)
# Copy everything from reader object to writer object
w._shapes = r.shapes()
w.records = r.records()
w.fields = list(r.fields)
# saving will generate the shx
w.save("myshape")

If the shx file is missing it will be created.  If it's corrupt it will be overwritten.  So the moral of the story is that because shapefiles consist of multiple files, the format is actually fairly robust.  The data in the individual files can usually be accessed in isolation from the other files despite what the standard requires - assuming the software you're using is willing to cooperate.

Sunday, October 2, 2011

Pyshp Compatibility

Thanks to some outstanding work by a contributor, pyshp is now compatible with Python 2.4 to 3.x.  Before I was maintaining a separate code base for Python 3 which was falling behind.  Now everything is merged in the subversion trunk and you can use pyshp 1.1.4 or higher with either major version.

Monday, September 26, 2011

Reading Shapefiles from the Cloud

In a previous post, I wrote about saving shapefiles using pyshp to file-like objects and demonstrated how to save a shapefile to a zip file. PyShp has the ability to read from Python file-like objects including zip files as well (as of version 1.1.2).  Both the Reader object and the Writer.save() method accept keyword arguments which can be file-like objects allowing you to read and write shapefiles without any disk activity.

In this post, we'll read a shapefile directly from a zip file on a server all in memory.

Normally to read a shapefile from the file system you just pass in the name of the file to the Reader object as a string:

import shapefile
r = shapefile.Reader("myshapefile")

But if you use the keywords shp, shx, and dbf, then you can specify file-like objects.  This example will demonstrate reading a shapefile - from a zip file - on a website.

import urllib2
import zipfile
import StringIO
import shapefile

cloudshape = urllib2.urlopen("http://pyshp.googlecode.com/files/GIS_CensusTract.zip")
memoryshape = StringIO.StringIO(cloudshape.read())
zipshape = zipfile.ZipFile(memoryshape)
shpname, shxname, dbfname, prjname = zipshape.namelist()
cloudshp = StringIO.StringIO(zipshape.read(shpname))
cloudshx = StringIO.StringIO(zipshape.read(shxname))
clouddbf = StringIO.StringIO(zipshape.read(dbfname))
r = shapefile.Reader(shp=cloudshp, shx=cloudshx, dbf=clouddbf)
print r.bbox
# [-89.8744162216216, 30.161122135135138, -89.1383837783784, 30.661213864864862]

You may specify only one of the three file types if that's all you need to read.  Some attributes, such as Reader.shapeName, will not be available using this method.

File-like objects provide a lot of power.  However, it is important to note that not all file-like objects implement all of the file methods.  In the above example the object returned by urllib2 does not provide the "seek" method needed by the zipfile module.  The ZipFile read() method has a similar limitation: it returns raw bytes rather than a seekable file-like object.  To get around these issues, we transfer the data to the StringIO or cStringIO module in memory to ensure compatibility.  If the data is potentially too big to hold in memory you can use the tempfile module to temporarily store the shapefile data on disk.
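For the too-big-for-memory case, a minimal sketch of the tempfile approach might look like this, swapping the StringIO objects above for temporary files:

import tempfile

def to_tempfile(data):
    """Spool bytes out to a temporary file and return the file object
    rewound to the beginning, ready to hand to shapefile.Reader()."""
    f = tempfile.TemporaryFile()
    f.write(data)
    f.seek(0)
    return f

cloudshp = to_tempfile(zipshape.read(shpname))
cloudshx = to_tempfile(zipshape.read(shxname))
clouddbf = to_tempfile(zipshape.read(dbfname))
r = shapefile.Reader(shp=cloudshp, shx=cloudshx, dbf=clouddbf)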

Friday, September 2, 2011

Pyshp now available via setuptools

The Python Shapefile Library (a.k.a pyshp) is now properly registered on the Python Package Index so it can be installed via setuptools and, possibly more importantly, registered as a dependency by other packages.  At this time the "usage.py" doctests also download with it but not the sample blockgroups shapefile.  I will eventually rewrite the doctests to create a sample shapefile used to verify the reader class.  It is available as both an egg and source distribution.

I'd like to thank users "memedough" and Sebastian for pushing me to go ahead and set this up because yes I'm that lazy where after 2 years I still hadn't bothered to spend the 15 minutes to do it.

If you have setuptools installed, just run: easy_install pyshp

The setup.py script is also now in the subversion repository on Google Code just so I don't lose track of it.  The PyPi page can be found here:

http://pypi.python.org/pypi/pyshp/

Thursday, May 5, 2011

New Member added to the PyShp Team

Great news - Nicholas Lederer, who works with the University of Washington Applied Physics Laboratory, recently joined the committers for PyShp.  He has already made some significant contributions (i.e. bug squashing) to the library in his spare time.  He also has bigger design ideas on how to address some issues to make code maintenance easier as the number of contributors grows.  We have also been dealing with some niche but important issues involving elevation values, dbf header fields, and the all-too-common incorrect shapefiles produced by other libraries.

I originally wrote this library because I wanted an easy way to read and write shapefiles in pure Python.  That option simply didn't exist from the time I entered the geospatial field in 2000 until 2004 when I started developing this library in my free time.  Hurricane Katrina slowed down the development tremendously for a few years but by 2010 I had finished it.

In those 10 years since I first started looking for a library like PyShp, Python grew tremendously in popularity but nobody developed this kind of library.  The field of geospatial technology also grew and the technology and standards changed radically but shapefiles still seem to stubbornly persist as the most common way to store and exchange geospatial vector data.

Given the library was still potentially relevant I guessed there were at least 4 other people in the world who had a need to write shapefiles in Python as simply as possible.  It turns out there are hundreds, maybe thousands of people who have this need.  And there is no doubt Python itself is firmly entrenched in geospatial technology in general.

I'm looking forward to this library reaching its full potential through a collaborative effort.  Thanks to everybody who has identified issues, provided patches, and joined in this shared interest.

If you're new to this blog and want to help or be helped be sure to check out:

Tuesday, February 22, 2011

Clip a Raster using a Shapefile

Clipping a satellite image: Rasterize, Mask, Clip, Save
If you read this blog you see most of the material covers shapefiles.  The intent of this blog is to cover remote sensing as well, and this article provides a great foundation for remote sensing in Python.  In this post I'll demonstrate how to use several Python libraries to create a script which can take any polygon shapefile and use it as a mask to clip a geospatial image.  Although I'm demonstrating a fairly basic process, this article and the accompanying sample script are densely packed with good information and tips that would take you hours if not days to piece together from forum posts, mailing lists, blogs, and trial and error.  This post will get you well on your way to doing whatever you want to do with Python and remote sensing.

Satellite and aerial images are usually collected in square tiles more or less the same way your digital camera frames and captures a picture.  Geospatial images are data capturing different wavelengths of light reflected from known points on the Earth or even other planets.  GIS professionals routinely clip these image tiles to a specific area of interest to provide context for vector layers within a GIS map.  This technique may also be used for remote sensing to narrow down image processing to specific areas to reduce the amount of time it takes to analyze the image.

The Process

Clipping a raster is a series of simple button clicks in high-end geospatial software packages.  In terms of computing, geospatial images are actually very large, multi-dimensional arrays.  Remote Sensing at its simplest is performing mathematical operations on these arrays to extract information from the data. Behind the scenes here is what the software is doing (give or take a few steps):
  1. Convert the vector shapefile to a matrix which can be used as a mask
  2. Load the geospatial image into a matrix
  3. Throw out any image cells outside of the shapefile extent
  4. Set all values outside the shapefile boundary to NODATA (null) values
  5. OPTIONAL: Perform a histogram stretch on the image for better visualization
  6. Save the resulting image as a new raster.
Geospatial Python Raster Clipping Workflow
Tools

Two things I try to do on this blog are build on techniques used in previous posts and focus on pure-Python solutions as much as possible.  The script featured in this post will use one of the shapefile rasterization techniques I've written about in the past.  However, I did not go pure-Python on this one for several reasons.  Geospatial image formats tend to be extremely complex.  You could make a career out of reading and writing the dozens of evolving image formats out there.  As the old saying goes, TIFF stands for "Thousands of Incompatible File Formats".  So for this reason I use the Python bindings for GDAL when dealing with geospatial raster data.

The other issue is the size of most geospatial raster data.  Satellite and high-resolution aerial images can easily be in the tens to hundreds of megabytes.  Doing math on these images, and the memory required to follow the six-step process outlined above, exceed the capability of Python's native libraries in many instances.  For this reason I use the Numpy library, which is dedicated to large, multi-dimensional matrix math.  Another reason to use Numpy is its tight integration with GDAL in the form of the "GDALNumeric" module.  (Numeric was the predecessor to Numpy.)

In past posts I showed a pure-Python way to rasterize a shapefile.  However, I use the Python Imaging Library (PIL) in this example because it provides convenient methods to move data back and forth between Numpy arrays and images.

Library Installation

So in summary you will need to install GDAL with its Python bindings, Numpy, and the Python Imaging Library (PIL) to make the sample script work.  Usually the Python Distutils system (i.e. the "easy_install" script) is the fastest and simplest way to install a Python library.  Because of the complexity and dependencies of some of these tools you may need to track down a pre-compiled binary for your platform.  Both Numpy and GDAL have them linked from their respective websites or the Python Package Index.

The Example

# RasterClipper.py - clip a geospatial image using a shapefile

import operator
from osgeo import gdal, gdalnumeric, ogr, osr
import Image, ImageDraw

# Raster image to clip
raster = "SatImage.tif"

# Polygon shapefile used to clip
shp = "county"

# Name of clip raster file(s)
output = "clip"

# This function will convert the rasterized clipper shapefile 
# to a mask for use within GDAL.    
def imageToArray(i):
    """
    Converts a Python Imaging Library image to a
    gdalnumeric array.
    """
    a=gdalnumeric.fromstring(i.tostring(),'b')
    a.shape=i.im.size[1], i.im.size[0]
    return a

def arrayToImage(a):
    """
    Converts a gdalnumeric array to a 
    Python Imaging Library Image.
    """
    i=Image.fromstring('L',(a.shape[1],a.shape[0]),
            (a.astype('b')).tostring())
    return i
     
def world2Pixel(geoMatrix, x, y):
  """
  Uses a gdal geomatrix (gdal.GetGeoTransform()) to calculate
  the pixel location of a geospatial coordinate 
  """
  ulX = geoMatrix[0]
  ulY = geoMatrix[3]
  xDist = geoMatrix[1]
  yDist = geoMatrix[5]
  rtnX = geoMatrix[2]
  rtnY = geoMatrix[4]
  pixel = int((x - ulX) / xDist)
  line = int((ulY - y) / abs(yDist)) # yDist is negative for north-up images
  return (pixel, line) 

def histogram(a, bins=range(0,256)):
  """
  Histogram function for multi-dimensional array.
  a = array
  bins = range of numbers to match 
  """
  fa = a.flat
  n = gdalnumeric.searchsorted(gdalnumeric.sort(fa), bins)
  n = gdalnumeric.concatenate([n, [len(fa)]])
  hist = n[1:]-n[:-1] 
  return hist

def stretch(a):
  """
  Performs a histogram stretch on a gdalnumeric array image.
  """
  hist = histogram(a)
  im = arrayToImage(a)   
  lut = []
  for b in range(0, len(hist), 256):
    # step size
    step = reduce(operator.add, hist[b:b+256]) / 255
    # create equalization lookup table
    n = 0
    for i in range(256):
      lut.append(n / step)
      n = n + hist[i+b]
  im = im.point(lut)
  return imageToArray(im)

# Load the source data as a gdalnumeric array
srcArray = gdalnumeric.LoadFile(raster)

# Also load as a gdal image to get geotransform 
# (world file) info
srcImage = gdal.Open(raster)
geoTrans = srcImage.GetGeoTransform()

# Create an OGR layer from a boundary shapefile
shapef = ogr.Open("%s.shp" % shp)
lyr = shapef.GetLayer(shp)
poly = lyr.GetNextFeature()

# Convert the layer extent to image pixel coordinates
minX, maxX, minY, maxY = lyr.GetExtent()
ulX, ulY = world2Pixel(geoTrans, minX, maxY)
lrX, lrY = world2Pixel(geoTrans, maxX, minY)

# Calculate the pixel size of the new image
pxWidth = int(lrX - ulX)
pxHeight = int(lrY - ulY)

clip = srcArray[:, ulY:lrY, ulX:lrX]

# Create a new geomatrix for the image
geoTrans = list(geoTrans)
geoTrans[0] = minX
geoTrans[3] = maxY

# Map points to pixels for drawing the 
# boundary on a blank 8-bit, 
# black and white, mask image.
points = []
pixels = []
geom = poly.GetGeometryRef()
pts = geom.GetGeometryRef(0)
for p in range(pts.GetPointCount()):
  points.append((pts.GetX(p), pts.GetY(p)))
for p in points:
  pixels.append(world2Pixel(geoTrans, p[0], p[1]))
rasterPoly = Image.new("L", (pxWidth, pxHeight), 1)
rasterize = ImageDraw.Draw(rasterPoly)
rasterize.polygon(pixels, 0)
mask = imageToArray(rasterPoly)   

# Clip the image using the mask
clip = gdalnumeric.choose(mask, \
    (clip, 0)).astype(gdalnumeric.uint8)

# This image has 3 bands so we stretch each one to make them
# visually brighter
for i in range(3):
  clip[i,:,:] = stretch(clip[i,:,:])

# Save the clipped image as a GeoTIFF
gdalnumeric.SaveArray(clip, "%s.tif" % output, \
    format="GTiff", prototype=raster)

# Save the clip as an 8-bit jpeg for an easy, quick preview
clip = clip.astype(gdalnumeric.uint8)
gdalnumeric.SaveArray(clip, "%s.jpg" % output, format="JPEG")

Tips and Further Reading

The utility functions at the beginning of this script are useful whenever you are working with remotely sensed data in Python using GDAL, PIL, and Numpy.

If you're in a hurry be sure to look at the GDAL utility programs.  This collection has a tool for just about any simple operation including clipping a raster to a rectangle.  Technically you could accomplish the above polygon clip using only GDAL utilities but for complex operations like this Python is much easier.

The data referenced in the above script are a shapefile and a 7-4-1 Path 22, Row 39 Landsat image from 2006. You can download the data and the above sample script from the GeospatialPython Google Code project here.

I would normally use the Python Shapefile Library to grab the polygon shape instead of OGR but because I used GDAL, OGR is already there. So why bother with another library?

If you are going to get serious about Remote Sensing and Python you should check out OpenEV.  This package is a complete remote sensing platform including an ERDAS Imagine-style viewer.  It comes with all the GDAL tools, MapServer and its tools, and a ready-to-run Python environment.

I've written about it before but Spectral Python is worth a look and worth mentioning again. I also recently found PyResample on Google Code but I haven't tried it yet.  

Beyond the above you will find bits and pieces of Python remote sensing code scattered around the web.  Good places to look are:
More to come!

UPDATE (May 4, 2011): I usually provide a link to example source code and data for instructional posts. I set up the download for this one but forgot to post it.  This zip file contains everything you need to perform the example above except the installation of GDAL, Numpy, and PIL:
http://geospatialpython.googlecode.com/files/clipraster.zip

Make sure the required libraries are installed and working before you attempt this example.  As I mention above the OpenEV package has a Python environment with all required packages except PIL.  It may take a little work to get PIL into this unofficial Python environment but in my experience it's less work than wrangling GDAL into place.

Thursday, February 10, 2011

Merging Lots of Shapefiles (quickly)

Arne, over at GIS-Programming.com, recently posted about merging shapefiles using a batch process.  I can't remember the last time I merged two or more shapefiles, but after googling around it is a very common use case.  GIS forums are littered with requests for the best way to batch merge a directory full of files.  My best guess is people have to work with automatically-generated, geographically dispersed data with a common projection and database schema.  I imagine these files would be the result of some automated sensor output.  If you know of use cases requiring merging many shapefiles I'd be curious to hear about them.

Arne pointed out that all the code samples out there iterate through each feature in a shapefile and add them to the merged file.  He says this method is slow. I agree to an extent (no pun intended).  However, at some point the underlying shapefile library MUST iterate through each feature in order to generate the summary information, namely the bounding box, required to write a valid shapefile header.  But it is theoretically slightly more efficient to wait until the merge is finished so there is only one iteration cycle.  At the very least, waiting till the end requires less code.

The following example merges all the shapefiles in the current directory into one file and it is quite fast.

# Merge a bunch of shapefiles with attributes quickly!
import glob
import shapefile
files = glob.glob("*.shp")
w = shapefile.Writer()
for f in files:
  r = shapefile.Reader(f)
  w._shapes.extend(r.shapes())
  w.records.extend(r.records())
w.fields = list(r.fields)
w.save("merged")

Wednesday, February 2, 2011

Python 3 Version of the Python Shapefile Library Released

I created a "hasty" port of the Python Shapefile Library to Python 3 at the request of a developer.  You can download it from the Subversion repository at the "pyshp" project on Google Code.  You will now find the subversion trunk contains "Python 2" and "Python 3" folders.  The documentation available in the "Downloads" section of the project site contains two sets of documentation now as well.  The Python 3 version is tagged with "Py3".

The versions function identically; however, the "Editor" class is broken in the Python 3 version, and this issue is documented in the instructions and the code.  This class is not necessary to read and write shapefiles.  It is just a convenience class and will be fixed in a future version.  I have only made the Py 3 version pass the doctests - nothing more.  So bug reports are welcome.

I also plan to make the Python 2 version compatible with Jython.  Initial testing shows that it too works except for the Editor class.  I'd appreciate any feedback if you are using it on that platform.  The same applies to IronPython and even PythonCE which I haven't tested at all.

Because of the reorganization of the source tree many of the links on previous posts may be broken.  I should have these repaired shortly. [UPDATE: All links have been corrected 2/11/11]

Both versions will be updated and maintained indefinitely as it will probably take several years for Python 3 to become mainstream - especially in the geospatial community. Many, many geospatial libraries will require porting to 3.  I used the "2to3" script but found that most of the work was in casting strings and byte arrays which is no longer implicit.  The library packs and unpacks data constantly so this change had a huge impact on the shapefile library.  Python 2 made it easy to be lazy with data types.

Sunday, November 28, 2010

Introducing the Python Shapefile Library

Over Thanksgiving I finally got around to releasing the Python Shapefile Library. It is a single file of pure Python with no dependencies. It reads and writes shp, shx, and dbf for all 15 types of shapefiles in a pythonic way. You can find it with documentation here in the CheeseShop or search for "pyshp" on Google Code.

This library simply reads and writes shapefiles with no support for geometry calculations or the eight or nine other supporting and undocumented shapefile formats, including indexes and projection files, which have been added since the specification was published in 1998.

Here's a basic example of writing a polygon shapefile:
import shapefile
w = shapefile.Writer(shapefile.POLYGON)
w.poly(parts=[[[1,5],[5,5],[5,1],[3,3],[1,1]]])
w.field('FIRST_FLD','C','40')
w.field('SECOND_FLD','C','40')
w.record('First','Polygon')
w.save('shapefiles/test/polygon')
There are plenty of other examples in the documentation.

The library consists of a Reader class, a Writer class, and an Editor class which simplifies making changes to an existing shapefile by giving you one object to work with so you don't have to juggle the Reader and the Writer objects yourself.
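Reading is just as terse.  Here's a minimal sketch using the polygon written in the example above:

import shapefile

r = shapefile.Reader('shapefiles/test/polygon')
print r.shapeType         # 5 = POLYGON
print r.fields            # field definitions, including the dbf deletion flag
print r.record(0)         # ['First', 'Polygon']
print r.shape(0).points   # the vertices written above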

Beyond the docstring tests and some unit tests I tried PSL out in Jython with no issues.  It's been a while since I've run the tests.  I want to try out Jython again as well as the other Python implementations which have a "struct" and some form of "os" module.  I don't expect any issues with IronPython.

My company sells industrial-strength, native shapefile libraries for Java and Visual Basic which I was not involved in developing. I wrote this simple library to fully learn the shapefile specification for my own curiosity and to lead to some improvements in our commercial libraries. I learned quite a bit and we plan to release some very interesting features to our JShapefile and VBShapefile libraries in 2011 which will solve some major annoyances faced by developers who work with the shapefile format on a regular basis. More on that later...

PSL is not the only way to write shapefiles with Python; however, as far as I know it is the only complete pure-Python library.  Every other option is a Python wrapper around a C or C++ library (not that there's anything wrong with that) or only partially developed in Python.  I like having a pure-Python, dependency-free, no-setup choice even if it's much slower than a highly-optimized, C-based module.  Here's why:
  1. C-based modules can't follow your code everywhere - at least not easily (ex. Google App Engine and other web hosts, many embedded platforms, Python on different runtimes such as Jython and IronPython)
  2. Unless the developer really goes out of his or her way, C-based geospatial libraries wrapped in Python have kludgy-feeling methods and return opaque objects. There are notable exceptions to this rule but they are few and far between.
  3. Speed is the #1 reason developers cite as a reason to create C-based Python modules. In the geospatial domain the complexity of the data formats and spatial calculations makes wrapping libraries the easier choice. But most developers use Python because of the speed of development and ease of maintenance rather than program execution. In the rapidly-growing geospatial technology world new ideas are coming out every day. Rapid application development is key. The more easy-to-use, easy-to-change libraries the better.
Here are some other Python shapefile tools.

ShpUtils - Zack Johnson's pure-Python shapefile reader.

Shapelib - The original C-based shapefile library with Python bindings.

Pyshape - an alternative shapelib wrapper

OGR - General vector read/write library from shapelib creator Frank Warmerdam

Shapefile - a pure-Python read/write module under development

Tuesday, February 17, 2009

Zippity Doo Dah

Back in mid 2003 NVision had just a handful of geospatial programmers and we were relatively unknown outside of the NASA Stennis Space Center incubator which housed our tiny office.  So I was surprised when a stranger who had googled us called asking about adding a zipcode store finder to his website.  "Simple enough," I thought.

I quickly did the calculations in my head for labor and data preparation of an HTML iframe containing a form with an ArcIMS map. I put the stranger on hold and ran my estimate by my boss. It sounded reasonable to him. So I took the stranger off hold and said, "Yes sir, we can do that for about $2,000." He repeated the figure back to me in disgust, "Two-THOUsand dollars? Are you kidding me?" He hung up. I was caught off guard. Didn't this guy realize what he was asking for? Geocoding stores, running those points against polygons, returning a distance to the would-be customer - and all on the Internet of all places. This was before GoogleMaps mind you.

Of course the point of all this is to show how far we've come in 6 years. Now zipcode calculators are a nifty hack people blog about using their cell phone on the way to the airport. And there are plenty for Python.

Here's several Python zipcode calculators in order of complexity.

This one uses a database built in MySQL:
http://jehiah.cz/archive/spatial-proximity-searching-using-latlongs

This one provides a python interface to geonames.org:
http://www.zindep.com/news/interface-python-pour-geonames/?searchterm=geonames

Here's one using SQLObject to store the data:
http://www.zachary.com/s/blog/2005/01/12/python_zipcode_geo-programming


And just to show how common this concept has become - it's now a casual entry in the Python Cookbook alongside other bits of trivial python code:
http://code.activestate.com/recipes/393241/


Granted, some of these scripts are just distance calculators, but I have to concede the stranger who called me in 2003 was absolutely right - just maybe a little ahead of his time.  It was really more of a $200 problem than a $2,000 one.

Thursday, February 12, 2009

Mapnik - Maybe the best Python Mapping Platform Yet

The vast majority of geospatial libraries are written in either C or C++ for two reasons: 1) speed and 2) many of these libraries began development when C/C++ were the languages du jour.

Over the years Python bindings have appeared for many of these libraries to make them easier to use.  These bindings are extremely helpful, as so much geospatial work involves one-off data conversions or quick map production.  Over the last few years several Python libraries have emerged that were developed from the ground up for Python, and these are downright fun!

One of the most notable examples is Mapnik started by Artem Pavlenko. Mapnik caught my eye when it first started because it promised to have a great Python API. Mapnik also had a sharp focus on attractive maps using anti-aliasing and some relatively new graphics libraries. When I checked in on Mapnik recently I was really impressed.





Mapnik ticks all the boxes for a great Python tool, and it's a great geospatial library as well.  It has pre-compiled binaries for Python 2.5, so there's no need to go back and forth to the mailing list just to get it to compile before you see if you like it or not.  Most open source libraries have trouble keeping documentation current with the rapid pace of development.  Mapnik addresses this problem by heavy use of example/test scripts and a busy wiki with lots of examples from users and developers.

Another interesting feature of Mapnik is its straightforward XML dialect for styling maps, allowing you to cleanly separate map appearance from programming logic.  And not to forget one of the core values of the library: the rendering is truly great.  There are several output options, including SVG and PDF, using either the Agg or Cairo graphics libraries.  The developers openly compare the rendering quality to GoogleMaps, and rightfully so.  Mapnik goes beyond antialiasing and rounded line joins to address even more interesting rendering challenges.  One striking example of the rendering quality is the new "BuildingSymbolizer" which creates a nifty "pseudo 3D building effect" on polygons.
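As a taste of how little Python it takes, here's a rough sketch ("world.xml" is a hypothetical style file; the calls shown are from the classic mapnik Python bindings):

import mapnik

m = mapnik.Map(800, 400)                 # width, height in pixels
mapnik.load_map(m, "world.xml")          # all the styling lives in the XML
m.zoom_all()                             # zoom to the full extent of the layers
mapnik.render_to_file(m, "world.png", "png")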

Mapnik was designed from the start for both web and desktop use.  OpenStreetMap uses the library to render its map.  Go to OpenStreetMap.org and zoom into London to see it in action.  From its trendy Django website and Trac wiki to its slick rendering, XML map descriptors, and ready-to-run Pythonic API, Mapnik is a modern geospatial Python library that will certainly add users to the geospatial Python community.

Thursday, July 3, 2008

The GeospatialPythonosphere

Google indexed and pageranked the Internet.  Then Google bought the obscure program "Keyhole 2 LT" and branded it "GoogleEarth".  Next Google hired Plone developer Alexander Limi.  And finally they hired Python's creator Guido van Rossum.

Of course other significant things happened too, like ESRI making Python an official scripting language for their flagship product, but Google caused the explosion of interest in geographic information technology and Python.

Despite this explosion, if you are interested in the combination of Python and GIS/remote sensing, there is surprisingly little discussion on this topic given the independent popularity of these two technologies.

One reason is that the combination of Python and GIS is still new and niche despite Python's success and the growth of GIS.  Another reason is that the few people out there deeply interested in the advancement of Python as a GIS technology are too busy writing the few pieces of software out there to spend a lot of time blogging and making presentations at mainstream conferences.

This fact struck me at the 2006 ESRI User Conference in San Diego. There was a lot of buzz and curiosity about ESRI adopting Python in ArcGIS. At the conference bookstore I eavesdropped on a lady complaining to the cashier that there were no books on using Python in ArcGIS. There were hundreds of ESRI Press books, case studies, and well-known GIS textbooks but the only Python related material was a handful of O'Reilly Python books.

The number of professionally published books about a technology is a good indicator of mainstream interest.  And to my disappointment, geospatial Python hasn't even reached the bar needed to publish the equivalent of O'Reilly's "Perl for Bioinformatics" series.  Tyler Mitchell's excellent "Web Mapping Illustrated" came close, but I want to see Python in the title.  You'd think there'd be more GP blogs out there given the dead-simple medium of blogging.

This blog intends to tip that scale of general disinterest ever so slightly. Python provides the lightweight fits-your-brain agility needed to keep up with the booming geographic technology industry while GIS provides an endless realm of killer applications waiting to be developed.

If you're starving for geospatial Python information, this post provides the available rations for a steady diet.

The Blogs

One of the most prolific GP writers and developers out there is Sean Gillies, who posts frequently to his blog "import cartography".  Sean has been a leader in the open source GIS arena and happens to have an interest in Python.  His contributions to GP are too numerous to list but include Python MapScript, Cartographic Objects for Zope, the Python Cartographic Library, PrimaGIS, Shapely, Pleiades, and more, all linked at gispython.org.

Howard Butler, with his consulting blog hobu.biz, is another must-have RSS feed.  Hobu has been blogging about Python and GIS topics since the turn of the century, starting on his original blog "Fanning Mill".  In addition to his many open source GIS contributions, Howard's GP contributions are industry staples, including Python bindings for ArcView 3.x, GDAL, shapelib, Proj4, and ArcSDE, to name a few.  It is rumored Howard's Python bindings to the old ArcView 3.x dlls, which were published in ESRI's ArcUser magazine, are one of the factors that led to the use of Python in ArcGIS.

Chris Tweedie publishes a steady stream of open source GIS topics on his blog "Chris GISmo's". Many of his posts are Python related but all of them are worth reading.

The fourth geospatial python news source is simply gis+python on del.icio.us as an rss feed. It will turn up a lot of links to interesting software and posts you wouldn't find otherwise.

The Feeds

And here are the feeds:

import cartography: http://zcologia.com/news/
hobu.biz: http://hobu.biz/feed/
Chris GISmo's: http://chris.narx.net/feed/
Del.icio.us: http://del.icio.us/rss/tag/gis+python