Pages

Showing posts with label geometry. Show all posts
Showing posts with label geometry. Show all posts

Monday, May 20, 2013

New __geo_interface__ for PyShp



Christian Ledermann took the initiative to fork pyshp and add the __geo_interface__ convention.
http://twitter.com/GeoJSON


The __geo_interface__ is a community standard riding the current "less is more" entropy wave to get away from heavy data exchange standards, make software compatible, and get some work done.

This standard is very pythonic and well thought out which is no surprise because Sean Gillies and Howard Butler are a driving forces behind it.  The goal is to make moving data around among libraries with different specialties, like Shapely and PySAL, easier.  It is closely tied to GeoJSON which is getting a lot of traction and shaking up the industry and community.

Christian's  __geo_interface__ implementation for PyShp is here:

https://github.com/cleder/pyshp

He also wrote some ogr2ogr-style conversion samples to show you how to use it here:
https://github.com/cleder/geo_file_conv

I'm 100% behind these ideas and will roll this interface into the main trunk.  But there's nothing stopping you from using Christian's fork today.

Enjoy!

Tuesday, May 29, 2012

SBN Mystery - Solved!

Last October I asked you all for help in figuring out the spatial indexing algorithm used to create Esri sbn files.  Using Pyshp, I had successfully decoded the file formats which I provided. However I could not figure out the algorithm used to create and populate the spatial bins within these files.

Today I'm pleased to announce this challenge has been answered.  The GIS community now has access to both the sbn and sbx file format as well as the algorithm for grouping features in a shapefile into "spatial bins".

I'm glad I asked for help as this challenge turned out to be quite difficult. The brain behind this operation is Marc Pfister with some good insights from Si Parker.  Marc worked tirelessly on this problem for months with a cross-country move and complete career change thrown in to make it interesting.  Marc did all the heavy intellectual lifting with me playing an inquisitive, but usually short-sighted Watson to his Holmes.  I generated endless series of shapefiles and one-off scripts to help Marc try and flush out the algorithm based only on subtle changes in the number of bins and features they contained as well as his past experience with spatial indexing.

When I figured out the file formats I had hoped I was just a Wikipedia search away from recognizing the spatial tree algorithm.  But the solution turned out to be much more complex than that.  Esri uses a sort of balanced tree that exhibits traits of several different algorithms.  The system seems carefully designed but is by no means obvious.  I will publish Marc's findings as soon as I can.

There are still a few shapefile cases which create puzzling but insignificant results. However we are at the 98% mark. The project goal of compatibility has been reached.  There is no longer any reason to hold off on sharing the results.  We are fairly certain that we are able to create sbn and sbx files which sufficiently fool ArcMap as well as other Esri packages so other software can read, use, and generate these indexes alongside the Esri suite.  There is more testing to do but it seems we are out of the woods.

What we haven't done is nicely packaged all of this work up.  But Marc posted a small set of Python scripts on github which demonstrate the algorithm and file handling needed to copy this capability.  Over the coming months I will fold this code into Pyshp, produce better documentation on the algorithm, and provide posts on how to deal with these indexes. But for now here's what you've been waiting for:

https://github.com/drwelby/hasbeen

By the way, Marc does freelance programming.  In my job, I get the opportunity to work with lots of really bright geospatial programmers and mathematicians and this guy is one of the very best I've ever seen.  If you have a tough geospatial project and need some E=MC2 smarts definitely look him up.

Monday, May 21, 2012

Advanced Shapefile Merger

Italian GIS blogger Toni sent me a message about a sophisticated OGR-based shapefile merger utility he created.  Last year I posted a simple pyshp example that would find all ".shp" files in a directory and merge the geometry and attributes into a single shapefile.  Toni's version takes this concept much further to include wildcards, recursive directory walking, exclusion lists, and some dbf tricks.  You can find this utility at Toni's blog "Furious GIS":

http://furiousgis.blogspot.it/2012/05/python-shapefile-merger-utility.html

Tuesday, February 28, 2012

Pyshp shapeRecords() Method

The shapefile.Reader.shapeRecords()
method lets you juggle both the
geometry and dbf attributes at the
same time.
The shapefile.Reader.shapeRecords() method is a simple convenience method which allows you to simultaneously loop through both the geometry and records of a shapefile.  Normally  you would loop through the shape records and then loop through the dbf records seperately.  But sometimes it's easier to have both sides of the shapefile equation accessible at the same time.  This ability is important sometimes because the link between geometry and attributes is implied by their order in the file and not explicit which can make referencing one side or the other a pain.  Warning: the current implementation pulls everything into memory at once which can be a problem for very large shapefiles. This weakness will be updated in future versions.

Here’s a simple usage example followed by a detailed explanation and a few other posts where I use this method without much explanation.

Let’s say you have a simple point-location address shapefile named “addr.shp” with the following structure:

GEOMETRY ADDRESS CITY STATE ZIP
[-89.522996, 34.363596] 7018 South 8th Oxford MS 38655
[-89.520695, 34.360863] 1199 South 11th Oxford MS 38655
[-89.520927, 34.362924] 8005 Fillmore Ave Oxford MS 38655

You could then use the shapeRecords method like this:

>>> import shapefile
>>> r = shapefile.Reader(“addr”)
>>> sr = r.shapeRecords()
>>> # get the first shaperecord
>>> sr_test = sr[0]
>>> # Look at the geometry of the shape
>>> sr_test.shape.points
[[-89.522996, 34.363596]]
>>> # Look at the attributes of the dbf record
>>> sr_test.record
[‘7018 South 8’,’Oxford’,’MS’,’38655’]
>>> # Now let’s iterate through all of them
>>> for sr in r.shapeRecords():
...    print “x: “, sr.points[0][0]
...    print “y: “, sr.points[0][1]
...    # Output just the address field
...    print “Address: “, sr.record[0]
x: -89.522996
y: 34.363596
Address: 7018 South 8th
x: -89.520695
y: 34.360863
Address: 1195 South 11th
x: -89.520927
y: 34.362924
Address: 805 Fillmore Ave

Here’s how it works.

The shapeRecords() method returns a list.

Each entry in that list is a _ShapeRecord object instance.

A _ShapeRecord object has two attributes: shape, record

_ShapeRecord.record contains a simple list of the attributes.

_ShapeRecord.shape contains a _Shape object instance.

A _Shape object has, at a minimum, two attributes: shapeType, points

If the _Shape instance contains a polygon a “parts” attribute will appear.  This attribute contains the index in the point list of the beginning of a “part”.  Parts let you store multiple shapes in a single record.

The shapeType attribute provides a number telling you if the shapefile is  a point, polygon, line, etc. file.  These constants are listed in the shapefile spec document as well as near the top of the source code.

The points is just a list containing lists of the point coordinates.  Two things to note:  If the geometry has multiple parts, such as multiple polygons, the points for all parts are just lumped together.  You must separate them by referencing the parts index list.  Some shape types allow for z and m values which may appear in addition to the x,y pairs.

This method is really just a clumsy convenience method that basically zips up the results of the shapes() and records() methods you are already using.

I have a few blog posts where I call this method as well:

http://geospatialpython.com/2011/02/changing-shapefiles-type.html

http://geospatialpython.com/2011/01/point-in-polygon.html

http://geospatialpython.com/2010/12/dot-density-maps-with-python-and-ogr.html  (in the comments)

Friday, November 4, 2011

Deleting Shapefile Features

Sometimes, usually as a server-based operation, you need to delete all of the features in a shapefile. All you want left is the shapefile type, the dbf schema, and maybe the overall bounding box. This shapefile stub can then be updated by other processes. Pyshp currently doesn't have an explicit "delete" method. But because pyshp converts everything to native Python data types (strings, lists, dicts, numbers) you can usually manipulate things fairly easily. The solution is very similar to merging shapefiles but instead you are writing back to the same file instead of a new copy. There's only one hitch in this operation resulting from a minor difference in the pyshp Reader and Writer objects. In the reader the "bbox" property returns a static array of [xmin, ymin, xmax, ymax]. The Writer also has a "bbox" property but it is a method that is called when you save the shapefile. The Writer calculates the bounding box on the fly by reading all of the shapes just before saving. But in this case there are no shapes so the method would throw an error. So what we do is just override that method with a lambda function to return whatever bbox we want whether it be the original bbox or a made up one.
import shapefile 
# Read the shapefile we want to clear out
r = shapefile.Reader("myshape") 
# Create a Writer of the same type to save out as a blank
w = shapefile.Writer(r.shapeType) 
# This line will give us the same dbf schema 
w.fields = r.fields 
# Use the original bounding box in the header 
w.bbox = lambda: r.bbox 
# Save the featureless, attribute-less shapefile
w.save("myshape") 
Instead of using the original bounding box we could just populate it with 0's and let the update process repopulate it:
w.bbox = lambda: [0.0, 0.0, 0.0, 0.0]
Note coordinates in a shapefile must always be floating-point numbers. Sometimes you may not want to delete all of the features. You may want to select certain features by attribute or using a spatial operation.

Tuesday, August 23, 2011

Point in Polygon 2: Walking the line

Credit: SpikedMath.com
This post is a follow-up to my original article on testing if a point is inside a polygon.  Reader Sebastian V. pointed out the ray-casting alogrithm I used does not test to see if the point is on the edge of the polygon or one of the verticies.  He was even nice enough to send a PHP script he found which uses an indentical ray-casting method and includes a vertex and edge test as well.

These two checks are relatively simple however whether they are necessary is up to you and how you apply this test.  There are some cases where a boundary point would not be considered for inclusion.  Either way now you have an option.  This function could even be modified to optionally check for boundary points.

# Improved point in polygon test which includes edge
# and vertex points

def point_in_poly(x,y,poly):

   # check if point is a vertex
   if (x,y) in poly: return "IN"

   # check if point is on a boundary
   for i in range(len(poly)):
      p1 = None
      p2 = None
      if i==0:
         p1 = poly[0]
         p2 = poly[1]
      else:
         p1 = poly[i-1]
         p2 = poly[i]
      if p1[1] == p2[1] and p1[1] == y and x > min(p1[0], p2[0]) and x < max(p1[0], p2[0]):
         return "IN"
      
   n = len(poly)
   inside = False

   p1x,p1y = poly[0]
   for i in range(n+1):
      p2x,p2y = poly[i % n]
      if y > min(p1y,p2y):
         if y <= max(p1y,p2y):
            if x <= max(p1x,p2x):
               if p1y != p2y:
                  xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
               if p1x == p2x or x <= xints:
                  inside = not inside
      p1x,p1y = p2x,p2y

   if inside: return "IN"
   else: return "OUT"

# Test a vertex for inclusion
poligono = [(-33.416032,-70.593016), (-33.415370,-70.589604),
(-33.417340,-70.589046), (-33.417949,-70.592351),
(-33.416032,-70.593016)]
lat= -33.416032
lon= -70.593016

print point_in_poly(lat, lon, poligono)

# test a boundary point for inclusion
poly2 = [(1,1), (5,1), (5,5), (1,5), (1,1)]
x = 3
y = 1
print point_in_poly(x, y, poly2)
You can download this script here.

Monday, March 21, 2011

Working with Elevation Values in Shapefiles

I've had several questions about working with elevation values in shapefiles.  Awhile back I  posted a brief example in the wiki on the google code project site and felt it should be included here so this information is more visible.


Elevation values, known as "Z" values allow you to create 3-dimensional shapefiles. Working with elevation values in the Python Shapefile Library is easy but requires a step which hasn't been well documented until now. Make sure you are working with at least version 1.0 (revision 24 in the repository) as previous versions have a bug that prevents elevation values from being written properly as well as another bug which occurs when creating certain polygons.

The shapefile specification allows for three types of 3-dimensional shapefiles:
PointZ - a 3D point shapefile, PolylineZ - a 3D line shapefile, and PolygonZ - a 3D polygon shapefile.

The most important thing to remember is to explicitly set the shape type of each record to the correct shape type or it will default to a 2-dimensional shape. Eventually the library will automatically detect the elevation type.
Here is a brief example of creating a PolygonZ shapefile:

import shapefile
w = shapefile.Writer(shapeType=shapefile.POLYGONZ)
# OR you can type
# w = shapefile.Writer(shapeType=15)
w.poly([[[-89.0, 33, 12], [-90, 31, 11], [-91, 30, 12]]], shapeType=15)
w.field("NAME")
w.record("PolyZTest")
w.save("MyPolyZ")

When you read 3D shapefiles the elevation values will be stored separately:

>>>import shapefile
>>>r = shapefile.Reader("MyPolyZ")
>>>r.shapes()[0].points
[[-89.0, 33.0], [-90.0, 31.0], [-91.0, 30.0]]
>>>r.shapes()[0].z
[12, 11, 12]

So remember: explicitly set the shapetype for each shape you add to ensure the z values are maintained. And know that the z values are stored in the "z" property of each shape instead of being integrated with the x,y value lists.

Wednesday, January 19, 2011

Point in Polygon

The Ray Casting Method tests if a point is inside a polygon.
UPDATE: There's a newer version of this algorithm that accounts for points that fall on the boundary of a polygon which are included as inside the polygon. The title is "Point in Polygon 2: Walking the line" and was published Aug. 23, 2011.

A fundamental geospatial operation is checking to see if a point is inside a polygon.  This one operation is the atomic building block of many, many different types of spatial queries.  This operation seems deceptively simple because it's so easy to see and comprehend visually. But doing this check computationally gets quite complex.

At first glance there are dozens of algorithms addressing this challenge.  However they all have special cases where they fail.  The failures come from the infinite number of ways polygons can form which ultimately foil any sort of systematic check.  For a programmer the choice comes down to a compromise between computational efficiency (i.e. speed in this case) and thoroughness (i.e. how rare the exceptions are).

The best solution to this issue I've found is the "Ray Casting Method".  The idea is you start drawing an imaginary line from the point in question and stop drawing it when the line leaves the polygon bounding box. Along the way you count the number of times you crossed the polygon's boundary.  If the count is an odd number the point must be inside.  If it's an even number the point is outside the polygon.  So in summary, odd=in, even=out - got it?

This algorithm is fast and is accurate.  In fact, pretty much the only way you can stump it is if the point is ridiculously close to the polygon boundary where a rounding error would merge the point with the boundary.  In that case you can just blame your programming language and switch to Python.

I had no intention of implementing this algorithm myself so I googled several options, tried them out, and found a winner.  It's interesting but not surprising that most of the spatial algorithms I find and use come from computer graphics sites, usually gaming sites or computer vision sites, as opposed to geospatial sites.  My favorite ray casting point-in-polygon sample came from the "Simple Machine Forum" at "PSE Entertainment Corp".  It was posted by their anonymous webmaster.

# Determine if a point is inside a given polygon or not
# Polygon is a list of (x,y) pairs. This function
# returns True or False.  The algorithm is called
# the "Ray Casting Method".

def point_in_poly(x,y,poly):

    n = len(poly)
    inside = False

    p1x,p1y = poly[0]
    for i in range(n+1):
        p2x,p2y = poly[i % n]
        if y > min(p1y,p2y):
            if y <= max(p1y,p2y):
                if x <= max(p1x,p2x):
                    if p1y != p2y:
                        xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
                    if p1x == p2x or x <= xints:
                        inside = not inside
        p1x,p1y = p2x,p2y

    return inside

## Test

polygon = [(0,10),(10,10),(10,0),(0,0)]

point_x = 5
point_y = 5

## Call the function with the points and the polygon
print point_in_poly(point_x,point_y,polygon)

Easy to read, easy to use.  In a previous post on creating a dot density profile, I used the "contains" method in OGR to check randomly-generated points representing population counts against US Census Bureau tracts.  That script created a point shapefile which could then be added as a layer.  It worked great but it wasn't pure python because of OGR.  The other problem with that recipe is creating a shapefile is overkill as dot density maps are just a visualization.

I decided to build on some other posts to combine this ray casting method, PNGCanvas, and the Python Shapefile Library to create a lightweight, pure Python dot density map implementation. The following code reads in a shapefile of census tracts, looks at the population value for that tract, then randomly draws a dot within that census tract for every 50 people.  The census tract boundaries are also added to the resulting PNG image.  The conventional wisdom, especially in the geospatial world, states if you need to do a large number of costly calculations it's worth using C because Python will be much slower.  To my surprise the pure Python version was just about as quick as the OGR version.  I figured the point-in-polygon calculation would be the most costly part.  The results are close enough to warrant further detailed profiling which I'll do at some point.  But regardless this operation is much, much quicker in pure Python than I expected.

import random
import shapefile
import pngcanvas

def pip(x,y,poly):
    n = len(poly)
    inside = False
    p1x,p1y = poly[0]
    for i in range(n+1):
        p2x,p2y = poly[i % n]
        if y > min(p1y,p2y):
            if y <= max(p1y,p2y):
                if x <= max(p1x,p2x):
                    if p1y != p2y:
                        xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
                    if p1x == p2x or x <= xints:
                        inside = not inside
        p1x,p1y = p2x,p2y
    return inside

# Source shapefile - can be any polygon
r = shapefile.Reader("GIS_CensusTract_poly.shp")

# pixel to coordinate info
xdist = r.bbox[2] - r.bbox[0]
ydist = r.bbox[3] - r.bbox[1]
iwidth = 600
iheight = 500
xratio = iwidth/xdist
yratio = iheight/ydist

c = pngcanvas.PNGCanvas(iwidth,iheight,color=[255,255,255,0xff])

# background color
c.filledRectangle(0,0,iwidth,iheight)

# Pen color
c.color = [139,137,137,0xff]

# Draw the polygons 
for shape in r.shapes():
  pixels = []
  for x,y in shape.points:  
    px = int(iwidth - ((r.bbox[2] - x) * xratio))
    py = int((r.bbox[3] - y) * yratio)
    pixels.append([px,py])
  c.polyline(pixels)

rnum = 0
trnum = len(r.shapeRecords())
for sr in r.shapeRecords():
  rnum += 1
  #print rnum, " of ", trnum
  density = sr.record[20]
  total = int(density / 50)
  count = 0
  minx, miny, maxx, maxy = sr.shape.bbox   
  while count < total:    
    x = random.uniform(minx,maxx)
    y = random.uniform(miny,maxy)    
    if pip(x,y,sr.shape.points):
      count += 1
      #print " ", count, " of ", total
      px = int(iwidth - ((r.bbox[2] - x) * xratio))
      py = int((r.bbox[3] - y) * yratio)
      c.point(px,py,color=[255,0,0,0xff])

f = file("density_pure.png", "wb")
f.write(c.dump())
f.close()

The shapefile used above can be found here.

You can download PNGCanvas here.

And the Python Shapefile Library is here.