Pigeon

An easy way to analyze large-scale spatial data

Pigeon is an extension to Pig that adds spatial data types and functions. The extension complies with OGC standards, which makes it easy to learn and use.


Pigeon Overview

Pigeon is a set of user-defined functions (UDFs) that let you write Pig Latin scripts that work with spatial data. The extension is unobtrusive: it consists purely of UDFs, which makes it compatible with any version of Pig you use. It also meshes easily with existing Pig operators such as FILTER, GROUP and JOIN.

Installing Pigeon

Pigeon is distributed as a JAR file that you register in your Pig script. Pigeon uses ESRI-geometry-API to create and process spatial data types, so you should also register the JAR file of the ESRI-geometry-API library. Once both JARs are registered in your script, you can call all the functions available in Pigeon directly. As a shortcut, the Pigeon distribution includes a file that defines short names for all functions, which makes them similar to those of PostGIS.
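The setup described above might look like the following minimal sketch. It assumes the JAR file names used in the usage example on this page and the short-name file pigeon_import.pig listed under Downloads. The class name edu.umn.cs.pigeon.Area in the commented-out DEFINE is an assumption about Pigeon's package layout, so check it against the JAR you downloaded.

```pig
-- Register Pigeon and the geometry library it depends on.
-- File names follow the usage example on this page; adjust paths as needed.
REGISTER pigeon-0.1.jar;
REGISTER esri-geometry-api-1.1.1.jar;

-- Option 1: pull in the bundled file that defines the PostGIS-style short names.
IMPORT 'pigeon_import.pig';

-- Option 2 (instead of the IMPORT): define only the short names you need.
-- The fully qualified class name below is an assumption, not confirmed by this page.
-- DEFINE ST_Area edu.umn.cs.pigeon.Area();
```

Use one option or the other, since defining the same short name twice in one script is likely to cause a conflict.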

Prerequisites

To use Pigeon, you need Pig installed and configured on your system. Check this tutorial.

Download

Download the latest version of Pigeon here.

Usage examples

Once you have Pigeon configured correctly, you are ready to run some sample scripts. Below are a few examples that you can use as a starting point.

Load ZIP codes and calculate the area of each one

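REGISTER pigeon-0.1.jar;
REGISTER esri-geometry-api-1.1.1.jar;
-- Load the short function names (such as ST_Area) shipped with Pigeon
IMPORT 'pigeon_import.pig';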

zips = LOAD 'zcta510.csv.bz2' USING PigStorage(',');
zips_areas = FOREACH zips GENERATE $6 AS ZIP, ST_Area($0) AS area;
STORE zips_areas INTO 'zips_areas';
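Because Pigeon functions are ordinary UDFs, their results can feed directly into Pig operators such as FILTER and GROUP. The sketch below extends the ZIP code example under a few assumptions: the threshold 0.01 and the output name 'large_zips_summary' are hypothetical, and the unit of the area depends on the coordinate system of the input file.

```pig
REGISTER pigeon-0.1.jar;
REGISTER esri-geometry-api-1.1.1.jar;
-- Short names such as ST_Area come from the import file shipped with Pigeon.
IMPORT 'pigeon_import.pig';

zips = LOAD 'zcta510.csv.bz2' USING PigStorage(',');
zips_areas = FOREACH zips GENERATE $6 AS zip, ST_Area($0) AS area;

-- Keep only ZIP codes whose area exceeds a (hypothetical) threshold.
large = FILTER zips_areas BY area > 0.01;

-- Collapse the remaining records into one group and summarize them.
all_large = GROUP large ALL;
summary = FOREACH all_large GENERATE COUNT(large) AS num_zips, SUM(large.area) AS total_area;

STORE summary INTO 'large_zips_summary';
```

Only ST_Area comes from Pigeon here; FILTER, GROUP, COUNT and SUM are standard Pig, so no spatial-specific syntax is needed beyond the function call.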

Downloads

pigeon-0.1.jar
pigeon_import.pig

Pigeon on GitHub

The source code of Pigeon is available on GitHub.

Check Pigeon on GitHub »