A MapReduce framework for spatial data
SpatialHadoop provides efficient tools for visualizing large files. This is especially useful when you want to explore a very large file by rendering it as an image. For example, the image shown below visualizes all road networks around the globe. It was created from a dataset of 59 million lines with a total size of 20.6 GB. The dataset is extracted from OpenStreetMap and can be downloaded for free from the datasets page.
SpatialHadoop contains an extensible interface for visualization which separates the visualization logic from the implementation of the visualization algorithm. The user can define a new type of visualization by extending the abstract class Rasterizer. The implementing class can be plugged into the visualization algorithms provided in SpatialHadoop to run as a MapReduce program that generates either a single-level image or a multilevel image. In this tutorial we first describe what the abstract interface looks like and then how to use it to generate both types of images.
SpatialHadoop supports two types of images, namely, single-level and multilevel images. A single-level image is an image of a fixed resolution that can be viewed using any image viewer or embedded in a document such as a website or a report. The level of detail it can show is limited by its resolution. The image shown above is an example of a single-level image. A multilevel image is composed of many small image tiles generated for different regions at different zoom levels. This allows the user to zoom into the image or pan around to see more details about a specific area. This technique is already used in most web-based maps such as Google Maps (Satellite view), Bing Maps and OpenStreetMap. An example is shown below for a dataset of road segments in Minnesota extracted from OpenStreetMap data.
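To make the idea of tiles at different zoom levels concrete, here is a minimal sketch of the pyramid layout commonly used by web maps, where zoom level z splits the bounding box of the data into 2^z columns and 2^z rows. Whether SpatialHadoop uses exactly this layout is an assumption, and the class `TilePyramid` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper, not part of SpatialHadoop. It assumes that zoom level z
// splits the bounding box of the data into 2^z columns and 2^z rows of tiles.
final class TilePyramid {
    private final double minX, minY, maxX, maxY; // bounding box of the data

    TilePyramid(double minX, double minY, double maxX, double maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    /** Number of tiles at one zoom level: 2^z columns times 2^z rows. */
    static long tilesAtLevel(int z) {
        return 1L << (2 * z);
    }

    /** Column index of the tile that contains x at zoom level z. */
    int tileColumn(double x, int z) {
        return index((x - minX) / (maxX - minX), z);
    }

    /** Row index of the tile that contains y at zoom level z. */
    int tileRow(double y, int z) {
        return index((y - minY) / (maxY - minY), z);
    }

    // Converts a fraction in [0, 1] to a tile index, clamping the upper edge
    // of the bounding box into the last tile.
    private static int index(double fraction, int z) {
        int tilesPerSide = 1 << z;
        int i = (int) Math.floor(fraction * tilesPerSide);
        return Math.max(0, Math.min(tilesPerSide - 1, i));
    }
}
```

Under this assumption the number of tiles quadruples with each additional zoom level, which is why generating deep zoom levels for a large dataset benefits from a distributed job.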
Both image types are supported by SpatialHadoop through a common interface. We define a general visualization interface that can be implemented once and used to generate both single and multilevel images.
The visualization interface is defined in the abstract class Rasterizer. It contains five main methods that define the visualization logic.
Let us say you have a new type of data that you want to visualize in a customized way. First, you need to create a new rasterizer as a class that extends the abstract Rasterizer class. Once you have implemented this class, you use either the SingleLevelPlot or the MultilevelPlot class to generate a single-level or a multilevel image, respectively. Both classes contain a method that accepts a class that extends Rasterizer and uses it to visualize the input data using MapReduce.
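The division of labor between a rasterizer and a plot driver can be sketched in plain Java. This is not SpatialHadoop's Rasterizer, SingleLevelPlot, or MultilevelPlot API. The interface `SketchRasterizer`, its three methods, and the classes `PointMaskRasterizer` and `PlotDriver` are hypothetical, simplified stand-ins, and the loop over splits only imitates what a MapReduce job would do in parallel.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a rasterizer abstraction.
// C is the type of the intermediate canvas that the rasterizer draws on.
interface SketchRasterizer<C> {
    C createCanvas(int width, int height);      // empty canvas for one partition
    void rasterize(C canvas, double[] record);  // draw one record on a canvas
    void merge(C target, C partial);            // combine two canvases
}

// Example logic: mark every pixel hit by at least one point.
// Records are {x, y} pairs with both coordinates in [0, 1).
final class PointMaskRasterizer implements SketchRasterizer<boolean[][]> {
    public boolean[][] createCanvas(int width, int height) {
        return new boolean[height][width];
    }

    public void rasterize(boolean[][] canvas, double[] p) {
        int height = canvas.length, width = canvas[0].length;
        int col = (int) Math.floor(p[0] * width);
        int row = (int) Math.floor(p[1] * height);
        if (row >= 0 && row < height && col >= 0 && col < width) {
            canvas[row][col] = true;
        }
    }

    public void merge(boolean[][] target, boolean[][] partial) {
        for (int r = 0; r < target.length; r++) {
            for (int c = 0; c < target[r].length; c++) {
                target[r][c] |= partial[r][c];
            }
        }
    }
}

// Hypothetical driver: rasterizes each split on its own canvas (the "map"
// side) and merges the partial canvases into one result (the "reduce" side).
final class PlotDriver {
    static <C> C plot(SketchRasterizer<C> rasterizer, List<List<double[]>> splits,
                      int width, int height) {
        C result = rasterizer.createCanvas(width, height);
        for (List<double[]> split : splits) {
            C partial = rasterizer.createCanvas(width, height);
            for (double[] record : split) {
                rasterizer.rasterize(partial, record);
            }
            rasterizer.merge(result, partial);
        }
        return result;
    }
}
```

The point of the design is that the driver never inspects the canvas. Swapping in a different rasterizer changes what is drawn without touching the code that partitions and merges the work.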
The geometric plot operation implements a simple rasterizer that draws the geometry of shapes on a normal image. For example, it generates a scatter plot out of a point dataset or draws a set of polygons on an image. The rasterizer of this operation is implemented as follows:
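As a rough illustration that is separate from SpatialHadoop's own implementation, the following sketch draws a set of points and one polygon outline on an image using the standard Java 2D classes. The class `GeometryPlotSketch`, its method names, and the representation of shapes as coordinate arrays are assumptions made for this example.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical sketch, not SpatialHadoop code: plots points and one polygon
// outline from world coordinates onto a transparent image.
final class GeometryPlotSketch {
    static BufferedImage plot(double[][] points, double[][] polygon,
                              double minX, double minY, double maxX, double maxY,
                              int width, int height) {
        BufferedImage image =
            new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);

        // Scatter plot: each point becomes a single black pixel.
        for (double[] p : points) {
            image.setRGB(toCol(p[0], minX, maxX, width),
                         toRow(p[1], minY, maxY, height),
                         Color.BLACK.getRGB());
        }

        // Polygon: convert the vertices to pixel coordinates and draw the outline.
        int n = polygon.length;
        int[] xs = new int[n];
        int[] ys = new int[n];
        for (int i = 0; i < n; i++) {
            xs[i] = toCol(polygon[i][0], minX, maxX, width);
            ys[i] = toRow(polygon[i][1], minY, maxY, height);
        }
        Graphics2D g = image.createGraphics();
        g.setColor(Color.BLACK);
        g.drawPolygon(xs, ys, n);
        g.dispose();
        return image;
    }

    // World x to pixel column, clamped to the image.
    static int toCol(double x, double minX, double maxX, int width) {
        int col = (int) Math.floor((x - minX) / (maxX - minX) * width);
        return Math.max(0, Math.min(width - 1, col));
    }

    // World y to pixel row. Image rows grow downward, so the axis is flipped.
    static int toRow(double y, double minY, double maxY, int height) {
        int row = (int) Math.floor((y - minY) / (maxY - minY) * height);
        return height - 1 - Math.max(0, Math.min(height - 1, row));
    }
}
```

The only non-obvious step is the vertical flip in `toRow`: world coordinates usually grow upward while image rows grow downward.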
The heat map visualization is applied to an input file that contains points. It assigns a color to each pixel in the generated image according to the density of points around that pixel. An example heat map of the tweets posted in one day is shown below.
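The two steps described above, measuring the density of points around each pixel and turning that density into a color, can be sketched as follows. The circular neighborhood, the blue-to-red gradient, and the names `HeatMapSketch`, `density`, and `color` are assumptions chosen for illustration and may differ from what SpatialHadoop does.

```java
// Hypothetical sketch of a heat map, not SpatialHadoop code.
final class HeatMapSketch {
    /**
     * Counts, for every pixel, the points that lie within `radius` pixels of it.
     * Points are given in pixel coordinates as {column, row}.
     */
    static int[][] density(int[][] points, int width, int height, int radius) {
        int[][] counts = new int[height][width];
        for (int[] p : points) {
            int rowFrom = Math.max(0, p[1] - radius);
            int rowTo = Math.min(height - 1, p[1] + radius);
            int colFrom = Math.max(0, p[0] - radius);
            int colTo = Math.min(width - 1, p[0] + radius);
            for (int r = rowFrom; r <= rowTo; r++) {
                for (int c = colFrom; c <= colTo; c++) {
                    int dr = r - p[1], dc = c - p[0];
                    if (dr * dr + dc * dc <= radius * radius) {
                        counts[r][c]++;
                    }
                }
            }
        }
        return counts;
    }

    /**
     * Maps a density value to an ARGB color: transparent for zero, then a
     * linear gradient from blue (low density) to red (density >= max).
     */
    static int color(int value, int max) {
        if (value <= 0 || max <= 0) {
            return 0x00000000;
        }
        double t = Math.min(1.0, (double) value / max);
        int red = (int) Math.round(255 * t);
        int blue = 255 - red;
        return 0xFF000000 | (red << 16) | blue;
    }
}
```

Because the per-pixel counts from different parts of the input can simply be added together, a density grid of this kind is straightforward to compute in partitions and merge afterwards.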