Extensible Visualization

SpatialHadoop provides efficient tools for visualizing large files. This is useful when you want to explore a very large file by rendering it as an image. For example, the image shown below visualizes all road networks on the globe. It was created from a dataset of 59 million lines with a total size of 20.6 GB. The dataset is extracted from OpenStreetMap and can be downloaded for free from the datasets page.

SpatialHadoop contains an extensible interface for visualization which separates the visualization logic from the implementation of the visualization algorithm. The user can define a new type of visualization by extending the abstract class Rasterizer. The implementing class can then be plugged into the visualization algorithms provided in SpatialHadoop, which run as MapReduce programs and generate either a single-level image or a multilevel image. In this tutorial we first describe what the abstract interface looks like and then how to use it to generate both types of images.

Types of Images

SpatialHadoop supports two types of images, namely, single level and multilevel images. A single level image is an image of a fixed resolution that can be viewed using any image viewer or embedded in a document such as a website or a report. The level of detail it can show is limited by its resolution. The image shown above is an example of a single level image. A multilevel image is composed of many small image tiles generated for different regions at different zoom levels. This allows the user to zoom into the image or pan around to see more details about a specific area. This technique is already used in most web-based maps such as Google Maps (Satellite view), Bing Maps and OpenStreetMap. An example is shown below for a dataset of road segments in Minnesota extracted from OpenStreetMap data.
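To make the idea of tiles at different zoom levels more concrete, here is a minimal sketch of a tile pyramid. It assumes the layout commonly used by web maps, where zoom level z splits the input area into 2^z by 2^z tiles; the class TilePyramidSketch and its methods are hypothetical and are not part of SpatialHadoop.

```java
/** Hypothetical helper, for illustration only. Assumes 2^z x 2^z tiles at zoom level z. */
class TilePyramidSketch {
    /** Number of tiles at zoom level z: 4^z under the assumed layout. */
    static long tilesAtLevel(int z) {
        return 1L << (2 * z);
    }

    /**
     * Returns {z, column, row} of the tile that contains the point (x, y),
     * given the bounding rectangle of the whole input. Row 0 is the top row.
     */
    static int[] tileFor(double x, double y, int z,
                         double minX, double minY, double maxX, double maxY) {
        int n = 1 << z; // tiles per side at this level
        int col = (int) Math.min(n - 1, Math.floor((x - minX) / (maxX - minX) * n));
        int row = (int) Math.min(n - 1, Math.floor((maxY - y) / (maxY - minY) * n));
        return new int[] {z, col, row};
    }
}
```

Under this assumption, each extra zoom level quadruples the number of tiles, which is why a multilevel image with many levels is a natural fit for a distributed job.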

Both image types are supported by SpatialHadoop through a common interface. We define a general visualization interface that can be implemented once and used to generate both single and multilevel images.

Visualization Interface

The visualization interface is defined in the abstract class Rasterizer. It contains five main methods that define the visualization logic.
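As a rough sketch of what such a five-method contract could look like, the following self-contained Java code declares a hypothetical SketchRasterizer. The type names, method names, and signatures are assumptions made for illustration and are not taken from the SpatialHadoop API; consult the Rasterizer source for the actual definitions.

```java
import java.awt.image.BufferedImage;

/** Hypothetical intermediate raster; not the SpatialHadoop type. */
abstract class SketchRaster {
    final int width;
    final int height;

    SketchRaster(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

/**
 * Hypothetical five-method visualization contract.
 * S is the type of an input record, R the type of the intermediate raster.
 * All names and signatures here are assumptions for illustration.
 */
abstract class SketchRasterizer<S, R extends SketchRaster> {
    /** 1. Allocate an empty raster of the given size in pixels. */
    abstract R createRaster(int width, int height);

    /** 2. Draw a single input record onto a raster. */
    abstract void rasterize(R raster, S shape);

    /** 3. Fold a partial raster into an accumulated one. */
    abstract void merge(R accumulated, R partial);

    /** 4. Report the raster type so a framework could instantiate or deserialize it. */
    abstract Class<R> getRasterClass();

    /** 5. Convert the final raster into a viewable image. */
    abstract BufferedImage toImage(R raster);
}
```

Splitting the logic this way means a driver only needs to know when to create, fill, merge, and render rasters, while the rasterizer decides what each of those steps means for a particular kind of plot.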

How to Use the Rasterizer Interface

Let us say you have a new type of data that you want to visualize in a customized way. First, you need to create a new rasterizer as a class that extends the Rasterizer abstract class. Once you have implemented this class, you use either the SingleLevelPlot or the MultilevelPlot class to generate a single level or a multilevel image, respectively. Both of them contain a method that accepts a class that extends Rasterizer and uses it to visualize the input data using MapReduce.
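The pattern of handing a rasterizer class to a plotting routine can be sketched without Hadoop. The code below is a single-JVM stand-in, not the SingleLevelPlot or MultilevelPlot API: MiniRasterizer, CountingRasterizer, and LocalPlotDriver are hypothetical names, and the loop over partitions only mimics how map tasks could build partial rasters that a reduce step merges.

```java
import java.util.List;

/** Hypothetical stand-in for the rasterizer contract; not the SpatialHadoop class. */
abstract class MiniRasterizer {
    abstract int[] createRaster(int width, int height);
    /** point = {x, y} in pixel coordinates. */
    abstract void rasterize(int[] raster, int width, int[] point);
    abstract void merge(int[] accumulated, int[] partial);
}

/** Example plug-in: counts how many points fall on each pixel. */
class CountingRasterizer extends MiniRasterizer {
    @Override
    int[] createRaster(int width, int height) {
        return new int[width * height];
    }

    @Override
    void rasterize(int[] raster, int width, int[] point) {
        raster[point[1] * width + point[0]]++;
    }

    @Override
    void merge(int[] accumulated, int[] partial) {
        for (int i = 0; i < accumulated.length; i++) {
            accumulated[i] += partial[i];
        }
    }
}

/** Hypothetical local driver that imitates the map and reduce phases in one process. */
class LocalPlotDriver {
    static int[] plot(List<List<int[]>> partitions, int width, int height,
                      Class<? extends MiniRasterizer> rasterizerClass) {
        MiniRasterizer rasterizer;
        try {
            // The driver receives a class and instantiates it itself
            rasterizer = rasterizerClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate rasterizer", e);
        }
        int[] result = rasterizer.createRaster(width, height);
        for (List<int[]> partition : partitions) {
            // "Map" side: each partition is rasterized independently
            int[] partial = rasterizer.createRaster(width, height);
            for (int[] point : partition) {
                rasterizer.rasterize(partial, width, point);
            }
            // "Reduce" side: partial rasters are merged into the final one
            rasterizer.merge(result, partial);
        }
        return result;
    }
}
```

A call such as LocalPlotDriver.plot(partitions, 256, 256, CountingRasterizer.class) then mirrors the idea of passing the rasterizer class to the plotting method, with the driver owning all of the orchestration.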

Case Study I: Geometric Plot

The geometric plot operation implements a simple rasterizer that draws the geometry of shapes on a normal image. For example, it generates a scatter plot out of a point dataset or draws a set of polygons on an image. The rasterizer of this operation is implemented as follows:
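A minimal sketch of a geometric rasterizer, assuming the raster is simply a transparent java.awt.image.BufferedImage and that shapes are given as plain coordinate arrays. The class GeometricPlotSketch and its methods are hypothetical and only illustrate the idea; they are not the SpatialHadoop implementation.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical geometric-plot rasterizer; not the SpatialHadoop class. */
class GeometricPlotSketch {
    final double minX, minY, maxX, maxY; // bounds of the data space
    final int width, height;             // image size in pixels
    final Color stroke;

    GeometricPlotSketch(double minX, double minY, double maxX, double maxY,
                        int width, int height, Color stroke) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        this.width = width; this.height = height; this.stroke = stroke;
    }

    /** A transparent image, so partial images can be layered on top of each other. */
    BufferedImage createRaster() {
        return new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
    }

    private static int clamp(int value, int max) {
        return Math.max(0, Math.min(max, value));
    }

    int toPixelX(double x) {
        return clamp((int) Math.floor((x - minX) / (maxX - minX) * width), width - 1);
    }

    /** The y axis is flipped because image rows grow downwards. */
    int toPixelY(double y) {
        return clamp((int) Math.floor((maxY - y) / (maxY - minY) * height), height - 1);
    }

    /** Scatter plot: a point becomes a single colored pixel. */
    void rasterizePoint(BufferedImage raster, double x, double y) {
        raster.setRGB(toPixelX(x), toPixelY(y), stroke.getRGB());
    }

    /** A polygon is drawn as its outline. */
    void rasterizePolygon(BufferedImage raster, double[] xs, double[] ys) {
        int n = xs.length;
        int[] px = new int[n];
        int[] py = new int[n];
        for (int i = 0; i < n; i++) {
            px[i] = toPixelX(xs[i]);
            py[i] = toPixelY(ys[i]);
        }
        Graphics2D g = raster.createGraphics();
        g.setColor(stroke);
        g.drawPolygon(px, py, n);
        g.dispose();
    }

    /** Merging two partial plots means painting one on top of the other. */
    void merge(BufferedImage accumulated, BufferedImage partial) {
        Graphics2D g = accumulated.createGraphics();
        g.drawImage(partial, 0, 0, null);
        g.dispose();
    }
}
```

Because the partial images are transparent wherever nothing was drawn, the order in which they are merged does not matter when a single stroke color is used.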

Case Study II: Heat Map Plot

This visualization technique is applied to an input file that contains points. It colors each pixel in the generated image according to the density of points around this pixel. An example of the heat map of tweets in one day is shown below.

In this image, areas with a low density of tweets are colored in blue while areas of higher density are colored in red. The functions for this operation are implemented as follows:
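A simplified sketch of the heat map logic, assuming the intermediate raster is a two-dimensional array of point counts and that color is a linear blend from blue to red. HeatMapSketch is a hypothetical class, and the sketch counts only the points that fall exactly on a pixel; spreading each point over a radius of neighboring pixels is omitted for brevity.

```java
/** Hypothetical heat-map rasterizer; not the SpatialHadoop class. */
class HeatMapSketch {
    final int width;
    final int height;

    HeatMapSketch(int width, int height) {
        this.width = width;
        this.height = height;
    }

    /** The intermediate raster is a frequency map: one counter per pixel. */
    int[][] createRaster() {
        return new int[height][width];
    }

    /** A point simply increments the counter of the pixel it falls on. */
    void rasterize(int[][] frequencies, int pixelX, int pixelY) {
        frequencies[pixelY][pixelX]++;
    }

    /** Two partial frequency maps are merged by adding their counters. */
    void merge(int[][] accumulated, int[][] partial) {
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                accumulated[y][x] += partial[y][x];
            }
        }
    }

    /** Maps a count to a packed ARGB color: blue for low density, red for the maximum. */
    static int colorFor(int count, int maxCount) {
        if (count == 0 || maxCount == 0) {
            return 0; // fully transparent where there is no data
        }
        double t = count / (double) maxCount; // 0 < t <= 1
        int red = (int) Math.round(255 * t);
        int blue = 255 - red;
        return (0xFF << 24) | (red << 16) | blue;
    }

    /** Converts the final frequency map to row-major ARGB pixels. */
    int[] toArgb(int[][] frequencies) {
        int max = 0;
        for (int[] row : frequencies) {
            for (int count : row) {
                max = Math.max(max, count);
            }
        }
        int[] pixels = new int[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                pixels[y * width + x] = colorFor(frequencies[y][x], max);
            }
        }
        return pixels;
    }
}
```

Keeping raw counts in the intermediate raster, rather than colors, is what makes merging well defined: the color of a pixel depends on the global maximum, which is only known after all partial maps have been added together.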