Extensible Visualization

SpatialHadoop provides efficient tools for visualizing large files. This is especially useful when you have a very large file and want to explore it by rendering it as an image. For example, the image shown below visualizes all road networks on the globe. This image is created from a dataset of 59 million lines with a total size of 20.6 GB. The dataset is extracted from OpenStreetMap and can be downloaded for free from the datasets page.

SpatialHadoop contains an extensible interface for visualization which separates the visualization logic from the implementation of the visualization algorithm. The user can define a new type of visualization by extending the abstract class Rasterizer. The implementing class can be plugged into the visualization algorithms provided in SpatialHadoop to run as a MapReduce program that generates either a single-level image or a multilevel image. In this tutorial, we first describe what the abstract interface looks like and then how to use it to generate both types of images.

Types of Images

SpatialHadoop supports two types of images, namely, single-level and multilevel images. A single-level image is an image of a fixed resolution that can be viewed using any image viewer or embedded in a document such as a website or a report. The level of detail it can show is limited by its resolution. The image shown above is an example of a single-level image. A multilevel image is composed of many small image tiles generated for different regions at different zoom levels. This allows the user to zoom into the image or pan around to see more details about a specific area. This technique is already used in most web-based maps such as Google Maps (Satellite view), Bing Maps and OpenStreetMap. An example is shown below for a dataset of road segments in Minnesota extracted from OpenStreetMap data.

Both image types are supported by SpatialHadoop through a common interface. We define a general visualization interface that can be implemented once and used to generate both single and multilevel images.

Visualization Interface

The visualization interface is defined in the abstract class Rasterizer. It contains five main methods that define the visualization logic.

smooth: This optional function can be defined to fuse nearby records together to improve the generated image. For example, when visualizing a road network, this method can be used to merge intersecting road segments.
createRaster: This method initializes a raster layer of a given size. This raster layer acts as a canvas on which records will be drawn. For example, it can be an in-memory image on which records are drawn, or a two-dimensional histogram in which data are aggregated.
rasterize: This method takes a raster layer previously created using createRaster and a shape, then it rasterizes (i.e., draws) this shape on the raster layer.
merge: This method takes two raster layers and merges them into one raster layer. It is used to combine partial images into one final image before that image is written to the output as a single image.
writeImage: This method is called once at the end to write the final image to the output in a standard image format.
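
The five methods above can be pictured as an abstract class along the following lines. This is a simplified sketch for orientation only: the name RasterizerSketch, the generic type parameters, and all parameter lists are assumptions made for illustration, and the real Rasterizer class in SpatialHadoop may declare these methods differently.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

// Hypothetical, simplified stand-in for the Rasterizer contract described above.
// R is the raster layer type (e.g., an image or a histogram); S is the record type.
abstract class RasterizerSketch<R, S> {

    // Optional: fuse nearby records before drawing.
    // By default, the records pass through unchanged.
    List<S> smooth(List<S> records) {
        return records;
    }

    // Creates an empty raster layer (the canvas) of the given size in pixels.
    abstract R createRaster(int width, int height);

    // Draws one record on a layer that was produced by createRaster.
    abstract void rasterize(R layer, S shape);

    // Combines two partial layers into one and returns the combined layer.
    abstract R merge(R first, R second);

    // Called once at the end to encode the final layer in a standard image format.
    abstract void writeImage(R layer, OutputStream out) throws IOException;
}
```

Keeping the raster layer type generic is what lets the same contract cover both an in-memory image and a two-dimensional histogram.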

How to Use the `Rasterizer` Interface

Let us say you have a new type of data that you want to visualize in a customized way. First, you need to create a new rasterizer as a class that extends the abstract class Rasterizer. Once you have implemented this class, use either the SingleLevelPlot or the MultilevelPlot class to generate a single-level or a multilevel image, respectively. Both classes contain a method that accepts a class extending Rasterizer and uses it to visualize the input data using MapReduce.
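
Conceptually, a rasterizer is driven in a fixed order: a raster layer is created, records are drawn on it, partial layers are merged, and the final layer is written once. The single-process sketch below imitates that flow in plain Java. It is not the SingleLevelPlot or MultilevelPlot API, whose method signatures are not shown in this tutorial. The class LocalPlotSketch, the method plot, the one-layer-per-partition scheme, and the use of functional interfaces in place of a Rasterizer subclass are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

// Hypothetical single-process imitation of the plotting flow; not SpatialHadoop code.
class LocalPlotSketch {

    // Builds one partial layer per partition, then merges them into one layer.
    static <R, S> R plot(List<List<S>> partitions,
                         Supplier<R> createRaster,
                         BiConsumer<R, S> rasterize,
                         BinaryOperator<R> merge) {
        R result = createRaster.get();
        for (List<S> partition : partitions) {
            // "Map" side: draw every record of this partition on its own layer.
            R partial = createRaster.get();
            for (S record : partition) {
                rasterize.accept(partial, record);
            }
            // "Reduce" side: fold the partial layer into the running result.
            result = merge.apply(result, partial);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy raster: a 1x4 histogram; each record is the index of the cell it falls in.
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(0, 1, 1),
                Arrays.asList(1, 3));
        int[] counts = LocalPlotSketch.<int[], Integer>plot(
                partitions,
                () -> new int[4],
                (layer, cell) -> layer[cell]++,
                (first, second) -> {
                    for (int i = 0; i < first.length; i++) first[i] += second[i];
                    return first;
                });
        System.out.println(Arrays.toString(counts)); // prints [1, 3, 0, 1]
    }
}
```

Because merge only needs two layers of the same type, partial results can be combined in any grouping, which is what makes this flow a natural fit for MapReduce.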

Case Study I: Geometric Plot

The geometric plot operation implements a simple rasterizer that draws the geometry of shapes on a normal image. For example, it generates a scatter plot out of a point dataset or draws a set of polygons on an image. The rasterizer of this operation is implemented as follows:

smooth: No smooth function is implemented for this operation.
createRaster: Initializes an in-memory image of the given resolution with a transparent background.
rasterize: Draws the geometry of a shape on the in-memory image. For example, a point is represented as a pixel while a polygon is drawn using the Graphics#drawPolygon method.
merge: Plots one image on top of the other image. The transparent background initialized for each image allows the image on top to reveal the image beneath it.
writeImage: The image is written to the output in a standard PNG format using the ImageIO class.
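
The steps listed above can be sketched with the standard Java imaging classes. This is a minimal illustration under stated assumptions, not the SpatialHadoop implementation: the class name GeometricPlotSketch is hypothetical, java.awt.Point and java.awt.Polygon stand in for SpatialHadoop's shape classes, and all coordinates are assumed to be already expressed in pixels.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Point;
import java.awt.Polygon;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

// Hypothetical sketch of the geometric plot; shapes are assumed to be in pixel coordinates.
class GeometricPlotSketch {

    // A new ARGB image has every pixel set to 0, i.e., fully transparent.
    BufferedImage createRaster(int width, int height) {
        return new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
    }

    void rasterize(BufferedImage layer, Object shape) {
        if (shape instanceof Point) {
            Point p = (Point) shape;
            // A point becomes a single pixel; points outside the image are skipped.
            if (p.x >= 0 && p.x < layer.getWidth() && p.y >= 0 && p.y < layer.getHeight()) {
                layer.setRGB(p.x, p.y, Color.BLACK.getRGB());
            }
        } else if (shape instanceof Polygon) {
            Graphics2D g = layer.createGraphics();
            g.setColor(Color.BLACK);
            g.drawPolygon((Polygon) shape); // draws the outline only
            g.dispose();
        }
    }

    // Paints `top` over `bottom`; transparent pixels in `top` leave `bottom` visible.
    BufferedImage merge(BufferedImage bottom, BufferedImage top) {
        Graphics2D g = bottom.createGraphics();
        g.drawImage(top, 0, 0, null);
        g.dispose();
        return bottom;
    }

    void writeImage(BufferedImage layer, OutputStream out) throws IOException {
        ImageIO.write(layer, "png", out);
    }

    public static void main(String[] args) throws IOException {
        GeometricPlotSketch plot = new GeometricPlotSketch();
        BufferedImage first = plot.createRaster(16, 16);
        BufferedImage second = plot.createRaster(16, 16);
        plot.rasterize(first, new Point(2, 3));
        plot.rasterize(second, new Polygon(new int[] {5, 12, 8}, new int[] {5, 5, 12}, 3));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        plot.writeImage(plot.merge(first, second), out);
        System.out.println("PNG bytes written: " + out.size());
    }
}
```

The merge step relies on the default source-over compositing of Graphics2D, which is why the transparent background matters: an opaque background in the top layer would hide everything beneath it.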

Case Study II: Heat Map Plot

This visualization technique is applied to an input file that contains points. It gives a color to each pixel in the generated image according to the density of points around this pixel. An example of the heat map of tweets in one day is shown below.

In this image, areas with low density of tweets are colored in blue while areas of higher densities are colored in red. The functions for this operation are implemented as follows:

smooth: No smooth function is implemented for this operation.
createRaster: Initializes a frequency map as a two-dimensional array of integers where each entry corresponds to a pixel in the image and holds the total number of points around it.
rasterize: Takes a point and updates the frequency map by incrementing all entries around its location.
merge: Merges two frequency maps by adding up corresponding entries in both frequency maps.
writeImage: This method first converts the frequency map into an image by mapping each entry to a pixel and coloring it according to its value. It then writes the resulting image as a PNG image using the ImageIO class.
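
The heat map functions above can be sketched as follows. Several details are assumptions made for illustration and are not taken from SpatialHadoop: the class name HeatMapSketch, the square neighborhood with a fixed radius, points given in pixel coordinates, and the linear blue-to-red color ramp scaled to the maximum count.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

// Hypothetical sketch of the heat map plot; points are assumed to be in pixel coordinates.
class HeatMapSketch {
    private final int radius; // assumed neighborhood size, in pixels

    HeatMapSketch(int radius) {
        this.radius = radius;
    }

    // Frequency map indexed as [x][y]; Java initializes every entry to 0.
    int[][] createRaster(int width, int height) {
        return new int[width][height];
    }

    // Increments every entry in the square neighborhood of (x, y), clipped to the map.
    void rasterize(int[][] freq, int x, int y) {
        for (int i = Math.max(0, x - radius); i <= Math.min(freq.length - 1, x + radius); i++) {
            for (int j = Math.max(0, y - radius); j <= Math.min(freq[i].length - 1, y + radius); j++) {
                freq[i][j]++;
            }
        }
    }

    // Adds the entries of `other` into `target`; both maps must have the same size.
    int[][] merge(int[][] target, int[][] other) {
        for (int i = 0; i < target.length; i++) {
            for (int j = 0; j < target[i].length; j++) {
                target[i][j] += other[i][j];
            }
        }
        return target;
    }

    // Maps a count of 0 to blue and the maximum count to red, interpolating linearly.
    BufferedImage toImage(int[][] freq) {
        int max = 0;
        for (int[] column : freq) {
            for (int value : column) {
                max = Math.max(max, value);
            }
        }
        BufferedImage image =
                new BufferedImage(freq.length, freq[0].length, BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < freq.length; x++) {
            for (int y = 0; y < freq[x].length; y++) {
                double ratio = max == 0 ? 0.0 : (double) freq[x][y] / max;
                int red = (int) Math.round(255 * ratio);
                image.setRGB(x, y, new Color(red, 0, 255 - red).getRGB());
            }
        }
        return image;
    }

    void writeImage(int[][] freq, OutputStream out) throws IOException {
        ImageIO.write(toImage(freq), "png", out);
    }
}
```

Deferring the color mapping to the final write step keeps the partial results as plain integer counts, so merging them is a simple element-wise sum.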