Setting up SpatialHadoop on Amazon EC2 (Works only for Hadoop 1.x)

This tutorial describes how to set up a cluster on Amazon EC2 that runs SpatialHadoop. The process is very similar to installing Hadoop, with an extra step that installs SpatialHadoop.

Using SpatialHadoop with Amazon Elastic MapReduce (EMR) (Works for both Hadoop 1.x and 2.x)

Amazon provides an alternative way to run MapReduce jobs through the Elastic MapReduce (EMR) service. The service takes on the burden of configuring and starting the Hadoop cluster, which you control through a simple web console or a command line interface. SpatialHadoop can run on EMR clusters if you provide a bootstrap action that installs SpatialHadoop as the cluster is starting.

In this tutorial, we will show how to install SpatialHadoop using the web console, but the same technique can be used in the command line interface.

  1. Start the "New Cluster" wizard by clicking the "Create Cluster" button in the web console.
  2. Choose the version of Hadoop you want to run. In this tutorial, we will use Amazon's distribution of Hadoop, which builds on Apache Hadoop 2.4.0. You can also choose an older version, but Amazon does not recommend it. We did not test SpatialHadoop with the MapR distribution, so whether to choose it is up to you.
  3. In the "Bootstrap Actions" section, add a new bootstrap action, choose "Custom action" and click "Configure and add".
  4. In the name field enter "Install SpatialHadoop", in the S3 location enter "s3://shadoop-emr/install-shadoop.rb" and leave the "Optional arguments" field blank. When you are done, click "Add".

    Hint: Leaving the "Optional arguments" field blank will automatically install the most recent version of SpatialHadoop. If you would like to install a specific version, enter the download URL of the SpatialHadoop package as an argument. For example, if you would like to install SpatialHadoop 2.2, enter "http://spatialhadoop.cs.umn.edu/downloads/spatialhadoop-2.2.tar.gz" as an argument.
  5. You can just start the cluster without specifying any steps and it will have SpatialHadoop installed on it. If you would also like to run some steps, you can choose the "Custom JAR" step and click "Configure and add".

    Enter a suitable name for the step and specify the JAR location as "/home/hadoop/spatialhadoop-main.jar". In the "arguments" field, specify the command you would like to run along with any arguments as shown in the figure below.
  6. Finally, you can start the cluster using the "Create cluster" button at the bottom of the page.
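
The console steps above can also be scripted. The following is a minimal sketch in Python, assuming you would submit the request with the boto3 library's EMR client (run_job_flow); it only builds the request and does not contact AWS. The bootstrap script location and the JAR path come from this tutorial. The function name, cluster name, instance types, instance count, and step arguments are illustrative assumptions, not values prescribed by SpatialHadoop.

```python
import json

# Values taken from the tutorial text.
BOOTSTRAP_SCRIPT = "s3://shadoop-emr/install-shadoop.rb"
SPATIALHADOOP_JAR = "/home/hadoop/spatialhadoop-main.jar"


def build_cluster_request(name, package_url=None, step_args=None):
    """Build keyword arguments shaped for boto3's EMR run_job_flow call.

    build_cluster_request is a hypothetical helper written for this sketch.
    """
    # No argument: the bootstrap script installs the most recent release.
    # One argument: the download URL of a specific SpatialHadoop package.
    bootstrap_args = [package_url] if package_url else []

    request = {
        "Name": name,
        "Instances": {
            # Instance types and count are assumptions; pick your own.
            "MasterInstanceType": "m3.xlarge",
            "SlaveInstanceType": "m3.xlarge",
            "InstanceCount": 3,
            # Keep the cluster running even when no steps are given.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "BootstrapActions": [
            {
                "Name": "Install SpatialHadoop",
                "ScriptBootstrapAction": {
                    "Path": BOOTSTRAP_SCRIPT,
                    "Args": bootstrap_args,
                },
            }
        ],
        "Steps": [],
    }

    # Optional "Custom JAR" step that runs a SpatialHadoop command.
    if step_args:
        request["Steps"].append(
            {
                "Name": "SpatialHadoop step",
                "ActionOnFailure": "CANCEL_AND_WAIT",
                "HadoopJarStep": {
                    "Jar": SPATIALHADOOP_JAR,
                    "Args": list(step_args),
                },
            }
        )
    return request


if __name__ == "__main__":
    # "<operation>" and "<input>" are placeholders, not real commands.
    demo = build_cluster_request(
        "spatialhadoop-demo",
        package_url="http://spatialhadoop.cs.umn.edu/downloads/spatialhadoop-2.2.tar.gz",
        step_args=["<operation>", "<input>"],
    )
    print(json.dumps(demo, indent=2))
```

To actually launch a cluster you would pass the result to boto3.client("emr").run_job_flow(**request). A real request would likely also need the Hadoop version selection from step 2 and the IAM roles for your account, which this sketch leaves out on purpose.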