SpatialJoin

Perform Spatial Join on BigData using Hadoop MapReduce

Sample Density Run

In this example, we are joining a set of points from HDFS with a small set of polygons that is loaded from the distributed cache.

Create polygon sample data into local file system:

$ awk -f data.awk | awk -f polygons.awk > /tmp/polygons.txt

Create points sample data into HDFS:

$ awk -f data.awk | hadoop fs -put - points.txt

Build, package and run the tool:

$ mvn -PDensityTool clean package
$ hadoop jar target/SpatialJoin-1.0-SNAPSHOT-job.jar points.txt file:///tmp/polygons.txt output

Save the summary output into local file system:

$ hadoop fs -cat output/part-* > /tmp/relate.txt

Convert polygon set into a shapefile using ogr2ogr so it can be viewed in ArcMap and related with the above file for visualization:

$ awk -f featurecollection.awk /tmp/polygons.txt > /tmp/fc.json
$ ogr2ogr -f "ESRI Shapefile" /tmp/polygons /tmp/fc.json

Sample Airport/Route Run

In this example of a map side spatial join, I have a list of airport code pairs forming flight routes between two cities. In addition, I have a list of the airport code, city and location in lat/lon format. The task at hand it count the number of times an airport is overflown by the flight routes.

Place the data into HDFS:

$ hadoop fs -put data/airports.csv airports.csv
$ hadoop fs -put data/routes.csv routes.csv

Build, package and run the tool:

$ mvn -PJoinMapTool clean package
$ hadoop jar target/SpatialJoin-1.0-SNAPSHOT-job.jar routes.csv output

See the output:

$ hadoop fs -cat output/part-00000 | sort -n -r --key=2 | head -10

Sample Polygon/Polygon Spatial Join

In this example of reduce side spatial join, we will join two polygon sets and return the intersection set.

Create the two polygon sets:

$ awk -f data.awk | hadoop fs -put - data1.txt
$ awk -f data.awk | hadoop fs -put - data2.txt

Build, package and run the tool - here I am specifying a cell size of 5 units:

$ mvn -PJoinRedTool clean package
$ hadoop jar target/SpatialJoin-1.0-SNAPSHOT-job.jar -D com.esri.size=5 data1.txt data2.txt output

To view the result (as long it is small :-), I am using the Esri-Leaflet project.

Create a folder under your web application and extract the data into that folder as GeoJSON formatted files.

$ hadoop fs -cat data1.txt | awk -f geojson.awk -v N=data1 > data1.js
$ hadoop fs -cat data2.txt | awk -f geojson.awk -v N=data2 > data2.js
$ hadoop fs -cat output/part-* | awk -f union.awk > union.js

Copy the file esri-leaflet.js and index.html in the webapp folder to that folder and launch your browser and view the result at http://localhost/myfolder/index.html

Unit Testing

I highly recommend that you look at the test folder - There are two types of unit test cases in there. one that tests the mapper by itself, the reducer by itself and the combination of mapper and reducer. The other type is one that launches a mini cluster that launches a name node, a data node, a job tracker and a task tracker all in one VM. This enable the test case to write some well known data at startup time into HDFS and read the result of the MapReduce job back from HDFS to assert the values.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
data		data
src		src
.gitignore		.gitignore
README.md		README.md
data.awk		data.awk
featurecollection.awk		featurecollection.awk
gendata.sh		gendata.sh
geojson.awk		geojson.awk
polygons.awk		polygons.awk
pom.xml		pom.xml
union.awk		union.awk

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SpatialJoin

Sample Density Run

Sample Airport/Route Run

Sample Polygon/Polygon Spatial Join

Unit Testing

About

Uh oh!

Releases

Packages

ErikHoel/SpatialJoin

Folders and files

Latest commit

History

Repository files navigation

SpatialJoin

Sample Density Run

Sample Airport/Route Run

Sample Polygon/Polygon Spatial Join

Unit Testing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages