Monday, March 27, 2017

ArcGIS, Spark and Alluxio Integration

There is a plethora of backend distributed data stores. In my GIS applications I routinely use S3, Hadoop HDFS or OpenStack Swift, either to read geospatial data from these backends or to save my data into them. Some of these distributed data stores are not natively supported by the ArcGIS platform; however, the platform can be extended with ArcPy to handle these situations. The catch is that, depending on the data store, I have to use a different API (mostly Python based) to read and write geospatial information.

This is where Alluxio comes in very handy. It provides an abstraction layer between the application and the data store and (here is the best part) it caches the data in memory in a distributed, fault-tolerant manner. So, at the application level, the code to access the data stays the same, while on the backend I can configure Alluxio to use S3, HDFS or Swift. Finally, the introduction of a REST endpoint in Alluxio eases the integration with ArcGIS to write, read and visualize geospatial data.
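To make the REST side more concrete, here is a minimal Python sketch (ArcPy scripts are Python, so something like it could be called from one). It assumes the Alluxio proxy's v1 REST API as documented for Alluxio 1.x: the default proxy port 39999, the `paths/<path>/create-file` and `paths/<path>/open-file` endpoints that return a stream id, and the `streams/<id>/write`, `read` and `close` endpoints. Check these against the Alluxio version you run. The class `AlluxioRestClient` and its methods are hypothetical names for illustration, and the HTTP function is injectable so the logic can be exercised without a live cluster.

```python
import json
import urllib.request
from urllib.parse import quote


def http_post(url, data, content_type):
    """POST raw bytes to url and return the response body as bytes."""
    req = urllib.request.Request(
        url, data=data, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


class AlluxioRestClient:
    """Hypothetical helper around the Alluxio proxy REST API (assumed v1 layout)."""

    # Assumption: the proxy runs on localhost with its default REST port 39999.
    def __init__(self, base="http://localhost:39999/api/v1", post=http_post):
        self.base = base.rstrip("/")
        self.post = post  # injectable so the logic can be tested offline

    def _path_url(self, path, action):
        # Alluxio paths keep their leading "/", giving URLs like .../paths//gis/a.json/open-file
        return f"{self.base}/paths/{quote(path, safe='/')}/{action}"

    def _stream_url(self, stream_id, action):
        return f"{self.base}/streams/{stream_id}/{action}"

    def write_bytes(self, path, payload):
        """Create a file in Alluxio and stream payload into it."""
        stream_id = json.loads(
            self.post(self._path_url(path, "create-file"), b"", "application/json")
        )
        try:
            self.post(self._stream_url(stream_id, "write"), payload, "application/octet-stream")
        finally:
            # Always release the server-side stream, even if the write fails.
            self.post(self._stream_url(stream_id, "close"), b"", "application/json")

    def read_bytes(self, path):
        """Open a file in Alluxio and return its full contents as bytes."""
        stream_id = json.loads(
            self.post(self._path_url(path, "open-file"), b"", "application/json")
        )
        try:
            return self.post(self._stream_url(stream_id, "read"), b"", "application/json")
        finally:
            self.post(self._stream_url(stream_id, "close"), b"", "application/json")
```

A call such as `client.write_bytes("/gis/points.geojson", json.dumps(feature).encode("utf-8"))` followed by `client.read_bytes("/gis/points.geojson")` would round-trip a GeoJSON feature. Because the client only ever sees Alluxio paths, switching the under-store between S3, HDFS and Swift is a matter of Alluxio's configuration (for example, what is mounted with `alluxio fs mount`), not of this code.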


As usual, all the source code for this integration can be found here.