Sunday, May 4, 2014

Spatially Enabling In-Memory BigData Stores

I deeply believe that the future of BigData stores and processing will be driven by GPUs and based purely on distributed in-memory engines that are backed by something resilient to hardware failure, like HDFS.
HBase, Accumulo, and Cassandra depend heavily on their in-memory capabilities for their performance. And when it comes to processing, SQL is still king... MemSQL combines both, which is pretty impressive.
However, ALL of them lack something that is so important in today’s BigData world: true spatial storage, indexing, and processing of native points, lines, and polygons. SpaceCurve is making great progress on that front.
A lot of smart people have taken advantage of the native lexicographical indexing of these key-value stores and used geohash to save, index, and search spatial elements, and have solved the Z-order range search. Though these are great implementations, I always thought that the end did not justify the means. There is a need for true and effective BigData spatial capabilities.
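For readers who have not seen the geohash trick, here is a minimal Python sketch of the idea. It is not taken from any of the stores mentioned above: `geohash_encode` and `prefix_scan` are hypothetical helper names, and a sorted Python list stands in for a lexicographically ordered key-value store. The encoder interleaves longitude and latitude bits (a Z-order curve) into a base32 string, so points in the same cell share a key prefix and can be fetched with a plain key-range scan.

```python
from bisect import bisect_left

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=9):
    """Interleave longitude and latitude bits and emit them as base32 characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, nbits = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half of the interval
            rng[0] = mid
        else:
            bits <<= 1              # lower half of the interval
            rng[1] = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:              # five bits make one base32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)

def prefix_scan(sorted_keys, prefix):
    """Return all keys that start with prefix, using a lexicographic range scan."""
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, prefix + "~")  # "~" sorts after every base32 character
    return sorted_keys[lo:hi]

# A sorted list stands in for the ordered keys of a key-value store (assumption).
keys = sorted([
    geohash_encode(42.60, -5.60, 7),
    geohash_encode(42.61, -5.61, 7),
    geohash_encode(57.64911, 10.40744, 7),
])
nearby = prefix_scan(keys, geohash_encode(42.60, -5.60, 5))
```

Note the catch with this approach: two points that sit right next to each other but on opposite sides of a cell boundary get different prefixes, so a real search has to scan the neighboring cells too.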
I’ve been a big fan of Hazelcast for quite some time now and have always been impressed by their technology. In their latest release, they have added a MapReduce API, so you can now send programs to the data. Very cool!
But... like the others, they lack the spatial aspect when it comes to my world. So... here is a set of small tweaks that truly spatially enables this in-memory BigData engine. I’ve used the MapReduce API and the spatial index in an example to visualize conflict hotspots in Africa.
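To give a feel for what a hotspot computation looks like in map/reduce terms, here is a rough Python sketch. It is not the Hazelcast code from the download, and it does not use the Hazelcast API: `map_to_cell`, `reduce_counts`, and the one-degree cell size are all assumptions made for illustration. The map step assigns each event to a grid cell, and the reduce step counts events per cell; the densest cells are the hotspots.

```python
import math
from collections import defaultdict

def map_to_cell(event, cell_size):
    """Map step: emit (grid cell, 1) for one (lon, lat) event."""
    lon, lat = event
    cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
    return cell, 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each grid cell."""
    counts = defaultdict(int)
    for cell, value in pairs:
        counts[cell] += value
    return dict(counts)

# Made-up sample coordinates, for illustration only (not real conflict data).
events = [(10.2, 5.1), (10.7, 5.9), (-3.4, 12.0)]
density = reduce_counts(map_to_cell(e, 1.0) for e in events)
hottest = max(density, key=density.get)
```

In a distributed engine, the map step would run on the node that holds each event, and only the small per-cell counts would travel over the network.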

As usual, all the source code can be downloaded from here.