Sunday, April 16, 2017

On Machine Learning and General Path Recognition

This is Part II of my journey back into SOMs. Actually, this journey started with the picture below:


It is a map of AIS broadcast points around the harbor of Miami, FL. When we sentient creatures look at this map, we can quickly and clearly see the patterns formed by the points. There is a sequence of points that starts at the harbor and goes northeast. There is a sequence of points that starts at the harbor and goes south-southeast. And there are plenty of north-south paths: some are very close to the shore, others are on the "edge" to the east. And there are paths in the "middle".

Wouldn't it be wonderful if the Machine could see these patterns and formulate the general paths? That is actually what started this journey. I needed an unsupervised way for the Machine to recognize the patterns and emit the paths. I'm sure there are multiple ways to solve this, but I remembered that a while back I had used Self Organizing Maps, because of their simplicity and, crucially, because they belong to a class of unsupervised machine learning algorithms. So, in Part I, I reacquainted myself with SOMs, and in this part, I complete the journey by showing the paths.

However, faithful reader, I have not been totally honest with you. Please forgive me. This is actually Part III of this journey. Along the way, I took a detour, as I needed a way to assemble tracks from targets. There is a hidden gem in that project. The PathFinder application is an important stop on this journey because, in addition to assembling tracks from targets, it quantizes the paths into grid cells. The quantization of paths is the linchpin between the raw target points and the unsupervised path detection.
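The post doesn't describe how PathFinder assembles tracks. What follows is a minimal sketch under stated assumptions, not PathFinder's actual logic. It assumes each target carries a vessel identifier (the AIS MMSI) and a timestamp. Under that assumption, tracks can be formed by grouping points per vessel, ordering them in time, and starting a new track whenever the time gap between consecutive points exceeds a threshold. The `Target` record, `assemble_tracks`, and the one-hour `max_gap` default are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Target(NamedTuple):
    """One AIS position report (hypothetical minimal record)."""
    mmsi: int     # vessel identifier carried in AIS messages
    t: float      # timestamp in seconds
    lon: float
    lat: float


def assemble_tracks(targets, max_gap=3600.0):
    """Group targets by vessel, order them in time, and split a vessel's
    points into separate tracks wherever the time gap exceeds max_gap."""
    by_vessel = defaultdict(list)
    for tgt in targets:
        by_vessel[tgt.mmsi].append(tgt)
    tracks = []
    for pts in by_vessel.values():
        pts.sort(key=lambda p: p.t)
        current = [pts[0]]
        for prev, cur in zip(pts, pts[1:]):
            if cur.t - prev.t > max_gap:
                tracks.append(current)   # gap too large: close this track
                current = []
            current.append(cur)
        tracks.append(current)
    return tracks
```

The gap split matters because the same vessel can leave and return to the harbor several times. Each trip should become its own track rather than one long zig-zag.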

The picture below will help with my explanation:


If a virtual grid is overlaid on the map (the grid cells in the map above are deliberately coarse for illustration), then a linear vector that represents a path can be composed by scanning the grid cells from left to right and then from top to bottom. The presence of targets in a cell sets the binary value of the corresponding element in the vector. So, in the above case, the path is represented by the vector [0,0,0,0,1,0,0,0,1,1,1,1,0,0,0,1]. Side note: in this implementation the vector is composed of binary values; however, a vector of real numbers could be composed instead, where each element's value is proportional to the number of targets in that cell. In addition, the cell values could "bleed" into neighboring cells, say in a Gaussian way, for better path recognition. I will have to come back to this one day. A Master's thesis could be made out of this.
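Here is a minimal sketch of the quantization described above. It assumes a known bounding box and grid size (the function name and parameters are hypothetical, not PathFinder's). Cells are scanned left to right, then top to bottom, so row 0 is the top of the map. Setting `binary=False` gives the count-based, real-valued variant mentioned in the side note.

```python
def quantize_path(points, min_x, min_y, max_x, max_y, cols, rows, binary=True):
    """Overlay a cols x rows grid on the bounding box and return a row-major
    vector (left to right, then top to bottom) of per-cell target counts,
    or of 0/1 presence flags when binary is True."""
    vec = [0.0] * (cols * rows)
    cell_w = (max_x - min_x) / cols
    cell_h = (max_y - min_y) / rows
    for x, y in points:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue                                   # ignore points outside the grid
        col = min(int((x - min_x) / cell_w), cols - 1)
        row = min(int((max_y - y) / cell_h), rows - 1) # row 0 is the TOP of the map
        vec[row * cols + col] += 1.0
    if binary:
        vec = [1.0 if v > 0 else 0.0 for v in vec]
    return vec
```

With a 4x4 grid over the box (0,0)-(4,4), the points (0.5,2.5), (0.5,1.5), (1.5,1.5), (2.5,1.5), (3.5,1.5) and (3.5,0.5) produce the example vector [0,0,0,0,1,0,0,0,1,1,1,1,0,0,0,1].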

Now that we can compose a set of vectors from a set of targets, we can train the SOM with these vectors. The figure below is a visual representation of a 3x3 SOM result. Each sub-map visualizes the settled weights of one SOM node, where each node's weight vector is a linear representation of a quantized grid, as described above.
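The following is a minimal single-machine sketch of training a 3x3 SOM on such path vectors. It is plain online SOM training, not the author's Spark code. It assumes random initial weights, a Gaussian neighborhood on the lattice, and exponential decay of both the learning rate and the neighborhood radius. Those choices and the names `train_som` / `best_matching_unit` are illustrative.

```python
import math
import random


def best_matching_unit(weights, v):
    """Index of the node whose weight vector is closest (squared Euclidean) to v."""
    return min(range(len(weights)),
               key=lambda n: sum((w - x) ** 2 for w, x in zip(weights[n], v)))


def train_som(vectors, rows=3, cols=3, epochs=100, lr0=0.5, seed=0):
    """Online SOM training on a rows x cols lattice.
    Returns one weight vector per node, in row-major lattice order."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    coords = [(i // cols, i % cols) for i in range(rows * cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for v in rng.sample(vectors, len(vectors)):    # shuffled pass over the data
            frac = step / total
            lr = lr0 * math.exp(-3.0 * frac)            # learning rate decays exponentially
            sigma = sigma0 * math.exp(-3.0 * frac)      # and so does the neighborhood radius
            br, bc = coords[best_matching_unit(weights, v)]
            for (r, c), w in zip(coords, weights):
                # Gaussian pull: strongest at the BMU, weaker farther away on the lattice
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
            step += 1
    return weights
```

Each returned weight vector has the same length as a quantized path. Reshaping it to the grid's rows and columns gives one of the sub-maps in the figure.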


We can see the path patterns that have self-organized, and they do reflect what we humans implicitly saw in the first map. I highlighted in red the cells in each map with the most target associations, thus forming a path. Isn't it amazing?
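The post doesn't say exactly how the "most target associations" cells were picked. One simple assumption is to keep every cell whose weight is at least some fraction of the node's largest weight. The sketch below does that and renders the result as text, with `#` standing in for the red cells. The 0.5 threshold and the function names are hypothetical.

```python
def path_cells(weights, cols, rel_threshold=0.5):
    """(row, col) of every cell whose weight is at least rel_threshold * max weight."""
    peak = max(weights)
    if peak <= 0:
        return []
    return [(i // cols, i % cols) for i, w in enumerate(weights)
            if w >= rel_threshold * peak]


def render(weights, cols, rel_threshold=0.5):
    """Text picture of one node: '#' marks a path cell, '.' everything else."""
    cells = set(path_cells(weights, cols, rel_threshold))
    rows = len(weights) // cols
    return "\n".join("".join("#" if (r, c) in cells else "." for c in range(cols))
                     for r in range(rows))
```

A relative threshold rather than a fixed one keeps the rule usable for nodes whose weights settled at different overall levels.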

As usual, you can download all the source code for this from here.

Sunday, April 2, 2017

On Machine Learning with Self Organizing Maps

A Self Organizing Map (SOM) is a type of Artificial Neural Network (ANN) used in Machine Learning. AI Junkie has a GREAT tutorial about it. What I like about SOMs is that they belong to the class of unsupervised learning models and that they hold true to the first law of geography:

"Everything is related to everything else, but near things are more related than distant things." - Tobler

I first encountered and used them over 20 years ago, and since AI/ML is the hottest topic these days, I'm reacquainting myself with them. There are plenty of SOM libraries, but I learn (or, in this case, re-learn) by doing. This project is my learning journey in implementing SOMs and "Sparkyfing" them.
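Tobler's law shows up in a SOM through its neighborhood function. When a sample is presented, its best matching unit (BMU) is pulled hardest toward it, and other nodes are pulled less the farther they sit from the BMU on the lattice. Here is a minimal sketch using a Gaussian neighborhood, a common choice but an assumption here. The function names are illustrative.

```python
import math


def neighborhood(bmu, node, sigma):
    """Gaussian neighborhood weight for a lattice node, given the BMU's (row, col)."""
    d2 = (bmu[0] - node[0]) ** 2 + (bmu[1] - node[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def influence_map(bmu, rows, cols, sigma):
    """How strongly one training sample pulls every node of a rows x cols lattice."""
    return [[round(neighborhood(bmu, (r, c), sigma), 3) for c in range(cols)]
            for r in range(rows)]


def update(weight, sample, lr, h):
    """Move one node's weight vector a fraction lr * h toward the sample."""
    return [w + lr * h * (x - w) for w, x in zip(weight, sample)]


for row in influence_map((2, 2), 5, 5, sigma=1.0):
    print(row)
```

Shrinking `sigma` during training makes "near" mean ever closer. The map first organizes globally and then fine-tunes locally.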

The following is a sample output of the obligatory RGB classifier, where a million random RGB triples are organized by a Spark-based SOM into a 10x10 square lattice:
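The post doesn't show how the SOM was "Sparkyfied". One common way to distribute SOM training is the batch SOM formulation. Each epoch, every partition computes neighborhood-weighted sums per node. The partial sums are added together, and each node becomes the weighted mean. Because the merge is a plain element-wise sum, this fits a partition-then-reduce pattern. In Spark's RDD API that would be something along the lines of `mapPartitions` followed by `reduce` or `treeAggregate`.

The sketch below simulates partitions with ordinary Python lists. It is an assumption about how such a port could work, not the author's implementation. `partition_stats`, `merge` and `batch_som`, as well as the linear radius schedule, are hypothetical.

```python
import math
import random
from functools import reduce


def bmu(weights, x):
    """Index of the node closest (squared Euclidean) to sample x."""
    return min(range(len(weights)),
               key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def partition_stats(partition, weights, coords, sigma):
    """Map step: neighborhood-weighted sums of the samples in one partition."""
    dim = len(weights[0])
    num = [[0.0] * dim for _ in weights]   # per node: sum of h * x
    den = [0.0] * len(weights)             # per node: sum of h
    for x in partition:
        br, bc = coords[bmu(weights, x)]
        for n, (r, c) in enumerate(coords):
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
            den[n] += h
            for k in range(dim):
                num[n][k] += h * x[k]
    return num, den


def merge(a, b):
    """Reduce step: element-wise sum of two partial results."""
    num = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    den = [p + q for p, q in zip(a[1], b[1])]
    return num, den


def batch_som(partitions, rows, cols, epochs=10, seed=0):
    rng = random.Random(seed)
    coords = [(i // cols, i % cols) for i in range(rows * cols)]
    dim = len(partitions[0][0])
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    sigma0 = max(rows, cols) / 2.0
    for e in range(epochs):
        sigma = max(sigma0 * (1.0 - e / epochs), 0.5)   # shrink radius linearly, floor 0.5
        num, den = reduce(merge, (partition_stats(p, weights, coords, sigma)
                                  for p in partitions))
        # every node becomes the neighborhood-weighted mean of all samples
        weights = [[s / d for s in row] if d > 0 else w
                   for row, d, w in zip(num, den, weights)]
    return weights
```

Since each epoch only needs the summed statistics, the result does not depend on how the samples are split across partitions, up to floating-point rounding. That property is what makes the batch form attractive for a cluster.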


And the following is a sample solution to the Traveling Salesman Problem (TSP) using a SOM:
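A common way to attack the TSP with a SOM is a one-dimensional lattice closed into a ring. The neurons live in the same 2-D space as the cities, and the neighborhood is measured along the ring. Once the ring has been stretched over the cities, visiting the cities in the order of their winning neurons gives a tour. This yields a heuristic, not a guaranteed optimum.

The sketch below assumes this ring formulation. The neuron count (8 per city), learning rate, and decay constants are illustrative guesses, not values from the author's code.

```python
import math
import random


def ring_distance(i, j, m):
    """Distance between neurons i and j along a ring of m neurons."""
    d = abs(i - j)
    return min(d, m - d)


def nearest(neurons, x, y):
    """Index of the neuron closest to the point (x, y)."""
    return min(range(len(neurons)),
               key=lambda k: (neurons[k][0] - x) ** 2 + (neurons[k][1] - y) ** 2)


def som_tsp(cities, iterations=20000, seed=0):
    """Return a tour (list of city indices) found by a ring-shaped SOM."""
    rng = random.Random(seed)
    n = len(cities)
    m = 8 * n                                   # neurons on the ring (heuristic)
    xs = [c[0] for c in cities]
    ys = [c[1] for c in cities]
    neurons = [[rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys))]
               for _ in range(m)]
    lr, radius = 0.8, m / 10.0
    for _ in range(iterations):
        cx, cy = cities[rng.randrange(n)]
        win = nearest(neurons, cx, cy)
        for k in range(m):
            d = ring_distance(k, win, m)
            if d > 3 * radius:
                continue                        # negligible influence, skip
            h = math.exp(-d * d / (2.0 * radius * radius))
            neurons[k][0] += lr * h * (cx - neurons[k][0])
            neurons[k][1] += lr * h * (cy - neurons[k][1])
        lr *= 0.99997
        radius = max(radius * 0.9997, 1.0)
    # visit cities in the order of their winning neuron's position on the ring
    winners = [nearest(neurons, x, y) for x, y in cities]
    return sorted(range(n), key=lambda i: winners[i])


def tour_length(cities, tour):
    """Length of the closed tour, returning to the starting city."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))
```

Because the neighborhood wraps around the ring, the first and last neurons are neighbors. That is what makes the result a closed tour rather than an open path.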


As usual, all the source code can be found here.