[1 min read] Finding similarity between objects #ml #first principles
Since I have promised an under 1 min reading time, let me deliver the punchline upfront: the similarity between two objects is most commonly measured by the cosine of the angle, cos(θ), between the two corresponding vectors. The values range from -1 (pointing in opposite directions) to +1 (pointing in the same direction), with 0 meaning the vectors are perpendicular.
The mathematical representation for cosine similarity, cos(θ), is:

cos(θ) = (A · B) / (||A|| ||B||)

i.e. the dot product of the two vectors divided by the product of their magnitudes (||A|| and ||B|| being the lengths of the two vectors).
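To make the formula concrete, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the sample vectors are my own choices for illustration, and raising an error for a zero-length vector is an assumption, since the angle is undefined in that case.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a zero vector has no direction, so refuse to score it.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction, close to 1
print(cosine_similarity([1, 0], [0, 1]))        # perpendicular, 0.0
print(cosine_similarity([1, 2], [-1, -2]))      # opposite direction, close to -1
```

Note that [2, 4, 6] is just [1, 2, 3] scaled up, and the score is still (approximately) 1: only the direction matters, not the length.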
Since it's a symmetric measure (i.e. the similarity between A and B is the same as that between B and A), the value for each pair needs to be calculated only once.
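One way to take advantage of the symmetry is to loop over unordered pairs only, which gives n(n-1)/2 calculations for n objects. The sketch below uses `itertools.combinations` for that; the `vectors` dictionary holds made-up values purely for illustration.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors, made up for this example.
vectors = {
    "A": [5, 3, 0],
    "B": [4, 0, 4],
    "C": [1, 1, 5],
}

# combinations() yields each unordered pair exactly once:
# (A, B), (A, C), (B, C), and never (B, A).
pair_scores = {
    (p, q): cosine_similarity(vectors[p], vectors[q])
    for p, q in combinations(sorted(vectors), 2)
}

for pair, score in pair_scores.items():
    print(pair, round(score, 3))
```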
Cosine similarity can be used to give movie or book recommendations, for instance, based on other users who have given ratings similar to yours for the same movies or books. It is also used to measure the similarity between documents or sentences (however, since term frequencies cannot be less than 0, the values in this case range between 0 and 1).
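For the document case, a rough sketch could look like the following. It assumes a very naive tokenizer (lowercase, split on whitespace) and raw word counts as the term frequencies; the helper names and the choice to score an empty document as 0.0 are my own assumptions, not a standard.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count words after lowercasing and splitting on whitespace (naive tokenizer)."""
    return Counter(text.lower().split())

def document_similarity(text_a, text_b):
    tf_a, tf_b = term_frequencies(text_a), term_frequencies(text_b)
    # Only words present in both documents contribute to the dot product;
    # a Counter returns 0 for a word it has not seen.
    dot = sum(count * tf_b[word] for word, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty document is similar to nothing
    return dot / (norm_a * norm_b)

print(document_similarity("the cat sat on the mat", "the cat lay on the rug"))  # about 0.75
print(document_similarity("the cat sat", "dogs bark loudly"))  # no shared words, 0.0
```

Because every count is 0 or more, the dot product can never go negative, which is why the score stays between 0 and 1 here.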
Some of the other popular similarity measures are as follows:
I) Jaccard similarity (size of the intersection of the two sets divided by the size of their union).
II) Pearson similarity (covariance of the two n-dimensional vectors divided by the product of their standard deviations).
III) Euclidean distance (straight-line distance between two points in n-dimensional space; being a distance, a smaller value means more similar).
IV) Overlap similarity (size of the intersection divided by the size of the smaller of the two sets).
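Each of the four measures above fits in a few lines of plain Python. This is only a sketch: the function names are my own, the inputs are made-up examples, and raising an error for empty sets or constant vectors is an assumption about how to handle the undefined cases.

```python
import math

def jaccard_similarity(a, b):
    a, b = set(a), set(b)
    union = a | b
    if not union:
        raise ValueError("Jaccard similarity is undefined for two empty sets")
    return len(a & b) / len(union)

def overlap_similarity(a, b):
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        raise ValueError("overlap similarity is undefined for an empty set")
    return len(a & b) / smaller

def pearson_similarity(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors in the covariance and the standard deviations cancel,
    # so plain sums are enough here.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("Pearson similarity is undefined for a constant vector")
    return cov / (spread_x * spread_y)

def euclidean_distance(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))   # 2 shared out of 4 total, 0.5
print(overlap_similarity({1, 2}, {1, 2, 3, 4}))   # smaller set fully contained, 1.0
print(pearson_similarity([1, 2, 3], [2, 4, 6]))   # perfectly correlated, close to 1
print(euclidean_distance((0, 0), (3, 4)))         # 3-4-5 triangle, 5.0
```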
Our time’s up for this post :). But as always, do please write in with your comments or queries. Hope you found the post useful. May you find the 1 :)!