Application: Word Similarity

import numpy as np
from rich.pretty import pprint
from rich import print
from sklearn.metrics.pairwise import cosine_similarity
x1 = np.array([1, 0, 7, 13]) # battle
x2 = np.array([114, 80, 62, 89]) # good
x3 = np.array([36, 58, 1, 4]) # fool
x4 = np.array([20, 15, 2, 3]) # wit

print(cosine_similarity([x3], [x4]))
print(cosine_similarity([x1], [x3]))
[[0.92856079]]
[[0.09386806]]

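Alternatively, we can stack the four vectors as the rows of a single matrix and compute every pairwise similarity at once, which makes the comparison easier to read.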

X = np.array([[1, 0, 7, 13], [114, 80, 62, 89], [36, 58, 1, 4], [20, 15, 2, 3]])
print(X)
[[  1   0   7  13]
 [114  80  62  89]
 [ 36  58   1   4]
 [ 20  15   2   3]]
cosine_similarity(X)
array([[1.        , 0.65267448, 0.09386806, 0.1952947 ],
       [0.65267448, 1.        , 0.75892858, 0.86817473],
       [0.09386806, 0.75892858, 1.        , 0.92856079],
       [0.1952947 , 0.86817473, 0.92856079, 1.        ]])

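Entry (i, j) of this matrix is the cosine similarity between rows i and j of X, so the diagonal is all ones. We see that x1 (battle) and x3 (fool) are highly dissimilar, with a cosine similarity of 0.09386806, while x3 (fool) and x4 (wit) are highly similar, with a cosine similarity of 0.92856079.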
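
To make explicit what these numbers measure, here is a minimal sketch that recomputes two of the similarities by hand using only the standard library. It assumes the usual definition of cosine similarity, the dot product of two vectors divided by the product of their Euclidean norms. The helper name `cosine` is hypothetical and introduced only for illustration; it is not part of scikit-learn.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot(u, v) / (||u|| * ||v||). Hypothetical helper."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Same count vectors as in the example above.
battle = [1, 0, 7, 13]
fool = [36, 58, 1, 4]
wit = [20, 15, 2, 3]

print(round(cosine(fool, wit), 6))     # 0.928561
print(round(cosine(battle, fool), 6))  # 0.093868
```

Because the norms divide out vector length, fool and wit come out as similar even though the raw counts for fool are roughly twice as large: only the direction of the count vectors matters.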