Application: Word Similarity
import numpy as np
from rich.pretty import pprint
from rich import print
from sklearn.metrics.pairwise import cosine_similarity
x1 = np.array([1, 0, 7, 13]) # battle
x2 = np.array([114, 80, 62, 89]) # good
x3 = np.array([36, 58, 1, 4]) # fool
x4 = np.array([20, 15, 2, 3]) # wit
print(cosine_similarity([x3], [x4]))
print(cosine_similarity([x1], [x3]))
[[0.92856079]]
[[0.09386806]]
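As a sanity check, the same two numbers can be reproduced by hand from the definition of cosine similarity, the dot product of two vectors divided by the product of their Euclidean norms. The sketch below assumes nonzero vectors; the helper name `cosine` is our own and is not part of scikit-learn.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity of two nonzero 1-D vectors (hypothetical helper)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


x1 = np.array([1, 0, 7, 13])   # battle
x3 = np.array([36, 58, 1, 4])  # fool
x4 = np.array([20, 15, 2, 3])  # wit

fool_wit = cosine(x3, x4)      # should agree with cosine_similarity([x3], [x4])
battle_fool = cosine(x1, x3)   # should agree with cosine_similarity([x1], [x3])
print(fool_wit, battle_fool)
```

Unlike `cosine_similarity`, which returns a 2-D array, this helper returns a plain float, which is convenient when only one pair is of interest.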
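Alternatively, we can stack the four word vectors as the rows of a single matrix and compute all pairwise similarities at once, which is easier to read.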
X = np.array([[1, 0, 7, 13], [114, 80, 62, 89], [36, 58, 1, 4], [20, 15, 2, 3]])
print(X)
[[  1   0   7  13]
 [114  80  62  89]
 [ 36  58   1   4]
 [ 20  15   2   3]]
cosine_similarity(X)
array([[1.        , 0.65267448, 0.09386806, 0.1952947 ],
       [0.65267448, 1.        , 0.75892858, 0.86817473],
       [0.09386806, 0.75892858, 1.        , 0.92856079],
       [0.1952947 , 0.86817473, 0.92856079, 1.        ]])
We see that x1 (battle) and x3 (fool) are extremely dissimilar, with a cosine similarity of 0.09386806, while x3 (fool) and x4 (wit) are extremely similar, with a cosine similarity of 0.92856079.
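To read off such relationships for every word at once, one option is to label the rows and look up each word's nearest neighbour in the similarity matrix. The sketch below is illustrative only: the `words` list and the `nearest` dictionary are names we introduce here, and the matrix is built with plain NumPy (unit-normalised rows multiplied by their transpose), which for nonzero rows should match the pairwise cosine-similarity matrix.

```python
import numpy as np

words = ["battle", "good", "fool", "wit"]  # row labels (our own naming)
X = np.array([[1, 0, 7, 13], [114, 80, 62, 89], [36, 58, 1, 4], [20, 15, 2, 3]])

# Scale every row to unit length; the dot products of unit rows are cosine similarities.
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
S = X_unit @ X_unit.T

# Mask the diagonal so that a word is never matched with itself.
S_off = S.copy()
np.fill_diagonal(S_off, -np.inf)

# For each word, pick the other word with the highest cosine similarity.
nearest = {w: words[j] for w, j in zip(words, S_off.argmax(axis=1))}
print(nearest)
# {'battle': 'good', 'good': 'wit', 'fool': 'wit', 'wit': 'fool'}
```

Note that the relation is not symmetric: "good" has "wit" as its nearest neighbour, but the nearest neighbour of "wit" is "fool".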