Random thing that I got excited about this morning because I wanted to add really nice search to my personal journal: vector embeddings.
Imagine a 3D Cartesian space, and in it there are words at various (x, y, z) points. Similar words are placed in similar locations. In this space, “king” and “queen” would be close to each other, whereas “beggar” would be far away. This is essentially what a “vector embedding” is, except that vector embeddings typically have a much higher dimensionality than 3D space.
For an example of a dimension, take the relationship between “hot” and “cold.” You might track that as a dimension to judge a word by. How about social vs antisocial? Or fast vs slow? If you rated each word on hundreds of such dimensions, you’d have a single vector in a high dimensional space:
// vector embedding for the word "ice"
[
  // hotness
  0.0,
  // coldness
  1.0,
  // social
  0.0,
  // antisocial
  0.3,
  // fast
  0.1,
  // slow
  0.5
]
// Without comments:
[0.0, 1.0, 0.0, 0.3, 0.1, 0.5]
That vector lives in a 6-dimensional space (technically a hypercube, if every value stays between 0 and 1), but it’s easier to visualize vector embeddings as being 3-dimensional.
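To make “similar words are close together” concrete, here’s a toy sketch. The vectors are hand-made using the same made-up dimensions as above (hotness, coldness, social, antisocial, fast, slow); a real model would produce hundreds of values per word, and cosine similarity is used more often than raw distance:

```python
import math

# Hand-made toy vectors -- invented for illustration, not from a real model.
ice  = [0.0, 1.0, 0.0, 0.3, 0.1, 0.5]
snow = [0.1, 0.9, 0.2, 0.2, 0.1, 0.6]
lava = [1.0, 0.0, 0.0, 0.5, 0.4, 0.3]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(distance(ice, snow))  # small: "ice" and "snow" sit close together
print(distance(ice, lava))  # large: "ice" and "lava" sit far apart
```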
The placement of words within the hypercube is done via arcane magic. In fact, the dimensions don’t really correlate to anything we could solidly name, unlike my example, and the “words” are actually fragments of words, text, or even bytes – they’re not necessarily complete words. That’s why these neural networks understand that “quick” and “quickly” are similar.
Regardless, the end result is that you can take some text, run it through a neural network, and get the position of that text within the hypercube. Then you can see if there’s anything nearby that has a similar semantic meaning. That way, when you search “monarch man,” the embeddings will return “king,” whereas traditional full-text search would come up empty, because it only matches the literal characters you typed – “king” itself, or a prefix like “kin.”
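A brute-force version of that “find whatever’s nearby” lookup might look like the sketch below. The vectors are invented for illustration – in reality they’d come out of an embedding model and have hundreds of dimensions – and cosine similarity is the usual measure of “nearby”:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend these positions came out of an embedding model.
vocabulary = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.7, 0.2],
    "beggar": [0.1, 0.2, 0.9],
}

# Pretend this is the embedded query "monarch man".
query = [0.85, 0.8, 0.15]

# Pick the entry whose vector points in the most similar direction.
best = max(vocabulary, key=lambda w: cosine_similarity(query, vocabulary[w]))
print(best)  # "king"
```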
Training the neural networks takes a lot of computing power, but once they’re trained, you can download them for your own use in order to 1) convert words or phrases into positions in the hypercube, and 2) search the space for nearby entries instead of searching text for exact matches.
Anyway, it’d be a huge improvement to search on my personal daily journal, so I would like to get it working. Someday, maybe.