Embedding
Simple Definition
An embedding is a way of converting text (or other data) into a list of numbers, called a vector, that represents its meaning. Words or sentences with similar meanings end up with similar vectors, so they sit “close” to each other in a mathematical space.
This turns the fuzzy concept of “meaning” into something computers can calculate and compare.
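A common way to compare two embeddings is cosine similarity, which measures the angle between the vectors. Here is a minimal sketch in plain Python. The three 4-number "embeddings" are made up for illustration; real models produce much longer vectors.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors.

    1.0 means same direction, 0.0 means orthogonal, -1.0 means opposite.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings, invented for this example.
happy  = [0.9, 0.8, 0.1, 0.0]
joyful = [0.8, 0.9, 0.2, 0.1]
car    = [0.1, 0.0, 0.9, 0.8]

print(round(cosine_similarity(happy, joyful), 3))  # high: similar meaning
print(round(cosine_similarity(happy, car), 3))     # low: unrelated meaning
```

Cosine similarity ignores vector length and looks only at direction, which is why it is a popular default for comparing text embeddings.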
A Simple Analogy
Imagine placing every word on a map, the way cities are placed on a map of a country. Similar words would be near each other: “happy” and “joyful” would be close together, while “happy” and “car” would be far apart. Embeddings do this, but in hundreds or thousands of dimensions instead of just two.
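To make the map analogy concrete, here is a minimal sketch with three words placed by hand on a two-dimensional map. The coordinates are invented for illustration; a real embedding model learns its coordinates from data. Distance on the map is ordinary straight-line (Euclidean) distance, computed with Python's `math.dist`.

```python
import math

# Hypothetical 2-D "map" coordinates, chosen by hand for illustration.
word_map = {
    "happy":  (2.0, 3.0),
    "joyful": (2.5, 3.2),
    "car":    (9.0, -4.0),
}

def distance(w1: str, w2: str) -> float:
    """Straight-line (Euclidean) distance between two words on the map."""
    return math.dist(word_map[w1], word_map[w2])

print(f"happy-joyful: {distance('happy', 'joyful'):.2f}")
print(f"happy-car:    {distance('happy', 'car'):.2f}")
```

`math.dist` accepts points with any number of coordinates (as long as both points have the same number), so the same function works unchanged on vectors with hundreds or thousands of dimensions.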
Why Embeddings Matter
Embeddings make it possible to:
- Search by meaning — find documents about “vehicle safety” even if they use words like “automobile” or “car accident” instead of your exact search terms
- Build recommendation systems — find content similar to what a user liked
- Power RAG systems — retrieve the most relevant documents to include in an AI’s context
- Detect similarity — identify duplicate content or near-duplicate questions
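The "detect similarity" use case can be sketched as comparing every pair of embeddings and flagging pairs above a similarity cut-off. In this sketch the question embeddings are made up, and the `THRESHOLD` value of 0.95 is an assumption; in practice the right cut-off depends on the embedding model and has to be tuned on real data.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical embeddings for three support questions (values invented).
questions = {
    "How do I reset my password?":     [0.82, 0.10, 0.55],
    "I forgot my password, what now?": [0.79, 0.15, 0.58],
    "What are your opening hours?":    [0.05, 0.95, 0.10],
}

# Assumed cut-off for illustration only; tune it for your model and data.
THRESHOLD = 0.95

def near_duplicates(items, threshold=THRESHOLD):
    """Return (text1, text2, score) for every pair at least `threshold` similar."""
    pairs = []
    for (t1, v1), (t2, v2) in combinations(items.items(), 2):
        score = cosine_similarity(v1, v2)
        if score >= threshold:
            pairs.append((t1, t2, round(score, 3)))
    return pairs

for first, second, score in near_duplicates(questions):
    print(f"{score}: {first!r} ~ {second!r}")
```

Checking every pair grows quadratically with the number of items, so large collections usually rely on a vector index rather than this exhaustive loop.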
How They’re Created
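Embedding models (such as OpenAI’s text-embedding-ada-002 or the open-source Sentence Transformers models) are trained to produce these numerical representations. You pass text in and get a fixed-length list of numbers out: typically a few hundred to a few thousand values, depending on the model (for example, 1,536 for text-embedding-ada-002 and 384 for the Sentence Transformers model all-MiniLM-L6-v2). These vectors are then stored in vector databases for fast similarity search.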
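One detail that makes similarity search faster is normalization: once vectors are scaled to length 1, a plain dot product gives the same result as cosine similarity, so the division step can be skipped at query time. Some models and providers already return unit-length vectors, but check your model's documentation rather than assuming it. This is a minimal sketch; the two 3-number vectors stand in for real model output and are made up.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to length 1 (unit length)."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical raw vectors standing in for embedding-model output.
a = [3.0, 4.0, 0.0]
b = [4.0, 3.0, 0.0]

a_unit, b_unit = normalize(a), normalize(b)
print(cosine_similarity(a, b))  # cosine on the raw vectors
print(dot(a_unit, b_unit))      # same value (up to rounding) from a plain dot product
```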
Practical Use
When you use a tool with semantic search or “chat with your documents” functionality, embeddings are working under the hood — converting both your query and all the documents into numbers, then finding the closest matches.
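The query flow described above can be sketched as: embed the query, score it against every stored document embedding, and return the best matches. In this sketch the document titles, their embeddings, and the query embedding for "vehicle safety" are all invented; in a real system an embedding model would produce them. Scanning every document is the simplest possible stand-in for a vector database, which typically uses an approximate nearest-neighbour index instead to stay fast at scale.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical pre-computed document embeddings (values invented).
documents = {
    "Car accident statistics":           [0.90, 0.20, 0.10],
    "Choosing a safe family automobile": [0.80, 0.40, 0.20],
    "Easy sourdough bread recipe":       [0.05, 0.10, 0.95],
}

def search(query_vector, docs, top_k=2):
    """Brute-force search: score every document, return the top_k titles."""
    ranked = sorted(
        docs,
        key=lambda title: cosine_similarity(query_vector, docs[title]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical embedding of the query "vehicle safety".
query = [0.85, 0.35, 0.10]
print(search(query, documents))
```

Neither matching title contains the words "vehicle" or "safety"; they rank highly only because their (made-up) vectors point in a similar direction to the query's.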
Related Terms
- Vector Database — stores and searches embeddings efficiently
- RAG — uses embeddings to retrieve relevant context for AI responses
- LLM — often used alongside embedding models in AI applications