ChromaDB is an open-source vector database designed to store and query embeddings, documents, and metadata for applications utilizing large language models (LLMs). It simplifies the development of LLM-powered applications by providing a unified platform for managing and retrieving vector data.
Configuration Options
Query Settings
- Max Query Results: Set the maximum number of results to retrieve during memory recall.
  - Default: 4
  - Note: Higher values may cause items to be dropped due to token budget constraints.
- Max Query Distance: Define the maximum distance threshold for results, based on the HNSW distance metric.
  - Default: 0.85
  - Note: The range depends on the space metric (e.g., cosine ranges between 0 and 2).
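The interaction between the two query settings can be sketched in plain Python. This is a minimal illustration, not the extension's actual code: the `recall` helper, the sample vectors, and the choice to filter by distance before truncating are all assumptions. It also shows why cosine distance ranges from 0 (same direction) to 2 (opposite direction).

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity, so it lies in [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def recall(query, memories, max_results=4, max_distance=0.85):
    """Hypothetical helper: keep entries within max_distance, nearest first,
    then cap the list at max_results."""
    scored = [(mem_id, cosine_distance(query, vec)) for mem_id, vec in memories.items()]
    kept = [(mem_id, dist) for mem_id, dist in scored if dist <= max_distance]
    kept.sort(key=lambda pair: pair[1])
    return kept[:max_results]

# Toy 2-D "embeddings" (made up for illustration).
memories = {
    "same": [1.0, 0.0],
    "close": [0.9, 0.1],
    "orthogonal": [0.0, 1.0],   # distance 1.0, above the 0.85 threshold
    "opposite": [-1.0, 0.0],    # distance 2.0, the maximum for cosine
}
results = recall([1.0, 0.0], memories)
```

With the defaults, only the two nearest entries survive the distance threshold, even though up to four results are allowed.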
 
Memory Settings
- Prefill Memory Window: Automatically load the top entries (by weight or date) into live memory when starting a session.
  - Benefit: Increases the chances of recalling entries without triggers.
  - Risk: May introduce noise into memory recall.
- Max Memory Window Entries: Limit the number of memory entries stored in the active memory window.
  - Default: 12
  - Note: The actual number may be lower if the token count exceeds the available memory token window.
- Expire Memories After: Specify how many messages to keep memories in the active window before they expire.
  - Example: 5
  - Note: Set to 0 to keep memories indefinitely.
- Embedding Model: Select the embedding model for representing data.
  - Default: all-MiniLM-L6-v2 (small size, 80MB, very fast).
  - Note: Embeddings are used to represent data in an AI-native format, perfect for tools and algorithms.
- Use Cuda: Enable GPU usage for faster performance. If disabled, the CPU will be used instead.
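The window settings above (prefill, entry limit, expiry) can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Memory` and `MemoryWindow` names, evicting the oldest entry when the window is full, and expiring an entry once its age in messages reaches the configured value. The token budget mentioned in the notes is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    weight: float
    age: int = 0  # messages seen since the entry entered the window

class MemoryWindow:
    """Hypothetical active memory window with a size cap and message-based expiry."""

    def __init__(self, max_entries=12, expire_after=5):
        self.max_entries = max_entries
        self.expire_after = expire_after  # 0 means entries never expire
        self.entries = []

    def prefill(self, stored):
        """Load the highest-weight stored memories at session start."""
        top = sorted(stored, key=lambda m: m.weight, reverse=True)
        for memory in top[: self.max_entries]:
            self.add(memory)

    def add(self, memory):
        if len(self.entries) >= self.max_entries:
            self.entries.pop(0)  # assumed policy: drop the oldest entry
        self.entries.append(memory)

    def on_message(self):
        """Age every entry by one message and drop those that have expired."""
        for memory in self.entries:
            memory.age += 1
        if self.expire_after > 0:
            self.entries = [m for m in self.entries if m.age < self.expire_after]

# Small limits so the behavior is easy to follow.
window = MemoryWindow(max_entries=2, expire_after=2)
window.prefill([Memory("a", 0.1), Memory("b", 0.9), Memory("c", 0.5)])
```

After prefill the window holds the two highest-weight entries; calling `on_message()` twice would expire both, while `expire_after=0` would keep them indefinitely.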
 
HNSW (Indexing and Search)
- HNSW Space: The distance metric used for the HNSW index.
  - Default: Cosine
  - Note: The space metric cannot be changed after the index is created. ChromaDB's default is l2.
- HNSW Construction EF: Controls the number of neighbors explored when adding new vectors.
  - Default: 200
  - Impact: Higher values improve index quality but increase memory consumption.
  - ChromaDB Default: 100
- HNSW Search EF: Controls the number of neighbors explored when searching.
  - Default: 20
  - Impact: Increasing this value improves search quality but requires more memory.
  - ChromaDB Default: 10
- HNSW M: Sets the maximum number of neighbor connections (M) for each newly inserted vector.
  - Default: 25
  - Impact: Higher values create a more densely connected graph, leading to slower but more accurate searches with increased memory usage.
  - ChromaDB Default: 16
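For readers who configure ChromaDB directly, the HNSW values above can be expressed as collection metadata. The sketch below only builds the metadata dictionary, using the `hnsw:` keys that ChromaDB has accepted at collection creation; the `hnsw_metadata` helper and the collection name `memories` are made up for illustration, and newer ChromaDB releases may expect these options through a separate configuration argument instead.

```python
VALID_SPACES = {"cosine", "l2", "ip"}

def hnsw_metadata(space="cosine", construction_ef=200, search_ef=20, m=25):
    """Hypothetical helper: build HNSW settings as ChromaDB collection metadata.

    The defaults mirror the values listed above, not ChromaDB's own defaults.
    """
    if space not in VALID_SPACES:
        raise ValueError(f"unknown space {space!r}, expected one of {sorted(VALID_SPACES)}")
    return {
        "hnsw:space": space,
        "hnsw:construction_ef": construction_ef,
        "hnsw:search_ef": search_ef,
        "hnsw:M": m,
    }

settings = hnsw_metadata()

# With the chromadb package installed, the dict could be passed at creation time
# (the space cannot be changed afterwards):
#   import chromadb
#   client = chromadb.Client()
#   collection = client.create_collection(name="memories", metadata=settings)
```

Because the space is fixed once the index exists, switching from cosine to l2 would mean creating a new collection and re-adding the data.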