Intel's Visual Data Management System (VDMS)

This notebook covers how to get started with VDMS as a vector store.

Intel's Visual Data Management System (VDMS) is a storage solution for efficient access to big "visual" data. It aims to achieve cloud scale by searching for relevant visual data via visual metadata stored as a graph, and by enabling machine-friendly enhancements to visual data for faster access. VDMS is licensed under MIT. For more information, see the VDMS project documentation; the LangChain API reference is linked at the end of this page.

VDMS supports:

K nearest neighbor search
Euclidean distance (L2) and inner product (IP)
Libraries for indexing and computing distances: FaissFlat (Default), FaissHNSWFlat, FaissIVFFlat, Flinng, TileDBDense, TileDBSparse
Embeddings for text, images, and video
Vector and metadata searches
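
To make the two supported metrics concrete, here is a minimal pure-Python sketch of Euclidean (L2) distance and inner product (IP). The function names are illustrative and are not part of VDMS or LangChain, and the notebook does not state whether an engine reports squared or unsquared L2, so the unsquared form below is an assumption.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: smaller means more similar."""
    return math.dist(a, b)


def inner_product(a, b):
    """Inner product (IP): larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
close = [0.6, 0.8]  # unit length, about 53 degrees from the query
far = [0.0, 1.0]  # unit length, orthogonal to the query

# The closer vector has the smaller L2 distance and the larger inner product.
print(l2_distance(query, close), inner_product(query, close))
print(l2_distance(query, far), inner_product(query, far))
```

For unit-length vectors the two metrics agree on ranking, since the squared L2 distance equals 2 minus twice the inner product. For unnormalized vectors they can rank results differently.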

Setup

VDMS has server and client components. To set up the server, see the installation instructions or use the docker image. This notebook shows how to use VDMS as a vector store using the docker image.

To use the VDMS vector store, you'll need to install the langchain-community package and the VDMS Client Python Module.

%pip install -qU langchain-community vdms

Note: you may need to restart the kernel to use updated packages.

Initialization

Start VDMS Server

In this example, the VDMS Server is deployed via the publicly available Docker image. Here we start the VDMS server on port 55555 and connect to it using the VDMS client.

!docker run --rm -d -p 55555:55555 --name vdms_vs_test_nb intellabs/vdms:latest

9474de7bc05849faf0b9e545125bdbc060a398d3d8d043e91f5e18455824fb8b
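
The container starts in the background, so the port may not accept connections immediately. As a hedged sketch, a small helper like the one below can poll the port before the client is created. The name wait_for_port and its defaults are hypothetical, and a successful TCP connection only shows that something is listening, not that VDMS has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example (not executed here): wait_for_port("localhost", 55555)
```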

Create Documents

Create the documents to add to the vector store.

import logging

logging.basicConfig()
logging.getLogger("langchain_community.vectorstores.vdms").setLevel(logging.INFO)

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
    id=1,
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
    id=2,
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
    id=3,
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
    id=4,
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
    id=5,
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
    id=6,
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
    id=7,
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
    id=8,
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
    id=9,
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
    id=10,
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(doc.id) for doc in documents]

Embedding Model

We use HuggingFaceEmbeddings for this example as the embedding model.

%pip install -qU langchain-huggingface

from langchain_huggingface import HuggingFaceEmbeddings

model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name)
print(
    f"# Embedding Dimensions: {len(embeddings.embed_query('This is a test document.'))}"
)

Note: you may need to restart the kernel to use updated packages.
# Embedding Dimensions: 768

Add items to vector store

Use the VDMS Client to connect to a VDMS vectorstore using FAISS IndexFlat indexing (default) and Euclidean distance (default) as the distance metric for similarity search.

We can add items to our vector store by using the add_documents function.

from langchain_community.vectorstores import VDMS
from langchain_community.vectorstores.vdms import VDMS_Client

collection_name = "my_collection_faiss_L2"

vdms_client = VDMS_Client(host="localhost", port=55555)

vector_store = VDMS(
    client=vdms_client,
    embedding=embeddings,
    collection_name=collection_name,
    engine="FaissFlat",
    distance_strategy="L2",
)

inserted_ids = vector_store.add_documents(documents=documents, ids=uuids)

INFO:langchain_community.vectorstores.vdms:Descriptor set my_collection_faiss_L2 created
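
A flat index compares the query against every stored vector. The sketch below illustrates that exhaustive k-nearest-neighbor search in pure Python; flat_search is an illustrative name, not VDMS or Faiss code, and the toy two-dimensional vectors stand in for real embeddings.

```python
import heapq
import math


def flat_search(query, vectors, k, metric="L2"):
    """Exhaustive k-nearest-neighbor search over a dict of id -> vector.

    Returns (id, score) pairs, best match first. For "L2" the score is a
    distance (ascending); for "IP" it is an inner product (descending).
    """
    if metric == "L2":
        scored = ((math.dist(query, v), i) for i, v in vectors.items())
        best = heapq.nsmallest(k, scored)
    elif metric == "IP":
        scored = ((sum(q * x for q, x in zip(query, v)), i) for i, v in vectors.items())
        best = heapq.nlargest(k, scored)
    else:
        raise ValueError(f"unsupported metric: {metric}")
    return [(i, score) for score, i in best]


vectors = {
    "1": [0.9, 0.1],
    "2": [0.1, 0.9],
    "3": [0.7, 0.7],
    "4": [3.0, 3.0],  # large norm: far away under L2, but a large inner product
}
print(flat_search([1.0, 0.0], vectors, k=2, metric="L2"))
print(flat_search([1.0, 0.0], vectors, k=2, metric="IP"))
```

With these unnormalized toy vectors the two metrics return different top results, which is one reason to pick the distance strategy deliberately.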

Query vector store

Once the vector store has been created and the relevant documents have been added, you will typically query it while running your chain or agent.

Query directly

Similarity search

Performing a simple similarity search can be done as follows:

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": ["==", "tweet"]},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

INFO:langchain_community.vectorstores.vdms:VDMS similarity search took 0.0082 seconds
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]

Similarity search with score

If you want to execute a similarity search and receive the corresponding scores you can run:

results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?", k=1, filter={"source": ["==", "news"]}
)
for res, score in results:
    print(f"* [SIM={score:0.3f}] {res.page_content} [{res.metadata}]\n\n")

* [SIM=0.809] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'langchain_id': '2', 'source': 'news'}]

Search by vector

You can also search by vector:

results = vector_store.similarity_search_by_vector(
    embedding=embeddings.embed_query("I love green eggs and ham!"), k=1
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

INFO:langchain_community.vectorstores.vdms:VDMS similarity search took 0.0043 seconds
* I had chocolate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

Here is how to transform your vector store into a retriever and invoke it, first with a simple query and then with a query and a filter.

retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1},
)
retriever.invoke("Stealing from the bank is a crime")

INFO:langchain_community.vectorstores.vdms:VDMS similarity search took 0.0044 seconds

[Document(id='4', metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]

retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 5},
)
retriever.invoke(
    "Stealing from the bank is a crime", filter={"source": ["==", "news"]}
)

INFO:langchain_community.vectorstores.vdms:VDMS similarity search mmr took 0.0045 secs

[Document(metadata={'langchain_id': '4', 'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]
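
The "mmr" search type refers to maximal marginal relevance, which picks results from the fetch_k candidates by trading relevance to the query off against similarity to results already chosen. The following is a generic sketch of that idea, not the VDMS integration's code. The parameter name lambda_mult is borrowed from LangChain's MMR search methods, and the default used here is an assumption.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query, candidates, k, lambda_mult=0.5):
    """Pick k candidate indices, balancing relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query, candidates[i])
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],  # 0: most relevant
    [1.0, 0.12],  # 1: near-duplicate of 0
    [0.6, -0.8],  # 2: less relevant, but different from 0
]
print(mmr(query, candidates, k=2))  # balances relevance and diversity
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # relevance only
```

With the balanced setting the near-duplicate is passed over in favor of the more distinct candidate; with lambda_mult=1.0 the selection reduces to plain similarity ranking.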

Manage vector store

In addition to interacting with the vectorstore to add items, we can also update and delete items.

Update items in vector store

Now that we have added documents to our vector store, we can update existing documents by using the update_document function for a single document or the update_documents function for several.

updated_document_1 = Document(
    page_content="I had chocolate chip pancakes and fried eggs for breakfast this morning.",
    metadata={"source": "tweet"},
    id=1,
)

updated_document_2 = Document(
    page_content="The weather forecast for tomorrow is sunny and warm, with a high of 82 degrees.",
    metadata={"source": "news"},
    id=2,
)

vector_store.update_document(
    collection_name, document_id=uuids[0], document=updated_document_1
)

# You can also update multiple documents at once
vector_store.update_documents(
    collection_name,
    ids=uuids[:2],
    documents=[updated_document_1, updated_document_2],
)

results = vector_store.get_by_ids(uuids[:2])
for doc in results:
    print(f"* [id: {doc.id}] {doc.page_content} [{doc.metadata}]")

* [id: 1] I had chocolate chip pancakes and fried eggs for breakfast this morning. [{'source': 'tweet'}]
* [id: 2] The weather forecast for tomorrow is sunny and warm, with a high of 82 degrees. [{'source': 'news'}]

Delete items from vector store

We can also delete items from our vector store as follows:

vector_store.delete(ids=[uuids[-1]])

results = vector_store.get_by_ids(uuids)

for doc in results:
    print(f"* [id: {doc.id}] {doc.page_content} [{doc.metadata}]")

* [id: 1] I had chocolate chip pancakes and fried eggs for breakfast this morning. [{'source': 'tweet'}]
* [id: 2] The weather forecast for tomorrow is sunny and warm, with a high of 82 degrees. [{'source': 'news'}]
* [id: 3] Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* [id: 4] Robbers broke into the city bank and stole $1 million in cash. [{'source': 'news'}]
* [id: 5] Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]
* [id: 6] Is the new iPhone worth the price? Read this review to find out. [{'source': 'website'}]
* [id: 7] The top 10 soccer players in the world right now. [{'source': 'website'}]
* [id: 8] LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
* [id: 9] The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]

Add Existing IDs

IDs must be unique, so by default the data for an ID that already exists is overwritten. In the previous step, we deleted ID '10'. Next we try to insert all of the data again with the delete_existing keyword set to False: the IDs that already exist are skipped and only ID '10' is inserted.

inserted_ids = vector_store.add_documents(
    documents=documents, ids=uuids, delete_existing=False
)
print(f"ids inserted: {inserted_ids}")

INFO:langchain_community.vectorstores.vdms:[!] Embeddings skipped for following ids because already exists: ['1', '2', '3', '4', '5', '6', '7', '8', '9']
Can retry with 'delete_existing' set to True
ids inserted: ['10']

Here we attempt to insert all of the data again, but nothing is inserted because every ID now exists.

inserted_ids = vector_store.add_documents(
    documents=documents, ids=uuids, delete_existing=False
)
print(f"ids inserted: {inserted_ids}")

INFO:langchain_community.vectorstores.vdms:[!] Embeddings skipped for following ids because already exists: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
Can retry with 'delete_existing' set to True
ids inserted: []

Now we delete the existing IDs and re-insert the data. There is no need to set delete_existing, as overwriting is the default behavior.

inserted_ids = vector_store.add_documents(documents=documents, ids=uuids)
print(f"ids inserted: {inserted_ids}")

ids inserted: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
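
The three runs above can be summarized with a small stand-in that uses a plain dict as the store. This is only a sketch of the behavior shown in the output, not the VDMS implementation, and add_items is a hypothetical name.

```python
def add_items(store, items, delete_existing=True):
    """Insert id -> text pairs into `store` (a dict) and return the ids written.

    With delete_existing=True, an entry with an existing id is replaced.
    With delete_existing=False, an entry with an existing id is skipped.
    """
    inserted = []
    for item_id, text in items.items():
        if item_id in store and not delete_existing:
            continue  # keep the stored entry untouched
        store[item_id] = text
        inserted.append(item_id)
    return inserted


store = {"1": "old one", "2": "old two"}
items = {"1": "new one", "2": "new two", "3": "new three"}

print(add_items(store, items, delete_existing=False))  # ['3']
print(add_items(store, items, delete_existing=False))  # []
print(add_items(store, items))  # ['1', '2', '3']
```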

Similarity Search using other engines

VDMS supports various libraries for indexing and computing distances: FaissFlat (Default), FaissHNSWFlat, FaissIVFFlat, Flinng, TileDBDense, and TileDBSparse. By default, the vectorstore uses FaissFlat. Below we show a few examples using the other engines.

Load Sample Document

Here we load the 2022 State of the Union address (state_of_the_union.txt) and split the document into chunks. Additional metadata is also generated for each document chunk.

from langchain_community.document_loaders.text import TextLoader
from langchain_text_splitters.character import CharacterTextSplitter

# Load the document
document_path = "../../how_to/state_of_the_union.txt"
raw_documents = TextLoader(document_path).load()

# Split it into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(raw_documents)
ids = []
for doc_idx, doc in enumerate(docs, start=1):
    ids.append(doc_idx)

    # Synthetic metadata
    doc.metadata["id"] = doc_idx
    doc.metadata["page_number"] = doc_idx
    doc.metadata["president_included"] = "president" in doc.page_content.lower()
print(f"# Documents: {len(docs)}")

# Documents: 42

Similarity Search using Faiss HNSWFlat and Euclidean Distance

Here, we add the documents to VDMS using Faiss IndexHNSWFlat indexing and L2 as the distance metric for similarity search. We search for three documents (k=3) related to the query "What did the president say about Ketanji Brown Jackson" and return the score along with each document.

db_FaissHNSWFlat = VDMS.from_documents(
    docs,
    client=vdms_client,
    ids=ids,
    collection_name="my_collection_FaissHNSWFlat_L2",
    embedding=embeddings,
    engine="FaissHNSWFlat",
    distance_strategy="L2",
)
# Query
k = 3
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db_FaissHNSWFlat.similarity_search_with_score(query, k=k, filter=None)

for res, score in docs_with_score:
    print(f"* [SIM={score:0.3f}] {res.page_content} [{res.metadata}]\n\n")

INFO:langchain_community.vectorstores.vdms:Descriptor set my_collection_FaissHNSWFlat_L2 created
* [SIM=1.203] Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. [{'id': 32, 'langchain_id': '32', 'page_number': 32, 'president_included': True, 'source': '../../how_to/state_of_the_union.txt'}]

* [SIM=1.495] As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit. 

It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children. 

And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care. 

Third, support our veterans. 

Veterans are the best of us. 

I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home. 

My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.  

Our troops in Iraq and Afghanistan faced many dangers. [{'id': 37, 'langchain_id': '37', 'page_number': 37, 'president_included': False, 'source': '../../how_to/state_of_the_union.txt'}]

* [SIM=1.501] A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. 

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. 

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. 

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders. [{'id': 33, 'langchain_id': '33', 'page_number': 33, 'president_included': False, 'source': '../../how_to/state_of_the_union.txt'}]

Similarity Search using Faiss IVFFlat and Inner Product (IP) Distance

We add the documents to VDMS using Faiss IndexIVFFlat indexing and IP as the distance metric for similarity search. We search for three documents (k=3) related to the query "What did the president say about Ketanji Brown Jackson" and return the score along with each document.

db_FaissIVFFlat = VDMS.from_documents(
    docs,
    client=vdms_client,
    ids=ids,
    collection_name="my_collection_FaissIVFFlat_IP",
    embedding=embeddings,
    engine="FaissIVFFlat",
    distance_strategy="IP",
)

k = 3
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db_FaissIVFFlat.similarity_search_with_score(query, k=k, filter=None)
for res, score in docs_with_score:
    print(f"* [SIM={score:0.3f}] {res.page_content} [{res.metadata}]\n\n")

INFO:langchain_community.vectorstores.vdms:Descriptor set my_collection_FaissIVFFlat_IP created
* [SIM=0.164] And built the strongest, freest, and most prosperous nation the world has ever known. 

Now is the hour. 

Our moment of responsibility. 

Our test of resolve and conscience, of history itself. 

It is in this moment that our character is formed. Our purpose is found. Our future is forged. 

Well I know this nation.  

We will meet the test. 

To protect freedom and liberty, to expand fairness and opportunity. 

We will save democracy. 

As hard as these times have been, I am more optimistic about America today than I have been my whole life. 

Because I see the future that is within our grasp. 

Because I know there is simply nothing beyond our capacity. 

We are the only nation on Earth that has always turned every crisis we have faced into an opportunity. 

The only nation that can be defined by a single word: possibilities. 

So on this night, in our 245th year as a nation, I have come to report on the State of the Union. [{'id': 41, 'langchain_id': '41', 'page_number': 41, 'president_included': False, 'source': '../../how_to/state_of_the_union.txt'}]

* [SIM=0.159] He and his Dad both have Type 1 diabetes, which means they need insulin every day. Insulin costs about $10 a vial to make.  

But drug companies charge families like Joshua and his Dad up to 30 times more. I spoke with Joshua’s mom. 

Imagine what it’s like to look at your child who needs insulin and have no idea how you’re going to pay for it.  

What it does to your dignity, your ability to look your child in the eye, to be the parent you expect to be. 

Joshua is here with us tonight. Yesterday was his birthday. Happy birthday, buddy.  

For Joshua, and for the 200,000 other young people with Type 1 diabetes, let’s cap the cost of insulin at $35 a month so everyone can afford it.  

Drug companies will still do very well. And while we’re at it let Medicare negotiate lower prices for prescription drugs, like the VA already does. [{'id': 18, 'langchain_id': '18', 'page_number': 18, 'president_included': False, 'source': '../../how_to/state_of_the_union.txt'}]

* [SIM=0.138] And tonight I am announcing that we will join our allies in closing off American air space to all Russian flights – further isolating Russia – and adding an additional squeeze –on their economy. The Ruble has lost 30% of its value. 

The Russian stock market has lost 40% of its value and trading remains suspended. Russia’s economy is reeling and Putin alone is to blame. 

Together with our allies we are providing support to the Ukrainians in their fight for freedom. Military assistance. Economic assistance. Humanitarian assistance. 

We are giving more than $1 Billion in direct assistance to Ukraine. 

And we will continue to aid the Ukrainian people as they defend their country and to help ease their suffering.  

Let me be clear, our forces are not engaged and will not engage in conflict with Russian forces in Ukraine.  

Our forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies – in the event that Putin decides to keep moving west. [{'id': 5, 'langchain_id': '5', 'page_number': 5, 'president_included': False, 'source': '../../how_to/state_of_the_union.txt'}]
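
An IVF (inverted file) index is approximate: vectors are grouped into cells around centroids, and a query scans only the cells whose centroids are nearest. The sketch below illustrates that general idea with L2 distance and hand-picked centroids. It is not Faiss or VDMS code, the centroids would normally be learned during training, and nprobe is named after the Faiss parameter. It makes no claim about how the results above were produced.

```python
import math


def nearest(point, centers):
    """Index of the center closest to `point` under L2."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def build_ivf(vectors, centroids):
    """Group vector ids into one inverted list per centroid."""
    lists = [[] for _ in centroids]
    for vec_id, vec in vectors.items():
        lists[nearest(vec, centroids)].append(vec_id)
    return lists


def ivf_search(query, vectors, centroids, lists, k, nprobe=1):
    """Search only the `nprobe` inverted lists whose centroids are closest."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [vec_id for i in order[:nprobe] for vec_id in lists[i]]
    candidates.sort(key=lambda vec_id: math.dist(query, vectors[vec_id]))
    return candidates[:k]


centroids = [[0.0, 0.0], [10.0, 0.0]]
vectors = {
    "a": [1.0, 0.0],
    "b": [4.95, 0.0],  # falls in the first cell, close to the boundary
    "c": [5.6, 0.0],
    "d": [9.0, 0.0],
}
lists = build_ivf(vectors, centroids)
query = [5.1, 0.0]  # nearest vector is "b", but the nearest centroid is the second

print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=1))
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=2))
```

Probing a single cell misses the true nearest neighbor here because it sits just across the cell boundary; probing both cells recovers it at the cost of scanning more vectors.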

Similarity Search using FLINNG and IP Distance

In this section, we add the documents to VDMS using Filters to Identify Near-Neighbor Groups (FLINNG) indexing and IP as the distance metric for similarity search. We search for three documents (k=3) related to the query "What did the president say about Ketanji Brown Jackson" and return the score along with each document.

db_Flinng = VDMS.from_documents(
    docs,
    client=vdms_client,
    ids=ids,
    collection_name="my_collection_Flinng_IP",
    embedding=embeddings,
    engine="Flinng",
    distance_strategy="IP",
)
# Query
k = 3
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db_Flinng.similarity_search_with_score(query, k=k, filter=None)
for res, score in docs_with_score:
    print(f"* [SIM={score:0.3f}] {res.page_content} [{res.metadata}]\n\n")

INFO:langchain_community.vectorstores.vdms:Descriptor set my_collection_Flinng_IP created
* [SIM=0.000] Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 

With a duty to one another to the American people to the Constitution. 

And with an unwavering resolve that freedom will always triumph over tyranny. 

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 

He met the Ukrainian people. 

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. [{'id': 1, 'langchain_id': '1', 'page_number': 1, 'president_included': True, 'source': '../../how_to/state_of_the_union.txt'}]

* [SIM=0.000] Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 

With a duty to one another to the American people to the Constitution. 

And with an unwavering resolve that freedom will always triumph over tyranny. 

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 

He met the Ukrainian people. 

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. [{'id': 1, 'langchain_id': '1', 'page_number': 1, 'president_included': True, 'source': '../../how_to/state_of_the_union.txt'}]

* [SIM=0.000] Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 

With a duty to one another to the American people to the Constitution. 

And with an unwavering resolve that freedom will always triumph over tyranny. 

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 

He met the Ukrainian people. 

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. [{'id': 1, 'langchain_id': '1', 'page_number': 1, 'president_included': True, 'source': '../../how_to/state_of_the_union.txt'}]

Similarity Search using TileDBDense and Euclidean Distance

In this section, we add the documents to VDMS using TileDB Dense indexing and L2 as the distance metric for similarity search. We search for three documents (k=3) related to the query "What did the president say about Ketanji Brown Jackson" and return the score along with each document.

db_tiledbD = VDMS.from_documents(
    docs,
    client=vdms_client,
    ids=ids,
    collection_name="my_collection_tiledbD_L2",
    embedding=embeddings,
    engine="TileDBDense",
    distance_strategy="L2",
)

k = 3
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db_tiledbD.similarity_search_with_score(query, k=k, filter=None)
for res, score in docs_with_score:
    print(f"* [SIM={score:0.3f}] {res.page_content} [{res.metadata}]\n\n")

INFO:langchain_community.vectorstores.vdms:Descriptor set my_collection_tiledbD_L2 created
* [SIM=1.203] Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. [{'id': 32, 'langchain_id': '32', 'page_number': 32, 'president_included': True, 'source': '../../how_to/state_of_the_union.txt'}]

* [SIM=1.495] As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit. 

It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children. 

And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care. 

Third, support our veterans. 

Veterans are the best of us. 

I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home. 

My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.  

Our troops in Iraq and Afghanistan faced many dangers. [{'id': 37, 'langchain_id': '37', 'page_number': 37, 'president_included': False, 'source': '../../how_to/state_of_the_union.txt'}]

* [SIM=1.501] A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. 

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. 

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. 

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders. [{'id': 33, 'langchain_id': '33', 'page_number': 33, 'president_included': False, 'source': '../../how_to/state_of_the_union.txt'}]

Filtering on metadata

It can be helpful to narrow down the collection before working with it.

For example, collections can be filtered on metadata using the get_by_constraints method, which takes a dictionary of constraints. Here we retrieve the document where id = 2 and remove it from the vector store.

NOTE: id was generated as additional metadata and is an integer, while langchain_id (the internal ID) is a unique string for each entry.

results = db_FaissIVFFlat.get_by_constraints(
    db_FaissIVFFlat.collection_name,
    limit=1,
    include=["metadata", "embeddings"],
    constraints={"id": ["==", 2]},
)

# Delete id=2
db_FaissIVFFlat.delete(collection_name=db_FaissIVFFlat.collection_name, ids=[2])

print("Deleted entry:")
for doc in results:
    print(f"Content:\n{doc.page_content}\n\nMetadata:\n{doc.metadata}\n\n")

Deleted entry:
Content:
Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. 

In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. 

Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. 

Please rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. 

Throughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos.   

They keep moving.   

And the costs and the threats to America and the world keep rising.   

That’s why the NATO Alliance was created to secure peace and stability in Europe after World War 2. 

The United States is a member along with 29 other nations. 

It matters. American diplomacy matters. American resolve matters.

Metadata:
{'id': 2, 'page_number': 2, 'president_included': True, 'source': '../../how_to/state_of_the_union.txt'}

Here we use id to filter for a range of IDs since it is an integer.

results = db_FaissIVFFlat.get_by_constraints(
    db_FaissIVFFlat.collection_name,
    limit=1,
    include=["metadata", "embeddings"],
    constraints={"id": [">", 1, "<=", 3]},
)

for doc in results:
    print(f"Content:\n{doc.page_content}\n\nMetadata:\n{doc.metadata}\n\n")

Content:
Putin’s latest attack on Ukraine was premeditated and unprovoked. 

He rejected repeated efforts at diplomacy. 

He thought the West and NATO wouldn’t respond. And he thought he could divide us at home. Putin was wrong. We were ready.  Here is what we did.   

We prepared extensively and carefully. 

We spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. 

I spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression.  

We countered Russia’s lies with truth.   

And now that he has acted the free world is holding him accountable. 

Along with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.

Metadata:
{'id': 3, 'page_number': 3, 'president_included': False, 'source': '../../how_to/state_of_the_union.txt'}
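
The constraint dictionaries used above map a metadata key to a flat list of alternating operators and values, such as ["==", 2] or [">", 1, "<=", 3]. The sketch below shows one way such a specification could be evaluated client-side against plain dictionaries. It is an illustration of the format only, not how the VDMS server evaluates constraints; the helper name matches is hypothetical, and operators other than "==", ">" and "<=" are assumed rather than taken from this notebook.

```python
import operator

# Only "==", ">" and "<=" appear in this notebook; the rest are assumptions.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(metadata, constraints):
    """Return True if `metadata` satisfies every [op, value, ...] constraint."""
    for key, spec in constraints.items():
        if key not in metadata:
            return False
        # spec is a flat list of alternating operators and values, all ANDed
        for op, value in zip(spec[0::2], spec[1::2]):
            if not OPS[op](metadata[key], value):
                return False
    return True


docs = [
    {"id": 1, "source": "news"},
    {"id": 2, "source": "tweet"},
    {"id": 3, "source": "news"},
    {"id": 4, "source": "news"},
]
in_range = {"id": [">", 1, "<=", 3]}
news_in_range = {"source": ["==", "news"], "id": [">", 1, "<=", 3]}

print([d["id"] for d in docs if matches(d, in_range)])
print([d["id"] for d in docs if matches(d, news_in_range)])
```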

!docker kill vdms_vs_test_nb

vdms_vs_test_nb

Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:

API reference

For detailed documentation of all VDMS vector store features and configurations head to the API reference: https://python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.vdms.VDMS.html

Related

Vector store conceptual guide
Vector store how-to guides
