Hybrid Search - Quickstart on Vespa Cloud
This is the same guide as getting-started-pyvespa, deploying to Vespa Cloud.
Refer to troubleshooting for any problem when running this guide.
Pre-requisite: Create a tenant at cloud.vespa.ai, save the tenant name.
Install
Install pyvespa >= 0.38 and the Vespa CLI. The Vespa CLI is used for data and control plane key management (Vespa Cloud Security Guide).
[1]:
!pip3 install pyvespa
Install the Vespa CLI using homebrew:
[1]:
!brew install vespa-cli
Alternatively, if running in Colab, download the Vespa CLI from GitHub:
[1]:
import os
import requests
res = requests.get(
    url="https://api.github.com/repos/vespa-engine/vespa/releases/latest"
).json()
os.environ["VERSION"] = res["tag_name"].replace("v", "")
!curl -fsSL https://github.com/vespa-engine/vespa/releases/download/v${VERSION}/vespa-cli_${VERSION}_linux_amd64.tar.gz | tar -zxf -
!ln -sf /content/vespa-cli_${VERSION}_linux_amd64/bin/vespa /bin/vespa
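As a small sketch of how the download URL in the cell above is assembled, the following builds it from a release tag. The helper name `cli_download_url`, its `platform` parameter, and the tag `v8.1.2` are hypothetical and only mirror the pattern in the shell command; they are not part of the Vespa CLI or pyvespa.

```python
def cli_download_url(tag_name: str, platform: str = "linux_amd64") -> str:
    """Build the Vespa CLI tarball URL for a GitHub release tag such as 'v8.1.2'."""
    # Strip only a leading "v" so that the rest of the tag is left untouched.
    version = tag_name[1:] if tag_name.startswith("v") else tag_name
    return (
        "https://github.com/vespa-engine/vespa/releases/download/"
        f"v{version}/vespa-cli_{version}_{platform}.tar.gz"
    )


# Hypothetical tag, used only to show the resulting URL shape
print(cli_download_url("v8.1.2"))
```

Stripping only the leading character is slightly stricter than `replace("v", "")`, which would also remove any later "v" in a tag.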
Configure data-plane security
Create a Vespa Cloud data-plane mTLS cert/key pair. This pair is used for mutual TLS authentication when talking to your Vespa Cloud endpoints. See Vespa Cloud Security Guide.
We save the paths to the credentials for later data-plane access without using pyvespa APIs; see the example at the end of this notebook.
[1]:
import os
os.environ["TENANT_NAME"] = "vespa-team" # Replace with your tenant name
application = "hybridsearch"
vespa_cli_command = (
    f'vespa config set application {os.environ["TENANT_NAME"]}.{application}'
)
!vespa config set target cloud
!{vespa_cli_command}
!vespa auth cert -N
[1]:
from os.path import exists
from pathlib import Path

cert_path = (
    Path.home()
    / ".vespa"
    / f"{os.environ['TENANT_NAME']}.{application}.default/data-plane-public-cert.pem"
)
key_path = (
    Path.home()
    / ".vespa"
    / f"{os.environ['TENANT_NAME']}.{application}.default/data-plane-private-key.pem"
)

if not exists(cert_path) or not exists(key_path):
    print(
        "ERROR: set the correct paths to security credentials. Correct paths above and rerun until you do not see this error"
    )
Note that the subsequent deploy call will add data-plane-public-cert.pem to the application before deploying it to Vespa Cloud, so that you have access to both the private key and the public certificate, while Vespa Cloud only knows the public certificate.
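The path construction and existence check above can be wrapped in small helpers, which makes it easy to report exactly which file is missing. This is a minimal sketch: `data_plane_credentials`, `missing_credentials`, and the `vespa_home` parameter are hypothetical names, and the directory layout is simply the one used in the cell above.

```python
import tempfile
from pathlib import Path


def data_plane_credentials(tenant, application, instance="default", vespa_home=None):
    """Return (cert_path, key_path) following the directory layout used above."""
    home = Path(vespa_home) if vespa_home else Path.home() / ".vespa"
    app_dir = home / f"{tenant}.{application}.{instance}"
    return (
        app_dir / "data-plane-public-cert.pem",
        app_dir / "data-plane-private-key.pem",
    )


def missing_credentials(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]


# Demonstration against a temporary directory instead of the real ~/.vespa
with tempfile.TemporaryDirectory() as tmp:
    cert, key = data_plane_credentials("mytenant", "hybridsearch", vespa_home=tmp)
    cert.parent.mkdir(parents=True)
    cert.write_text("placeholder certificate")
    # Only the private key path is reported as missing here
    print([p.name for p in missing_credentials([cert, key])])
```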
Configure control-plane security
Authenticate to generate a tenant level control-plane API key for deploying the applications to Vespa Cloud, and save the path to it.
Warning: The generated tenant API key must be added in the Vespa Console before attempting to deploy the application.
The next step prints the following message:
To use this key in Vespa Cloud click 'Add custom key' at
https://console.vespa-cloud.com/tenant/TENANT_NAME/account/keys
and paste the entire public key including the BEGIN and END lines.
[1]:
!vespa auth api-key
from pathlib import Path
api_key_path = Path.home() / ".vespa" / f"{os.environ['TENANT_NAME']}.api-key.pem"
Follow the instructions from the output above and add the control-plane key in the console at https://console.vespa-cloud.com/tenant/TENANT_NAME/account/keys
(replace TENANT_NAME with your tenant name).
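Because the console expects the entire key including the BEGIN and END lines, a quick check of the copied text can catch a truncated paste. The following is only an illustrative sketch: `has_pem_boundaries` is a hypothetical helper, the sample text is a placeholder rather than a real key, and the check looks at the boundary lines only, not at the validity of the key itself.

```python
def has_pem_boundaries(text: str) -> bool:
    """Check that text starts with a BEGIN line and ends with an END line."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) < 3:
        return False
    first, last = lines[0], lines[-1]
    return (
        first.startswith("-----BEGIN ")
        and first.endswith("-----")
        and last.startswith("-----END ")
        and last.endswith("-----")
    )


# Placeholder text only; this is not a real key
complete = "-----BEGIN PUBLIC KEY-----\nPLACEHOLDER\n-----END PUBLIC KEY-----\n"
truncated = "PLACEHOLDER\n-----END PUBLIC KEY-----"
print(has_pem_boundaries(complete), has_pem_boundaries(truncated))
```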
Create an application package
The application package has all the Vespa configuration files - create one from scratch:
[1]:
from vespa.package import (
    ApplicationPackage,
    Field,
    Schema,
    Document,
    HNSW,
    RankProfile,
    Component,
    Parameter,
    FieldSet,
    GlobalPhaseRanking,
    Function,
)

package = ApplicationPackage(
    name=application,
    schema=[
        Schema(
            name="doc",
            document=Document(
                fields=[
                    Field(name="id", type="string", indexing=["summary"]),
                    Field(
                        name="title",
                        type="string",
                        indexing=["index", "summary"],
                        index="enable-bm25",
                    ),
                    Field(
                        name="body",
                        type="string",
                        indexing=["index", "summary"],
                        index="enable-bm25",
                        bolding=True,
                    ),
                    Field(
                        name="embedding",
                        type="tensor<float>(x[384])",
                        indexing=[
                            'input title . " " . input body',
                            "embed",
                            "index",
                            "attribute",
                        ],
                        ann=HNSW(distance_metric="angular"),
                        is_document_field=False,
                    ),
                ]
            ),
            fieldsets=[FieldSet(name="default", fields=["title", "body"])],
            rank_profiles=[
                RankProfile(
                    name="bm25",
                    inputs=[("query(q)", "tensor<float>(x[384])")],
                    functions=[
                        Function(name="bm25sum", expression="bm25(title) + bm25(body)")
                    ],
                    first_phase="bm25sum",
                ),
                RankProfile(
                    name="semantic",
                    inputs=[("query(q)", "tensor<float>(x[384])")],
                    first_phase="closeness(field, embedding)",
                ),
                RankProfile(
                    name="fusion",
                    inherits="bm25",
                    inputs=[("query(q)", "tensor<float>(x[384])")],
                    first_phase="closeness(field, embedding)",
                    global_phase=GlobalPhaseRanking(
                        expression="reciprocal_rank_fusion(bm25sum, closeness(field, embedding))",
                        rerank_count=1000,
                    ),
                ),
            ],
        )
    ],
    components=[
        Component(
            id="e5",
            type="hugging-face-embedder",
            parameters=[
                Parameter(
                    "transformer-model",
                    {
                        "url": "https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx"
                    },
                ),
                Parameter(
                    "tokenizer-model",
                    {
                        "url": "https://raw.githubusercontent.com/vespa-engine/sample-apps/master/simple-semantic-search/model/tokenizer.json"
                    },
                ),
            ],
        )
    ],
)
Note that the application name cannot contain - or _.
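The naming restriction above can be checked before building the package. The sketch below tests only the rule stated in this guide (no - and no _); `invalid_name_characters` is a hypothetical helper, and pyvespa or Vespa Cloud may enforce further restrictions that are not covered here.

```python
def invalid_name_characters(name: str) -> list:
    """Return the characters in name that the note above rules out."""
    return sorted({ch for ch in name if ch in "-_"})


print(invalid_name_characters("hybrid-search_v2"))  # ['-', '_']
print(invalid_name_characters("hybridsearch"))      # []
```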
Deploy to Vespa Cloud
The app is now defined and ready to deploy to Vespa Cloud.
Deploy package to Vespa Cloud by creating an instance of VespaCloud:
[1]:
from vespa.deployment import VespaCloud
def read_secret():
    """Read the API key from the environment variable. This is
    only used for CI/CD purposes."""
    t = os.getenv("VESPA_TEAM_API_KEY")
    if t:
        return t.replace(r"\n", "\n")
    else:
        return t


vespa_cloud = VespaCloud(
    tenant=os.environ["TENANT_NAME"],
    application=application,
    key_content=read_secret() if read_secret() else None,
    key_location=api_key_path,
    application_package=package,
)
The following will upload the application package to the Vespa Cloud Dev Zone (aws-us-east-1c); read more about Vespa Zones. The Vespa Cloud Dev Zone is a sandbox environment where resources are down-scaled and idle deployments expire automatically. For information about production deployments, see the following example.
Note: Deployments to dev and perf expire after 7 days of inactivity, i.e., 7 days after running deploy. This applies to all plans, not only the Free Trial. Use the Vespa Console to extend the expiry period, or redeploy the application to add 7 more days.
[1]:
app = vespa_cloud.deploy()
If the deployment failed, it is possible you forgot to add the key in the Vespa Cloud Console in the vespa auth api-key step above.
If you can authenticate, you should see lines like the following:
Deployment started in run 1 of dev-aws-us-east-1c for mytenant.hybridsearch.
The deployment takes a few minutes the first time while Vespa Cloud sets up the resources for your Vespa application.
app now holds a reference to a Vespa instance. We can access the mTLS-protected endpoint name using the control-plane (vespa_cloud) instance. We can query and feed to this endpoint (data-plane access) using the mTLS certificate generated in the previous steps.
[1]:
endpoint = vespa_cloud.get_mtls_endpoint()
endpoint
Feeding documents to Vespa
In this example we use the HF Datasets library to stream the BeIR/nfcorpus dataset and index it in our newly deployed Vespa instance. Read more about the NFCorpus:
NFCorpus is a full-text English retrieval data set for Medical Information Retrieval.
The following uses the stream option of datasets to stream the data without downloading all the contents locally. The map functionality allows us to convert the dataset fields into the expected feed format for pyvespa, which expects a dict with the keys id and fields:
{ "id": "vespa-document-id", "fields": {"vespa_field": "vespa-field-value"}}
[1]:
from datasets import load_dataset
dataset = load_dataset("BeIR/nfcorpus", "corpus", split="corpus", streaming=True)
vespa_feed = dataset.map(
    lambda x: {
        "id": x["_id"],
        "fields": {"title": x["title"], "body": x["text"], "id": x["_id"]},
    }
)
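The same mapping can be tried without the datasets library, which helps when checking the feed format in isolation. This is a sketch only: `to_vespa_feed` is a hypothetical helper and the sample record is made up, with the same keys (`_id`, `title`, `text`) that the cell above reads from the dataset rows.

```python
def to_vespa_feed(records):
    """Yield dicts in the {"id": ..., "fields": {...}} shape described above."""
    for record in records:
        yield {
            "id": record["_id"],
            "fields": {
                "title": record["title"],
                "body": record["text"],
                "id": record["_id"],
            },
        }


# Hypothetical stand-in record with the same keys as the dataset rows
sample = [{"_id": "doc-1", "title": "A title", "text": "Some body text."}]
for operation in to_vespa_feed(sample):
    print(operation)
```

Because this is a generator, records are converted lazily, one at a time, which matches the streaming approach used above.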
Now we can feed to Vespa using feed_iterable, which accepts any Iterable and an optional callback function where we can check the outcome of each operation. The application is configured to use embedding functionality that produces a vector embedding from a concatenation of the title and the body input fields. This step is resource intensive.
Read more about embedding inference in Vespa in the Accelerating Transformer-based Embedding Retrieval with Vespa blog post.
Default node resources in Vespa Cloud have 2 vCPUs for the Dev Zone.
[1]:
from vespa.io import VespaResponse, VespaQueryResponse
def callback(response: VespaResponse, id: str):
    if not response.is_successful():
        print(f"Error when feeding document {id}: {response.get_json()}")


app.feed_iterable(vespa_feed, schema="doc", namespace="tutorial", callback=callback)
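For larger feeds it can be more convenient to collect failures than to print them. The sketch below shows one way to do that with a callback of the same shape as the one above. `make_collecting_callback` and `FakeResponse` are hypothetical names; `FakeResponse` only imitates the two methods the callback calls, so the example runs without a Vespa deployment.

```python
def make_collecting_callback():
    """Return (callback, failures); failed operations are appended to failures."""
    failures = []

    def callback(response, id):
        if not response.is_successful():
            failures.append((id, response.get_json()))

    return callback, failures


class FakeResponse:
    """Hypothetical stand-in exposing the two methods used by the callback."""

    def __init__(self, ok, payload):
        self._ok = ok
        self._payload = payload

    def is_successful(self):
        return self._ok

    def get_json(self):
        return self._payload


callback, failures = make_collecting_callback()
callback(FakeResponse(True, {}), "doc-1")
callback(FakeResponse(False, {"message": "example failure"}), "doc-2")
print(failures)
```

With a real deployment, the returned callback would be passed as the callback argument of feed_iterable, and the failures list inspected once feeding has finished.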
Querying Vespa
Using the Vespa Query language we can query the indexed data.
- Use a context manager, with app.syncio() as session, to handle connection pooling (best practice).
- The query method accepts any valid Vespa query API parameter in **kwargs.
- Vespa API parameter names that contain . must be sent as dict parameters in the body method argument.
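The split between keyword arguments and the body dict can be sketched as follows. `split_query_params` is a hypothetical helper, not part of pyvespa; it only separates parameter names that contain a dot, which cannot be written as Python keyword arguments, from the rest.

```python
def split_query_params(params):
    """Separate names usable as keyword arguments from dotted names."""
    kwargs = {name: value for name, value in params.items() if "." not in name}
    body = {name: value for name, value in params.items() if "." in name}
    return kwargs, body


params = {
    "yql": "select * from sources * where userQuery()",
    "query": "asthma",
    "hits": 5,
    "ranking.profile": "bm25",
}
kwargs, body = split_query_params(params)
print(kwargs)
print(body)
# With a pyvespa session, the call would then look like:
# session.query(body=body, **kwargs)
```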
The following searches for How Fruits and Vegetables Can Treat Asthma? using different retrieval and ranking strategies.
Query the text search app using the Vespa Query language by sending the parameters to the body argument of Vespa.query.
First we define a simple routine that will return a dataframe of the results for prettier display in the notebook.
[1]:
import pandas as pd
def display_hits_as_df(response: VespaQueryResponse, fields) -> pd.DataFrame:
    records = []
    for hit in response.hits:
        record = {}
        for field in fields:
            record[field] = hit["fields"][field]
        records.append(record)
    return pd.DataFrame(records)
Plain Keyword search
The following uses plain keyword search with bm25 ranking. The bm25 rank-profile was configured in the application package to use a linear combination of the bm25 scores of the query terms against the title and the body fields.
[1]:
with app.syncio(connections=1) as session:
    query = "How Fruits and Vegetables Can Treat Asthma?"
    response: VespaQueryResponse = session.query(
        yql="select * from sources * where userQuery() limit 5",
        query=query,
        ranking="bm25",
    )
    assert response.is_successful()
    print(display_hits_as_df(response, ["id", "title"]))
Plain Semantic Search
The following uses dense vector representations of the query and the documents. Matching is performed and accelerated by Vespa’s support for approximate nearest neighbor search. The vector embedding of the text is obtained using Vespa’s embedder functionality.
[1]:
with app.syncio(connections=1) as session:
    query = "How Fruits and Vegetables Can Treat Asthma?"
    response: VespaQueryResponse = session.query(
        yql="select * from sources * where ({targetHits:5}nearestNeighbor(embedding,q)) limit 5",
        query=query,
        ranking="semantic",
        body={"input.query(q)": f"embed({query})"},
    )
    assert response.is_successful()
    print(display_hits_as_df(response, ["id", "title"]))
Hybrid Search
This is one approach to combining the two retrieval strategies, using Vespa’s support for cross-hits feature normalization and reciprocal rank fusion. This functionality is exposed in the context of global re-ranking, after the distributed query retrieval execution, which might span 1000s of nodes.
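To make the idea of reciprocal rank fusion concrete, here is a minimal pure-Python sketch of the general technique: each document scores the sum of 1 / (k + rank) over the ranked lists it appears in. This illustrates the formula only and is not Vespa's implementation; the constant k=60 is an assumption (a commonly used default), and the two input rankings are made-up examples.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one score per id.

    Each id scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1 for the best hit in each list.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_order = ["a", "b", "c"]      # hypothetical keyword ranking
semantic_order = ["b", "c", "a"]  # hypothetical vector ranking
for doc_id, score in reciprocal_rank_fusion([bm25_order, semantic_order]):
    print(doc_id, round(score, 5))
```

Because only ranks enter the formula, the two score distributions (bm25 and closeness) do not need to be on comparable scales, which is what makes this fusion convenient for hybrid search.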
Hybrid search with the OR query operator
This combines the two methods using logical disjunction (OR). Note that the first-phase expression in our fusion rank profile uses only the semantic score, because semantic search usually provides better recall than sparse keyword search alone.
[1]:
with app.syncio(connections=1) as session:
    query = "How Fruits and Vegetables Can Treat Asthma?"
    response: VespaQueryResponse = session.query(
        yql="select * from sources * where userQuery() or ({targetHits:1000}nearestNeighbor(embedding,q)) limit 5",
        query=query,
        ranking="fusion",
        body={"input.query(q)": f"embed({query})"},
    )
    assert response.is_successful()
    print(display_hits_as_df(response, ["id", "title"]))
Hybrid search with the RANK query operator
This combines the two methods using the rank query operator. In this case we express that we want to retrieve the top-1000 documents using vector search, and then have sparse features like BM25 calculated as well (second operand of the rank operator). Finally, the hits are re-ranked using reciprocal rank fusion.
[1]:
with app.syncio(connections=1) as session:
    query = "How Fruits and Vegetables Can Treat Asthma?"
    response: VespaQueryResponse = session.query(
        yql="select * from sources * where rank({targetHits:1000}nearestNeighbor(embedding,q), userQuery()) limit 5",
        query=query,
        ranking="fusion",
        body={"input.query(q)": f"embed({query})"},
    )
    assert response.is_successful()
    print(display_hits_as_df(response, ["id", "title"]))
Hybrid search with filters
In this example we add another query term to the yql, restricting the nearest neighbor search to only consider documents that have vegetable in the title.
[1]:
with app.syncio(connections=1) as session:
    query = "How Fruits and Vegetables Can Treat Asthma?"
    response: VespaQueryResponse = session.query(
        yql='select * from sources * where title contains "vegetable" and rank({targetHits:1000}nearestNeighbor(embedding,q), userQuery()) limit 5',
        query=query,
        ranking="fusion",
        body={"input.query(q)": f"embed({query})"},
    )
    assert response.is_successful()
    print(display_hits_as_df(response, ["id", "title"]))
Next steps
This is just an introduction to the capabilities of Vespa and pyvespa. Browse the site to learn more about schemas, feeding and queries, and find more complex applications in examples.
Example: Document operations using cert/key pair
Above, we deployed to Vespa Cloud, and as part of that, generated a data-plane mTLS cert/key pair.
This pair can be used to access the dataplane for reads/writes to documents and running queries from many different clients. The following demonstrates that using the requests library.
Set up a dataplane connection using the cert/key pair:
[1]:
import requests
session = requests.Session()
session.cert = (cert_path, key_path)
Get a document from the endpoint returned when we deployed to Vespa Cloud above. pyvespa wraps the Vespa document API internally; in these examples we use the document API directly, with the same mTLS key/cert pair we used when deploying the app.
[1]:
url = "{0}/document/v1/{1}/{2}/docid/{3}".format(endpoint, "tutorial", "doc", "MED-10")
doc = session.get(url).json()
doc
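Document ids that contain spaces or slashes need to be percent-encoded before they are placed in the URL path. The following sketch builds the same /document/v1 URL as the cell above with encoding applied. `document_v1_url` is a hypothetical helper, and the endpoint and document id in the example are placeholders.

```python
from urllib.parse import quote


def document_v1_url(endpoint, namespace, doc_type, doc_id):
    """Build a /document/v1 URL, percent-encoding each path component."""
    return "{}/document/v1/{}/{}/docid/{}".format(
        endpoint.rstrip("/"),  # tolerate an endpoint with a trailing slash
        quote(namespace, safe=""),
        quote(doc_type, safe=""),
        quote(doc_id, safe=""),
    )


# Placeholder endpoint and id, chosen to show the encoding
print(document_v1_url("https://example.invalid/", "tutorial", "doc", "MED 10/a"))
```

For simple ids such as MED-10 the result is identical to the URL built with str.format above.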
Update the title and post the new version:
[1]:
doc["fields"]["title"] = "Can you eat lobster?"
response = session.post(url, json=doc).json()
response
Get the doc again to see the new title:
[1]:
doc = session.get(url).json()
doc
Example: Reconnect pyvespa using cert/key pair
Above, we stored the dataplane credentials for later use. Deployment of an application usually happens when the schema changes, whereas accessing the dataplane is for document updates and user queries.
One only needs to know the endpoint and the cert/key pair to enable a connection to a Vespa Cloud application:
[1]:
# cert_path = "/Users/me/.vespa/mytenant.hybridsearch.default/data-plane-public-cert.pem"
# key_path = "/Users/me/.vespa/mytenant.hybridsearch.default/data-plane-private-key.pem"
from vespa.application import Vespa
the_app = Vespa(endpoint, cert=cert_path, key=key_path)
res = the_app.query(
yql="select documentid, id, title from sources * where userQuery()",
query="Can you eat lobster?",
ranking="bm25",
)
res.hits[0]
A common problem is a cert mismatch: the cert/key pair used when deploying is different from the pair used when making requests against Vespa. This will cause 40x errors.
Make sure it is the same pair, or re-create it with vespa auth cert -f AND redeploy.
If you re-generate an mTLS certificate pair and use it when connecting to a Vespa Cloud endpoint, requests will fail until you have updated the deployment with the new public certificate.
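One way to spot such a mismatch is to compare the local public certificate with the copy that was included in the deployed application package. The sketch below compares the PEM text of two certificates after normalizing whitespace. It is an illustration under assumptions: `pem_fingerprint` and `same_certificate` are hypothetical helpers, the hash is taken over the PEM text rather than the decoded certificate, and where the deployed copy lives on your machine is left to you.

```python
import hashlib


def pem_fingerprint(pem_text: str) -> str:
    """SHA-256 of the PEM text, ignoring blank lines and surrounding whitespace."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    normalized = "\n".join(line for line in lines if line)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def same_certificate(local_pem: str, deployed_pem: str) -> bool:
    """True when both PEM texts are identical after normalization."""
    return pem_fingerprint(local_pem) == pem_fingerprint(deployed_pem)


# Placeholder PEM texts; not real certificates
local = "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
deployed = "-----BEGIN CERTIFICATE-----\r\nAAAA\r\n-----END CERTIFICATE-----"
other = "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n"
print(same_certificate(local, deployed), same_certificate(local, other))
```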
Delete application
The following will delete the application and data from the dev environment.
[1]:
vespa_cloud.delete()
Example: Deploy the app to the prod environment
In addition to dev deployments, pyvespa also supports deployments to the prod environment, though the process is slightly different.
First, we need to define a DeploymentConfiguration, which is used to generate a deployment.xml file. We’ll need to define at least one region. See the documentation for Vespa Zones for an exhaustive list of regions.
[1]:
from vespa.package import DeploymentConfiguration
deploy_config = DeploymentConfiguration(environment="prod", regions=["aws-us-east-1c"])
The DeploymentConfiguration can then be added to our application package.
The package must also fulfill certain resource requirements, as described here. Each cluster must have at least two nodes, and the content cluster requires a minimum redundancy of 2.
These options can be defined using the ContentCluster and ContainerCluster classes.
[1]:
from vespa.package import ContentCluster, ContainerCluster, Nodes
package = ApplicationPackage(
    name=application,
    deployment_config=deploy_config,
    clusters=[
        ContentCluster(
            id="hybridsearch_content",
            nodes=Nodes(count="2"),
            document_name="hybridsearch",
            min_redundancy="2",
        ),
        ContainerCluster(
            id="hybridsearch_container",
            nodes=Nodes(count="2"),
        ),
    ],
)
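The resource requirements stated above (at least two nodes per cluster and a minimum redundancy of 2 for the content cluster) can be checked before attempting a prod deployment. This is a sketch of such a pre-flight check on plain numbers; `check_prod_requirements` is a hypothetical helper that encodes only the two rules mentioned in this guide and does not inspect pyvespa objects.

```python
def check_prod_requirements(container_nodes, content_nodes, min_redundancy):
    """Return a list of violated requirements; an empty list means all pass."""
    problems = []
    if container_nodes < 2:
        problems.append("container cluster needs at least two nodes")
    if content_nodes < 2:
        problems.append("content cluster needs at least two nodes")
    if min_redundancy < 2:
        problems.append("content cluster needs a minimum redundancy of 2")
    return problems


# Values matching the cluster definitions above
print(check_prod_requirements(container_nodes=2, content_nodes=2, min_redundancy=2))
# A configuration that violates two of the rules
print(check_prod_requirements(container_nodes=1, content_nodes=2, min_redundancy=1))
```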
Then, we can deploy the application to the prod environment by using VespaCloud.deploy_to_prod()
[1]:
app = VespaCloud(
    tenant=os.environ["TENANT_NAME"],
    application=application,
    key_content=None,
    key_location=api_key_path,
    application_package=package,
)
app.deploy_to_prod()