Multi-vector indexing with HNSW

This notebook covers the pyvespa steps of the multi-vector-indexing sample application. Go to the source for a full description and the prerequisites, and read the blog post. Highlighted features:

  • Approximate Nearest Neighbor Search - using HNSW or exact
  • Use a Component to configure the Huggingface embedder.
  • Using synthetic fields with auto-generated embeddings in data and query flow.
  • Application package file export, model files in the application package, deployment from files.
  • Multiphased ranking.
  • How to control text search result highlighting.

For simpler examples, see text search and pyvespa examples.

Pyvespa is an add-on to Vespa, and this guide exports an application package containing services.xml and the schema file for this application - knowing services.xml and schema files is useful when reading the Vespa documentation.

Refer to troubleshooting for any problem when running this guide.

This notebook requires pyvespa >= 0.37.1, ZSTD, and the Vespa CLI.

[ ]:
!pip3 install pyvespa

Create the application

Configure the Vespa instance with a component loading the E5-small model. Components are used to plug in code and models to a Vespa application - read more:

[ ]:
from vespa.package import *
from pathlib import Path

app_package = ApplicationPackage(
    name="wiki",
    components=[
        Component(id="e5-small-q", type="hugging-face-embedder",
                  parameters=[
                      Parameter("transformer-model", {"path": "model/e5-small-v2-int8.onnx"}),
                      Parameter("tokenizer-model", {"path": "model/tokenizer.json"})
                  ])
    ])

Configure fields

Vespa has a variety of basic and complex field types. This application uses a combination of integer, text and tensor fields, making it easy to implement hybrid ranking use cases:

[ ]:
app_package.schema.add_fields(
    Field(name="id", type="int", indexing=["attribute", "summary"]),
    Field(name="title", type="string", indexing=["index", "summary"], index="enable-bm25"),
    Field(name="url", type="string", indexing=["index", "summary"], index="enable-bm25"),
    Field(name="paragraphs", type="array<string>", indexing=["index", "summary"],
          index="enable-bm25", bolding=True),
    Field(name="paragraph_embeddings", type="tensor<float>(p{},x[384])",
          indexing=["input paragraphs", "embed", "index", "attribute"],
          ann=HNSW(distance_metric="angular"),
          is_document_field=False))
# Alternatively, for exact distance calculation not using HNSW:
# Field(name="paragraph_embeddings", type="tensor<float>(p{},x[384])",
#       indexing=["input paragraphs", "embed", "attribute"],
#       attribute=["distance-metric: angular"],
#       is_document_field=False)


One field of particular interest is paragraph_embeddings. Note that we are not feeding embeddings to this instance; instead, the embeddings are generated at feed time by the embed indexing expression, using the model configured above. Read more in Text embedding made simple.

Looking closely at the code, paragraph_embeddings uses is_document_field=False, meaning it will read another field as input (here paragraphs), and run embed on it.

As only one model is configured, embed will use that one - it is possible to configure more models and use embed model-id as well.

As the code comment illustrates, different distance metrics can be used, as well as exact or approximate nearest neighbor search.

Configure rank profiles

A rank profile defines the computation for ranking, with a wide range of possible features as input. Below you will find first-phase ranking using text ranking (bm25), semantic ranking using vector distance (consider a tensor a vector here), and combinations of the two:

[ ]:
app_package.schema.add_rank_profile(RankProfile(
    name="semantic",
    inputs=[("query(q)", "tensor<float>(x[384])")],
    first_phase="cos(distance(field,paragraph_embeddings))"))

app_package.schema.add_rank_profile(RankProfile(
    name="bm25",
    first_phase="2*bm25(title) + bm25(paragraphs)"))

similarity = "sum(l2_normalize(query(q),x) * l2_normalize(attribute(paragraph_embeddings),x),x)"
app_package.schema.add_rank_profile(RankProfile(
    name="hybrid",
    inputs=[("query(q)", "tensor<float>(x[384])")],
    first_phase="cos(distance(field,paragraph_embeddings))",
    functions=[
        Function(name="avg_paragraph_similarity", expression=f"reduce({similarity}, avg, p)"),
        Function(name="max_paragraph_similarity", expression=f"reduce({similarity}, max, p)"),
        Function(name="all_paragraph_similarities", expression=similarity)],
    second_phase=SecondPhaseRanking(
        expression="firstPhase + avg_paragraph_similarity() + log( bm25(title) + bm25(paragraphs) + bm25(url))"),
    match_features=["firstPhase", "closest(paragraph_embeddings)", "avg_paragraph_similarity",
                    "max_paragraph_similarity", "all_paragraph_similarities"]))
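To build intuition for what these tensor expressions compute, here is a plain-Python sketch (not Vespa code; the toy 2-dimensional vectors are made up, real ones are x[384]) of the per-paragraph cosine similarities and their avg/max reductions over the mapped dimension p:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length, like Vespa's l2_normalize(.., x)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def paragraph_similarities(query_vec, paragraph_vecs):
    """Dot product of normalized vectors == cosine similarity, per paragraph."""
    q = l2_normalize(query_vec)
    return {p: sum(qx * vx for qx, vx in zip(q, l2_normalize(v)))
            for p, v in paragraph_vecs.items()}

# Toy example: paragraph "0" points the same way as the query, "1" is orthogonal
sims = paragraph_similarities([1.0, 0.0], {"0": [1.0, 0.0], "1": [0.0, 1.0]})
avg_sim = sum(sims.values()) / len(sims)  # like reduce(.., avg, p)
max_sim = max(sims.values())              # like reduce(.., max, p)
print(sims, avg_sim, max_sim)
```

The mapped dimension p plays the role of the paragraph index; reductions over p collapse the per-paragraph scores into a single document-level feature.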

Configure fieldset

A fieldset is a way to configure search in multiple fields:

[ ]:
app_package.schema.add_field_set(FieldSet(name="default", fields=["title", "url", "paragraphs"]))

Configure document summary

A document summary is the collection of fields to return in query results - the default summary is used unless otherwise specified in the query. Here we configure a minimal summary without the larger paragraph text/embedding fields:

[ ]:
app_package.schema.add_document_summary(
    DocumentSummary(name="minimal",
                    summary_fields=[Summary("id", "int"),
                                    Summary("title", "string")]))

Export the configuration

At this point, the application is well defined. Remember that the Component configuration above expects the model files in a model directory. We must therefore export the configuration and add the models before we can deploy to the Vespa instance. Export the application package:

[ ]:
Path("pkg").mkdir(parents=True, exist_ok=True)
app_package.to_files("pkg")

It is a good idea to inspect the files exported into pkg - these are files referred to in the Vespa Documentation.

Download model files

At this point, we can save the model files into the application package:

[ ]:
! mkdir -p pkg/model
! curl -L -o pkg/model/tokenizer.json \

! curl -L -o pkg/model/e5-small-v2-int8.onnx \

Deploy the application

As all the files in the app package are ready, we can start a Vespa instance - here using Docker. Deploy the app package:

[ ]:
from vespa.deployment import VespaDocker

vespa_docker = VespaDocker()
app = vespa_docker.deploy_from_disk(application_name="wiki", application_root="pkg")

Feed documents

Download the Wikipedia articles:

[ ]:
! curl -s -H "Accept:application/vnd.github.v3.raw" \ | \
  zstdcat - > articles.jsonl

If you do not have ZSTD installed, get and unzip it instead.

Feed and index the Wikipedia articles using the Vespa CLI. As part of feeding, embed is called on each article, and the output of this is stored in the paragraph_embeddings field:

[ ]:
! vespa config set target local
! vespa feed articles.jsonl

Note that creating embeddings is computationally expensive, but this is a small dataset with only 8 articles, so it completes in a few seconds.

The Vespa instance is now populated with the Wikipedia articles, with generated embeddings, and ready for queries. The next sections have examples of various kinds of queries to run on the dataset.

Retrieve all articles with undefined ranking

Run a query selecting all documents, returning two of them. The rank profile is the built-in unranked, which means no ranking calculations are done and the results are returned in random order:

[ ]:
result = app.query(body={
  'yql': 'select * from wiki where true',
  'ranking.profile': 'unranked',
  'hits': 2
})
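The returned object can be inspected via result.hits or result.json. As a rough sketch (field values below are made up), a Vespa query response has this shape, with hits under root.children:

```python
# Abbreviated, hypothetical Vespa query response:
response = {
    "root": {
        "fields": {"totalCount": 8},
        "children": [
            {"relevance": 0.0, "fields": {"id": 1, "title": "Bicycle"}},
            {"relevance": 0.0, "fields": {"id": 2, "title": "Railway"}},
        ],
    }
}

def hit_titles(resp):
    # Each hit carries a relevance score and the summary fields
    return [hit["fields"]["title"] for hit in resp["root"].get("children", [])]

print(hit_titles(response))
```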

Traditional keyword search with BM25 ranking on the article level

Run a text-search query and use the bm25 ranking profile configured at the start of this guide: 2*bm25(title) + bm25(paragraphs). Here, we use BM25 on the title and paragraph text fields, giving more weight to matches in title:

[ ]:
result = app.query(body={
  'yql': 'select * from wiki where userQuery()',
  'query': 'what does 24 mean in the context of railways',
  'ranking.profile': 'bm25',
  'hits': 2
})

Semantic vector search on the paragraph level

This query creates an embedding of the query “what does 24 mean in the context of railways” and specifies the semantic ranking profile: cos(distance(field,paragraph_embeddings)). This will hence compute the distance between the vector in the query and the vectors computed when indexing: "input paragraphs", "embed", "index", "attribute":

[ ]:
result = app.query(body={
  'yql': 'select * from wiki where {targetHits:1}nearestNeighbor(paragraph_embeddings,q)',
  'input.query(q)': 'embed(what does 24 mean in the context of railways)',
  'ranking.profile': 'semantic',
  'hits': 2
})

An interesting question then is: of the paragraphs in the document, which one was the closest? When analysing ranking, match-features lets you export the scores used in the ranking calculations - see closest. From the result above:

'matchfeatures': {
    'closest(paragraph_embeddings)': {
        'type': 'tensor<float>(p{})',
        'cells': {'4': 1.0}
    }
}

This means the paragraph at index 4 is the closest match. With this, it is straightforward to feed articles with an array of paragraphs and highlight the best-matching paragraph in the document!
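A small plain-Python sketch of using the closest tensor to pick the best-matching paragraph from a hit (the paragraph texts here are made up):

```python
# match-features as returned in a hit, mirroring the snippet above:
matchfeatures = {
    "closest(paragraph_embeddings)": {
        "type": "tensor<float>(p{})",
        "cells": {"4": 1.0},  # the single non-zero cell marks the closest paragraph
    }
}
paragraphs = ["p0", "p1", "p2", "p3", "Line 24 is a railway line ...", "p5"]

cells = matchfeatures["closest(paragraph_embeddings)"]["cells"]
closest_idx = int(next(iter(cells)))  # mapped-dimension label, here "4"
best_paragraph = paragraphs[closest_idx]
print(best_paragraph)
```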

Hybrid search and ranking

Hybrid combining keyword search on the article level with vector search in the paragraph index:

[ ]:
result = app.query(body={
  'yql': 'select * from wiki where userQuery() or ({targetHits:1}nearestNeighbor(paragraph_embeddings,q))',
  'input.query(q)': 'embed(what does 24 mean in the context of railways)',
  'query': 'what does 24 mean in the context of railways',
  'ranking.profile': 'hybrid',
  'hits': 1
})

This case combines exact keyword search with nearestNeighbor search. The hybrid rank profile above also calculates several additional features using tensor expressions:

  • firstPhase is the score of the first ranking phase, configured in the hybrid profile as cos(distance(field, paragraph_embeddings)).
  • all_paragraph_similarities returns all the similarity scores for all paragraphs.
  • avg_paragraph_similarity is the average similarity score across all the paragraphs.
  • max_paragraph_similarity is the same as firstPhase, but computed using a tensor expression.

These additional features are calculated during second-phase ranking to limit the number of vector computations.
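The phased-ranking idea can be sketched in plain Python: score every matched document with a cheap expression, then re-score only the top rerank-count documents with the expensive one. The helper and the toy scores below are made up for illustration:

```python
def two_phase_rank(hits, cheap_score, expensive_score, rerank_count=2):
    """First phase ranks all hits; second phase re-scores only the top few."""
    first = sorted(hits, key=cheap_score, reverse=True)
    top, rest = first[:rerank_count], first[rerank_count:]
    # The second phase can reference the first-phase score (like Vespa's firstPhase)
    second = sorted(top, key=lambda h: cheap_score(h) + expensive_score(h), reverse=True)
    return second + rest

hits = [{"id": i, "bm25": b, "sim": s}
        for i, (b, s) in enumerate([(3.0, 0.1), (2.0, 0.9), (1.0, 0.8), (2.5, 0.2)])]
ranked = two_phase_rank(hits, lambda h: h["bm25"], lambda h: 10 * h["sim"])
print([h["id"] for h in ranked])  # [3, 0, 1, 2]
```

Only documents 0 and 3 get the expensive score; documents 1 and 2 keep their cheap first-phase ordering, which is exactly how second-phase ranking limits vector computations.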

The Tensor Playground is useful to play with tensor expressions.

The Hybrid Search blog post series is a good read to learn more about hybrid ranking!

Hybrid search and filter

YQL is a structured query language. In the query examples, the user input is fed as-is using the userQuery() operator.

Filters are normally separate from the user input; below is an example of adding a filter url contains "9985" to the YQL string.
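Composing the YQL string with an optional filter clause can be sketched like this (the helper is hypothetical, not a pyvespa API):

```python
def build_yql(filter_clause=None, target_hits=1):
    """Build a hybrid YQL query, optionally AND-ing in a filter clause."""
    nn = f"({{targetHits:{target_hits}}}nearestNeighbor(paragraph_embeddings,q))"
    where = f"userQuery() or {nn}"
    if filter_clause:
        where = f"{filter_clause} and {where}"
    return f"select * from wiki where {where}"

print(build_yql('url contains "9985"'))
```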

Finally, use the Query API for other options, like highlighting - here, disabling bolding:

[ ]:
result = app.query(body={
  'yql': 'select * from wiki where url contains "9985" and userQuery() or ({targetHits:1}nearestNeighbor(paragraph_embeddings,q))',
  'input.query(q)': 'embed(what does 24 mean in the context of railways)',
  'query': 'what does 24 mean in the context of railways',
  'ranking.profile': 'hybrid',
  'bolding': False
})

In short, the above query demonstrates how easy it is to combine various ranking strategies, and also combine with filters.

To learn more about pre-filtering vs post-filtering, read Filtering strategies and serving performance. Semantic search with multi-vector indexing is a great read overall for this domain.