Querying Vespa
This guide shows how to query a Vespa instance through the Query API, using the https://cord19.vespa.ai/ app as an example.
Refer to troubleshooting if you run into problems while running this guide.
You can run this tutorial in Google Colab:
[ ]:
!pip3 install pyvespa
Connect to a running Vespa instance.
[7]:
from vespa.application import Vespa
from vespa.io import VespaQueryResponse
from vespa.exceptions import VespaError
app = Vespa(url="https://api.cord19.vespa.ai")
See the Vespa query language for Vespa Query API request parameters.
The YQL userQuery() operator uses the query read from the query parameter. The query also specifies the app-specific bm25 rank profile. The code uses the app.syncio() context manager (a with statement) to make sure that connection pools are released when the block exits. This matters if you make multiple queries, because queries within the same session reuse existing connections instead of setting up new ones.
[8]:
with app.syncio() as session:
    response: VespaQueryResponse = session.query(
        yql="select documentid, cord_uid, title, abstract from sources * where userQuery()",
        hits=1,
        query="Is remdesivir an effective treatment for COVID-19?",
        ranking="bm25",
    )
    print(response.is_successful())
    print(response.url)
True
https://api.cord19.vespa.ai/search/?yql=select+documentid%2C+cord_uid%2C+title%2C+abstract+from+sources+%2A+where+userQuery%28%29&hits=1&query=Is+remdesivir+an+effective+treatment+for+COVID-19%3F&ranking=bm25
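The printed URL shows how the keyword arguments end up as URL-encoded query-string parameters. As a rough illustration only, the same string can be reproduced with the standard library. This is not pyvespa's internal request building, which may differ; the variable names are made up for the sketch.

```python
from urllib.parse import urlencode

# Same parameters, in the same order, as the keyword arguments in the query above.
params = {
    "yql": "select documentid, cord_uid, title, abstract from sources * where userQuery()",
    "hits": 1,
    "query": "Is remdesivir an effective treatment for COVID-19?",
    "ranking": "bm25",
}

# Endpoint copied from the URL printed above; urlencode() escapes spaces as "+"
# and reserved characters such as "," "*" "(" ")" "?" as percent sequences.
search_url = "https://api.cord19.vespa.ai/search/?" + urlencode(params)
print(search_url)
```

Comparing this string with response.url can help when checking which parameters were actually sent.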
Alternatively, if a native Vespa query parameter contains ".", which cannot appear in a Python keyword argument name, the parameters can be sent in an HTTP POST using the body argument. In this case ranking is an alias of ranking.profile, but passing ranking.profile as a **kwargs argument is not allowed in Python. The following request combines HTTP parameters with an HTTP POST body.
[9]:
with app.syncio() as session:
    response: VespaQueryResponse = session.query(
        hits=1,
        body={
            "yql": "select documentid, cord_uid, title, abstract from sources * where userQuery()",
            "query": "Is remdesivir an effective treatment for COVID-19?",
            "ranking.profile": "bm25",
            "presentation.timing": True,
        },
    )
    print(response.is_successful())
True
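The restriction on dotted parameter names comes from Python itself, not from Vespa. A minimal sketch of how one might split a flat parameter dictionary into names that can be passed as keyword arguments and names that have to go in body (the helper split_query_params is hypothetical and not part of pyvespa):

```python
import keyword


def split_query_params(params):
    """Separate parameters usable as Python keyword arguments from the rest."""
    kwargs, body = {}, {}
    for name, value in params.items():
        # "ranking.profile" is not a valid identifier, so it cannot be a kwarg.
        if name.isidentifier() and not keyword.iskeyword(name):
            kwargs[name] = value
        else:
            body[name] = value
    return kwargs, body


kwargs, body = split_query_params(
    {"hits": 1, "ranking.profile": "bm25", "presentation.timing": True}
)
print(kwargs)  # names that are valid keyword arguments
print(body)    # dotted names that must be sent in the POST body
```

The two dictionaries could then be passed along as session.query(**kwargs, body=body), matching the call shown above.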
The query specified that we wanted one hit:
[10]:
response.hits
[10]:
[{'id': 'id:covid-19:doc::534720',
'relevance': 26.6769101612402,
'source': 'content',
'fields': {'title': 'A Review on <hi>Remdesivir</hi>: A Possible Promising Agent for the <hi>Treatment</hi> of <hi>COVID</hi>-<hi>19</hi>',
'abstract': '<sep />manufacturing of specific therapeutics and vaccines to treat <hi>COVID</hi>-<hi>19</hi> are time-consuming processes. At this time, using available conventional therapeutics along with other <hi>treatment</hi> options may be useful to fight <hi>COVID</hi>-<hi>19</hi>. In different clinical trials, efficacy of <hi>remdesivir</hi> (GS-5734) against Ebola virus has been demonstrated. Moreover, <hi>remdesivir</hi> may be an <hi>effective</hi> therapy in vitro and in animal models infected by SARS and MERS coronaviruses. Hence, the drug may be theoretically <hi>effective</hi> against SARS-CoV-2. <hi>Remdesivir</hi><sep />',
'documentid': 'id:covid-19:doc::534720',
'cord_uid': 'xej338lo'}}]
Example of iterating over the hits returned in response.hits, extracting the cord_uid field:
[11]:
[hit["fields"]["cord_uid"] for hit in response.hits]
[11]:
['xej338lo']
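When working with more than one field it can be convenient to tolerate hits that lack a field, and to remove the <hi> and <sep /> markup seen in the title and abstract above. The following is a small sketch on a hand-made stand-in for response.hits; the helper names and the second hit are hypothetical, added only to show the missing-field case.

```python
import re

# Stand-in for response.hits: the first entry is trimmed from the output above,
# the second is a made-up hit without a "fields" entry.
hits = [
    {
        "id": "id:covid-19:doc::534720",
        "relevance": 26.6769101612402,
        "fields": {
            "title": "A Review on <hi>Remdesivir</hi>: A Possible Promising Agent "
            "for the <hi>Treatment</hi> of <hi>COVID</hi>-<hi>19</hi>",
            "cord_uid": "xej338lo",
        },
    },
    {"id": "hypothetical-hit-without-fields", "relevance": 0.0},
]


def field_values(hits, name, default=None):
    """Collect one field from every hit, using a default when it is absent."""
    return [hit.get("fields", {}).get(name, default) for hit in hits]


def strip_markup(text):
    """Remove <hi>, </hi> and <sep /> markers from a returned snippet."""
    return re.sub(r"</?hi>|<sep />", "", text)


uids = field_values(hits, "cord_uid")
clean_title = strip_markup(hits[0]["fields"]["title"])
print(uids)
print(clean_title)
```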
Access the full JSON response in the Vespa default JSON result format:
[12]:
response.json
[12]:
{'timing': {'querytime': 0.005, 'summaryfetchtime': 0.0, 'searchtime': 0.007},
'root': {'id': 'toplevel',
'relevance': 1.0,
'fields': {'totalCount': 2390},
'coverage': {'coverage': 100,
'documents': 976355,
'full': True,
'nodes': 2,
'results': 1,
'resultsFull': 1},
'children': [{'id': 'id:covid-19:doc::534720',
'relevance': 26.6769101612402,
'source': 'content',
'fields': {'title': 'A Review on <hi>Remdesivir</hi>: A Possible Promising Agent for the <hi>Treatment</hi> of <hi>COVID</hi>-<hi>19</hi>',
'abstract': '<sep />manufacturing of specific therapeutics and vaccines to treat <hi>COVID</hi>-<hi>19</hi> are time-consuming processes. At this time, using available conventional therapeutics along with other <hi>treatment</hi> options may be useful to fight <hi>COVID</hi>-<hi>19</hi>. In different clinical trials, efficacy of <hi>remdesivir</hi> (GS-5734) against Ebola virus has been demonstrated. Moreover, <hi>remdesivir</hi> may be an <hi>effective</hi> therapy in vitro and in animal models infected by SARS and MERS coronaviruses. Hence, the drug may be theoretically <hi>effective</hi> against SARS-CoV-2. <hi>Remdesivir</hi><sep />',
'documentid': 'id:covid-19:doc::534720',
'cord_uid': 'xej338lo'}}]}}
Query Performance
Several things impact end-to-end query performance:
- HTTP layer performance: connection handling, mutual TLS handshake and network round-trip latency.
  - Make sure to re-use connections by using the context manager with app.syncio(): to avoid setting up new connections for every query. See HTTP best practices.
  - The size of the fields and the number of hits requested also greatly impact network performance; a larger payload means higher latency.
  - By adding "presentation.timing": True as a request parameter, the Vespa response includes the server-side processing time (which also covers reading the query from the network, but not delivering the result over the network). This can be handy for debugging latency.
- Vespa performance: the features used inside the Vespa instance.
[13]:
with app.syncio(connections=12) as session:
    response: VespaQueryResponse = session.query(
        hits=1,
        body={
            "yql": "select documentid, cord_uid, title, abstract from sources * where userQuery()",
            "query": "Is remdesivir an effective treatment for COVID-19?",
            "ranking.profile": "bm25",
            "presentation.timing": True,
        },
    )
    print(response.is_successful())
True
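With presentation.timing enabled, the server-reported time can be compared with a client-side measurement to estimate how much latency comes from the network and client. A minimal sketch, assuming that timing.searchtime covers the whole server-side processing; the function name and the 0.250 second client measurement are hypothetical.

```python
def split_latency(response_json, client_seconds):
    """Return (server_seconds, overhead_seconds) for one timed request."""
    server_seconds = response_json["timing"]["searchtime"]
    # Whatever the server does not account for is network plus client overhead.
    return server_seconds, client_seconds - server_seconds


# Stand-in for response.json, reduced to the timing block shown earlier.
example_json = {
    "timing": {"querytime": 0.005, "summaryfetchtime": 0.0, "searchtime": 0.007}
}

# client_seconds would normally come from time.time() around session.query().
server, overhead = split_latency(example_json, client_seconds=0.250)
print(f"server: {server:.3f}s, network/client overhead: {overhead:.3f}s")
```

A large overhead relative to the server time points at the HTTP layer rather than at the Vespa instance.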
Compressing queries
The VespaSync class has a compress argument that can be used to compress the query before sending it to Vespa. This can be useful when the query is large and/or the network is slow. The compression is done using gzip, and is supported by Vespa.
By default, the compress argument is set to "auto", which means that the query will be compressed if the size of the query is larger than 1024 bytes. The compress argument can also be set to True or False to force the query to be compressed or not, respectively.
The compression is applied to both queries and feed operations (HTTP POST or PUT requests).
[45]:
import time

# Will not compress the request, as body is less than 1024 bytes
with app.syncio(connections=1, compress="auto") as session:
    response: VespaQueryResponse = session.query(
        hits=1,
        body={
            "yql": "select documentid, cord_uid, title, abstract from sources * where userQuery()",
            "query": "Is remdesivir an effective treatment for COVID-19?",
            "ranking.profile": "bm25",
            "presentation.timing": True,
        },
    )
    print(response.is_successful())

# Will compress, as the size of the body exceeds 1024 bytes.
large_body = {
    "yql": "select documentid, cord_uid, title, abstract from sources * where userQuery()",
    "query": "Is remdesivir an effective treatment for COVID-19?",
    "input.query(q)": "asdf" * 10000,
    "ranking.profile": "bm25",
    "presentation.timing": True,
}
compress_time = {}

# Force compression
with app.syncio(connections=1, compress=True) as session:
    start_time = time.time()
    response: VespaQueryResponse = session.query(
        hits=1,
        body=large_body,
    )
    end_time = time.time()
    compress_time["force_compression"] = end_time - start_time
    print(response.is_successful())

# Compress automatically based on body size
with app.syncio(connections=1, compress="auto") as session:
    start_time = time.time()
    response: VespaQueryResponse = session.query(
        hits=1,
        body=large_body,
    )
    end_time = time.time()
    compress_time["auto"] = end_time - start_time
    print(response.is_successful())

# Force no compression
with app.syncio(compress=False) as session:
    start_time = time.time()
    response: VespaQueryResponse = session.query(
        hits=1,
        body=large_body,
        timeout="5s",
    )
    end_time = time.time()
    compress_time["no_compression"] = end_time - start_time
    print(response.is_successful())
True
True
True
True
[47]:
compress_time
[47]:
{'force_compression': 0.5579209327697754,
'auto': 0.7328271865844727,
'no_compression': 0.45219922065734863}
The differences will be more significant the larger the size of the body, and the slower the network. It might be beneficial to perform a proper benchmarking if performance is critical for your application.
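The "auto" rule described earlier in this section can be pictured with a short standard-library sketch. This is only an illustration of the size-threshold idea, not pyvespa's actual implementation: the function prepare_body is hypothetical, the 1024-byte threshold is taken from the text, and the Content-Encoding header is the standard HTTP way to mark a gzip body.

```python
import gzip
import json


def prepare_body(body, compress="auto", threshold=1024):
    """Serialize a query body and gzip it when requested or when it is large."""
    raw = json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    should_compress = compress is True or (compress == "auto" and len(raw) > threshold)
    if should_compress:
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(raw), headers
    return raw, headers


small_body = {"yql": "select * from sources * where userQuery()", "query": "remdesivir"}
large_body = {**small_body, "input.query(q)": "asdf" * 10000}

for name, body in [("small", small_body), ("large", large_body)]:
    payload, headers = prepare_body(body)
    raw_size = len(json.dumps(body).encode("utf-8"))
    print(name, raw_size, len(payload), headers.get("Content-Encoding"))
```

A highly repetitive value such as "asdf" * 10000 compresses far better than typical query payloads, so the size reduction seen here is not representative of real traffic.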
Running Queries asynchronously
If you want to benchmark the capacity of a Vespa application, we suggest using vespa-fbench, a load generator tool that lets you measure throughput and latency with a predefined number of clients. Vespa-fbench is not Vespa-specific, and can be used to benchmark any HTTP service.
Another option is to use the Open Source k6 load testing tool.
If you want to run multiple queries from pyvespa, we suggest using the async client. Below, we will demonstrate a simple example of running 100 queries in parallel using the async client, and capture both the server-reported times and the client-reported times (including network latency).
[13]:
# This cell is necessary when running async code in Jupyter Notebooks, as it already runs an event loop
import nest_asyncio
nest_asyncio.apply()
[48]:
import asyncio
import time


# Define a single query function that takes a session
async def run_query_async(session, body):
    start_time = time.time()
    response = await session.query(body=body)
    end_time = time.time()
    return response, end_time - start_time


query = {
    "yql": "select documentid, cord_uid, title, abstract from sources * where userQuery()",
    "query": "Is remdesivir an effective treatment for COVID-19?",
    "ranking.profile": "bm25",
    "presentation.timing": True,
}

# List of queries with hits from 1 to 100
queries = [{**query, "hits": hits} for hits in range(1, 101)]


# Define a function to run multiple queries concurrently using the same session
async def run_multiple_queries(queries):
    # Async client uses HTTP/2, so we only need one connection
    async with app.asyncio(connections=1) as session:  # Reuse the same session
        tasks = []
        for q in queries:
            tasks.append(run_query_async(session, q))
        responses = await asyncio.gather(*tasks)
    return responses


# Run the queries concurrently
start_time = time.time()
responses = asyncio.run(run_multiple_queries(queries))
end_time = time.time()
print(f"Total time: {end_time - start_time:.2f} seconds")
# Print QPS
print(f"QPS: {len(queries) / (end_time - start_time):.2f}")
Total time: 1.73 seconds
QPS: 57.77
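To see why asyncio.gather helps, it can be useful to try the same pattern without a network. The sketch below replaces the Vespa call with asyncio.sleep, so the function names and the 0.05 second delay are stand-ins rather than anything measured against Vespa. Outside a notebook, asyncio.run works without nest_asyncio.

```python
import asyncio
import time


async def fake_query(delay):
    """Stand-in for `await session.query(body=...)`: just waits for `delay` seconds."""
    start = time.perf_counter()
    await asyncio.sleep(delay)
    return time.perf_counter() - start


async def run_all(n, delay):
    # All coroutines are started together and awaited as a group.
    return await asyncio.gather(*(fake_query(delay) for _ in range(n)))


def measure(n=20, delay=0.05):
    start = time.perf_counter()
    latencies = asyncio.run(run_all(n, delay))
    return latencies, time.perf_counter() - start


latencies, wall_time = measure()
print(f"{len(latencies)} simulated queries in {wall_time:.2f}s")
print(f"sequential execution would need about {sum(latencies):.2f}s")
```

In the real run above, every per-request timer starts when its task is scheduled, so the client-side time of each request likely includes waiting on the shared connection as well as the server time.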
[46]:
dict_responses = [response.json | {"time": timing} for response, timing in responses]
[49]:
dict_responses[0]
[49]:
{'timing': {'querytime': 0.003, 'summaryfetchtime': 0.0, 'searchtime': 0.004},
'root': {'id': 'toplevel',
'relevance': 1.0,
'fields': {'totalCount': 2444},
'coverage': {'coverage': 100,
'documents': 976355,
'full': True,
'nodes': 2,
'results': 1,
'resultsFull': 1},
'children': [{'id': 'id:covid-19:doc::779001',
'relevance': 27.517448178754492,
'source': 'content',
'fields': {'title': 'Cost utility analysis of <hi>Remdesivir</hi> and Dexamethasone <hi>treatment</hi> for hospitalised <hi>COVID</hi>-<hi>19</hi> patients - a hypothetical study',
'abstract': '<sep />: Sars-Cov-2 is a novel corona virus associated with significant morbidity and mortality. <hi>Remdesivir</hi> and Dexamethasone are two <hi>treatments</hi> that have shown to be <hi>effective</hi> against the Sars-Cov-2 associated disease. However, a cost-effectiveness analysis of the two <hi>treatments</hi> is still lacking. OBJECTIVE: The cost-utility of <hi>Remdesivir</hi>, Dexamethasone and a simultaneous use of the two drugs with respect to standard of care for <hi>treatment</hi> <hi>Covid</hi>-<hi>19</hi> hospitalized patients is evaluated, together with the effect<sep />',
'documentid': 'id:covid-19:doc::779001',
'cord_uid': 'ysml5abq'}}]},
'time': 1.4157278537750244}
[47]:
# Create a pandas DataFrame with the responses
import pandas as pd
df = pd.DataFrame(
    [
        {
            "hits": len(response["root"]["children"]),
            "search_time": response["timing"]["searchtime"],
            "query_time": response["timing"]["querytime"],
            "summary_time": response["timing"]["summaryfetchtime"],
            "total_time": response["time"],
        }
        for response in dict_responses
    ]
)
df
[47]:
| | hits | search_time | query_time | summary_time | total_time |
|---|---|---|---|---|---|
| 0 | 1 | 0.004 | 0.003 | 0.000 | 1.415728 |
| 1 | 2 | 0.005 | 0.004 | 0.000 | 1.067308 |
| 2 | 3 | 0.009 | 0.007 | 0.001 | 1.415624 |
| 3 | 4 | 0.011 | 0.010 | 0.000 | 1.069153 |
| 4 | 5 | 0.010 | 0.008 | 0.001 | 1.505080 |
| ... | ... | ... | ... | ... | ... |
| 95 | 96 | 0.033 | 0.012 | 0.020 | 1.659568 |
| 96 | 97 | 0.043 | 0.020 | 0.021 | 1.599375 |
| 97 | 98 | 0.017 | 0.005 | 0.011 | 1.621481 |
| 98 | 99 | 0.023 | 0.011 | 0.011 | 1.615766 |
| 99 | 100 | 0.050 | 0.025 | 0.025 | 1.602700 |
100 rows × 5 columns. All times are in seconds; total_time is the client-side measurement.
Error handling
Vespa's default query timeout is 500ms. By default, pyvespa retries up to 3 times for queries that return response codes like 429, 500, 503 and 504. A VespaError is raised if the retries do not succeed. In the following example we set a very low timeout of 1ms, which causes Vespa to time out the request and return a 504 HTTP error code. The underlying error is wrapped in a VespaError with the payload error message returned from Vespa:
[ ]:
with app.syncio(connections=12) as session:
    try:
        response: VespaQueryResponse = session.query(
            hits=1,
            body={
                "yql": "select * from sources * where userQuery()",
                "query": "Is remdesivir an effective treatment for COVID-19?",
                "timeout": "1ms",
            },
        )
        print(response.is_successful())
    except VespaError as e:
        print(str(e))
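The retry behaviour described above can be illustrated with a generic loop. This is a sketch of the idea only and not pyvespa's implementation: QueryFailed stands in for VespaError, send is a hypothetical callable returning a (status, payload) pair, and reading "retry up to 3 times" as three retries after the first attempt is an assumption.

```python
import time

# Status codes named in the text above as retryable.
RETRYABLE_STATUS = {429, 500, 503, 504}


class QueryFailed(Exception):
    """Hypothetical stand-in for VespaError."""


def query_with_retries(send, max_retries=3, backoff_seconds=0.0):
    """Call send() and retry on retryable status codes, then raise on failure."""
    status, payload = send()
    retries = 0
    while status in RETRYABLE_STATUS and retries < max_retries:
        # Exponential backoff between attempts (disabled when backoff_seconds is 0).
        time.sleep(backoff_seconds * (2**retries))
        retries += 1
        status, payload = send()
    if status != 200:
        raise QueryFailed(f"HTTP {status}: {payload}")
    return payload


# Simulated server: two timeouts followed by a successful response.
simulated = iter([(504, "timeout"), (504, "timeout"), (200, {"root": {}})])
result = query_with_retries(lambda: next(simulated))
print(result)
```

Non-retryable codes such as 400 fall straight through to the exception, which matches the bad-request case shown next.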
In the following example we forgot to include the query parameter but still reference it in the YQL. This causes a bad client request response (400):
[ ]:
with app.syncio(connections=12) as session:
    try:
        response: VespaQueryResponse = session.query(
            hits=1, body={"yql": "select * from sources * where userQuery()"}
        )
        print(response.is_successful())
    except VespaError as e:
        print(str(e))
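This kind of mistake can also be caught on the client before the request is sent. A minimal sketch, assuming the only rule checked is the one from this guide (userQuery() needs the query parameter); the function name is hypothetical and Vespa remains the authority on what is a valid request.

```python
def user_query_missing(body):
    """Return True if the YQL uses userQuery() but no query parameter is set."""
    yql = body.get("yql", "")
    return "userQuery()" in yql and not body.get("query")


bad_body = {"yql": "select * from sources * where userQuery()"}
good_body = {**bad_body, "query": "Is remdesivir an effective treatment for COVID-19?"}

print(user_query_missing(bad_body))
print(user_query_missing(good_body))
```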