Reference API

Define application

Create an Application Package

The first step to create a Vespa application is to create an instance of ApplicationPackage.

ApplicationPackage

class vespa.package.ApplicationPackage(name: str, schema: Optional[List[vespa.package.Schema]] = None, query_profile: Optional[vespa.package.QueryProfile] = None, query_profile_type: Optional[vespa.package.QueryProfileType] = None)
__init__(name: str, schema: Optional[List[vespa.package.Schema]] = None, query_profile: Optional[vespa.package.QueryProfile] = None, query_profile_type: Optional[vespa.package.QueryProfileType] = None) → None

Create a Vespa Application Package.

Check the Vespa documentation for more detailed information about application packages.

Parameters
  • name – Application name.

  • schema – List of Schema instances of the application. If None, an empty Schema with the same name as the application will be created by default.

  • query_profile – QueryProfile of the application. If None, a QueryProfile named default with QueryProfileType named root will be created by default.

  • query_profile_type – QueryProfileType of the application. If None, an empty QueryProfileType named root will be created by default.

The easiest way to get started is to create a default application package:

>>> ApplicationPackage(name="test_app")
ApplicationPackage('test_app', [Schema('test_app', Document(None, None), None, None, [], False, None)], QueryProfile(None), QueryProfileType(None))

It will create a default Schema, QueryProfile and QueryProfileType that you can then populate with specifics of your application.

add_model_ranking(model_config: vespa.ml.ModelConfig, schema=None, include_model_summary_features=False, **kwargs) → None

Add ranking profile based on a specific model config.

Parameters
  • model_config – Model config instance specifying the model to be used on the RankProfile.

  • schema – Name of the schema to add model ranking to.

  • include_model_summary_features – True to include model specific summary features, such as inputs and outputs that are useful for debugging. Default to False as this requires an extra model evaluation when fetching summary features.

  • kwargs – Further arguments to be passed to RankProfile.

Returns

None

add_schema(*schemas: vespa.package.Schema) → None

Add Schema instances to the application package.

Parameters

schemas – schemas to be added.

Returns

None

Schema and Document

An ApplicationPackage instance comes with a default Schema that contains a default Document, meaning that you usually do not need to create those yourself.

Schema

class vespa.package.Schema(name: str, document: vespa.package.Document, fieldsets: Optional[List[vespa.package.FieldSet]] = None, rank_profiles: Optional[List[vespa.package.RankProfile]] = None, models: Optional[List[vespa.package.OnnxModel]] = None, global_document: bool = False, imported_fields: Optional[List[vespa.package.ImportedField]] = None)
__init__(name: str, document: vespa.package.Document, fieldsets: Optional[List[vespa.package.FieldSet]] = None, rank_profiles: Optional[List[vespa.package.RankProfile]] = None, models: Optional[List[vespa.package.OnnxModel]] = None, global_document: bool = False, imported_fields: Optional[List[vespa.package.ImportedField]] = None) → None

Create a Vespa Schema.

Check the Vespa documentation (https://docs.vespa.ai/documentation/schemas.html) for more detailed information about schemas.

Parameters
  • name – Schema name.

  • document – Vespa Document associated with the Schema.

  • fieldsets – A list of FieldSet associated with the Schema.

  • rank_profiles – A list of RankProfile associated with the Schema.

  • models – A list of OnnxModel associated with the Schema.

  • global_document – Set to True to copy the documents to all content nodes. Default to False.

  • imported_fields – A list of ImportedField defining fields from global documents to be imported.

To create a Schema:

>>> Schema(name="schema_name", document=Document())
Schema('schema_name', Document(None, None), None, None, [], False, None)

add_field_set(field_set: vespa.package.FieldSet) → None

Add a FieldSet to the Schema.

Parameters

field_set – field set to be added.

add_fields(*fields: vespa.package.Field) → None

Add Field instances to the Schema’s Document.

Parameters

fields – fields to be added.

add_imported_field(imported_field: vespa.package.ImportedField) → None

Add an ImportedField to the Schema.

Parameters

imported_field – imported field to be added.

add_model(model: vespa.package.OnnxModel) → None

Add an OnnxModel to the Schema.

Parameters

model – model to be added.

Returns

None.

add_rank_profile(rank_profile: vespa.package.RankProfile) → None

Add a RankProfile to the Schema.

Parameters

rank_profile – rank profile to be added.

Returns

None.

Document

class vespa.package.Document(fields: Optional[List[vespa.package.Field]] = None, inherits: Optional[str] = None)
__init__(fields: Optional[List[vespa.package.Field]] = None, inherits: Optional[str] = None) → None

Create a Vespa Document.

Check the Vespa documentation (https://docs.vespa.ai/documentation/documents.html) for more detailed information about documents.

Parameters

fields – A list of Field to include in the document’s schema.

To create a Document:

>>> Document()
Document(None, None)
>>> Document(fields=[Field(name="title", type="string")])
Document([Field('title', 'string', None, None, None, None)], None)
>>> Document(fields=[Field(name="title", type="string")], inherits="context")
Document([Field('title', 'string', None, None, None, None)], context)

add_fields(*fields: vespa.package.Field) → None

Add Field instances to the document.

Parameters

fields – fields to be added.

Returns

None

Field

Once we have an ApplicationPackage instance containing a Schema and a Document we usually want to add fields so that we can store our data in a structured manner. We can accomplish that by creating Field instances and adding those to the ApplicationPackage instance via Schema and Document methods.

class vespa.package.Field(name: str, type: str, indexing: Optional[List[str]] = None, index: Optional[str] = None, attribute: Optional[List[str]] = None, ann: Optional[vespa.package.HNSW] = None)
__init__(name: str, type: str, indexing: Optional[List[str]] = None, index: Optional[str] = None, attribute: Optional[List[str]] = None, ann: Optional[vespa.package.HNSW] = None) → None

Create a Vespa field.

Check the Vespa documentation (https://docs.vespa.ai/documentation/reference/schema-reference.html#field) for more detailed information about fields.

Parameters
  • name – Field name.

  • type – Field data type.

  • indexing – Configures how to process data of a field during indexing.

  • index – Sets index parameters. Content in fields with index are normalized and tokenized by default.

  • attribute – Specifies a property of an index structure attribute.

  • ann – Add configuration for approximate nearest neighbor.

>>> Field(name = "title", type = "string", indexing = ["index", "summary"], index = "enable-bm25")
Field('title', 'string', ['index', 'summary'], 'enable-bm25', None, None)
>>> Field(
...     name = "abstract",
...     type = "string",
...     indexing = ["attribute"],
...     attribute=["fast-search", "fast-access"]
... )
Field('abstract', 'string', ['attribute'], None, ['fast-search', 'fast-access'], None)
>>> Field(name="tensor_field",
...     type="tensor<float>(x[128])",
...     indexing=["attribute"],
...     ann=HNSW(
...         distance_metric="euclidean",
...         max_links_per_node=16,
...         neighbors_to_explore_at_insert=200,
...     ),
... )
Field('tensor_field', 'tensor<float>(x[128])', ['attribute'], None, None, HNSW('euclidean', 16, 200))

FieldSet

class vespa.package.FieldSet(name: str, fields: List[str])
__init__(name: str, fields: List[str]) → None

Create a Vespa field set.

A fieldset groups fields together for searching. Check the Vespa documentation (https://docs.vespa.ai/documentation/reference/schema-reference.html#fieldset) for more detailed information about field sets.

Parameters
  • name – Name of the fieldset.

  • fields – Field names to be included in the fieldset.

>>> FieldSet(name="default", fields=["title", "body"])
FieldSet('default', ['title', 'body'])

RankProfile

class vespa.package.RankProfile(name: str, first_phase: str, inherits: Optional[str] = None, constants: Optional[Dict] = None, functions: Optional[List[vespa.package.Function]] = None, summary_features: Optional[List] = None, second_phase: Optional[vespa.package.SecondPhaseRanking] = None)
__init__(name: str, first_phase: str, inherits: Optional[str] = None, constants: Optional[Dict] = None, functions: Optional[List[vespa.package.Function]] = None, summary_features: Optional[List] = None, second_phase: Optional[vespa.package.SecondPhaseRanking] = None) → None

Create a Vespa rank profile.

Rank profiles are used to specify an alternative ranking of the same data for different purposes, and to experiment with new rank settings. Check the Vespa documentation (https://docs.vespa.ai/documentation/reference/schema-reference.html#rank-profile) for more detailed information about rank profiles.

Parameters
  • name – Rank profile name.

  • first_phase – The config specifying the first phase of ranking. More info (https://docs.vespa.ai/documentation/reference/schema-reference.html#firstphase-rank) about first phase ranking.

  • inherits – The inherits attribute is optional. If defined, it contains the name of one other rank profile in the same schema. Values not defined in this rank profile will then be inherited.

  • constants – Dict of constants available in ranking expressions, resolved and optimized at configuration time. More info (https://docs.vespa.ai/documentation/reference/schema-reference.html#constants) about constants.

  • functions – Optional list of Function representing rank functions to be included in the rank profile.

  • summary_features – List of rank features to be included with each hit. More info (https://docs.vespa.ai/documentation/reference/schema-reference.html#summary-features) about summary features.

  • second_phase – Optional config specifying the second phase of ranking. See SecondPhaseRanking.

>>> RankProfile(name = "default", first_phase = "nativeRank(title, body)")
RankProfile('default', 'nativeRank(title, body)', None, None, None, None, None)
>>> RankProfile(name = "new", first_phase = "bm25(title)", inherits = "default")
RankProfile('new', 'bm25(title)', 'default', None, None, None, None)
>>> RankProfile(
...     name = "new",
...     first_phase = "bm25(title)",
...     inherits = "default",
...     constants={"TOKEN_NONE": 0, "TOKEN_CLS": 101, "TOKEN_SEP": 102},
...     summary_features=["bm25(title)"]
... )
RankProfile('new', 'bm25(title)', 'default', {'TOKEN_NONE': 0, 'TOKEN_CLS': 101, 'TOKEN_SEP': 102}, None, ['bm25(title)'], None)
>>> RankProfile(
...     name="bert",
...     first_phase="bm25(title) + bm25(body)",
...     second_phase=SecondPhaseRanking(expression="1.25 * bm25(title) + 3.75 * bm25(body)", rerank_count=10),
...     inherits="default",
...     constants={"TOKEN_NONE": 0, "TOKEN_CLS": 101, "TOKEN_SEP": 102},
...     functions=[
...         Function(
...             name="question_length",
...             expression="sum(map(query(query_token_ids), f(a)(a > 0)))"
...         ),
...         Function(
...             name="doc_length",
...             expression="sum(map(attribute(doc_token_ids), f(a)(a > 0)))"
...         )
...     ],
...     summary_features=["question_length", "doc_length"]
... )
RankProfile('bert', 'bm25(title) + bm25(body)', 'default', {'TOKEN_NONE': 0, 'TOKEN_CLS': 101, 'TOKEN_SEP': 102}, [Function('question_length', 'sum(map(query(query_token_ids), f(a)(a > 0)))', None), Function('doc_length', 'sum(map(attribute(doc_token_ids), f(a)(a > 0)))', None)], ['question_length', 'doc_length'], SecondPhaseRanking('1.25 * bm25(title) + 3.75 * bm25(body)', 10))
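
The interplay between first_phase, second_phase and rerank_count in the last example can be pictured with a small conceptual sketch. It is plain Python and does not call Vespa; the function two_phase_rank, the document dicts and their bm25 values are made up for illustration, and details of Vespa's actual execution (such as per-node reranking and score rescaling) are not modeled.

```python
def two_phase_rank(docs, first_phase, second_phase, rerank_count):
    """Conceptual two-phase ranking (hypothetical helper, not part of pyvespa)."""
    # Phase 1: order every matched document by the cheap first-phase score.
    ranked = sorted(docs, key=first_phase, reverse=True)
    # Phase 2: re-score only the top `rerank_count` hits with the second-phase expression.
    head = sorted(ranked[:rerank_count], key=second_phase, reverse=True)
    # Documents outside the rerank window keep their first-phase order.
    return head + ranked[rerank_count:]


# Made-up feature values standing in for bm25(title) and bm25(body).
docs = [
    {"id": "a", "bm25_title": 2.0, "bm25_body": 0.2},
    {"id": "b", "bm25_title": 1.2, "bm25_body": 0.9},
    {"id": "c", "bm25_title": 0.1, "bm25_body": 1.0},
]


def first_phase(d):
    # Mirrors "bm25(title) + bm25(body)".
    return d["bm25_title"] + d["bm25_body"]


def second_phase(d):
    # Mirrors "1.25 * bm25(title) + 3.75 * bm25(body)".
    return 1.25 * d["bm25_title"] + 3.75 * d["bm25_body"]


order = [d["id"] for d in two_phase_rank(docs, first_phase, second_phase, rerank_count=2)]
print(order)
```

With rerank_count=2 only documents "a" and "b" are re-scored, so "c" stays last even though its second-phase score would exceed that of "a".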

Query Profile

A QueryProfile is a named collection of search request parameters given in the configuration. The search request can specify a query profile whose parameters will be used as parameters of that request. The query profiles may optionally be type checked. Type checking is turned on by referencing a QueryProfileType from the query profile.

An ApplicationPackage instance comes with a default QueryProfile named default that is associated with a QueryProfileType named root, meaning that you usually do not need to create those yourself, only add fields to them when required.

Create a QueryProfileType

QueryTypeField

class vespa.package.QueryTypeField(name: str, type: str)
__init__(name: str, type: str) → None

Create a field to be included in a QueryProfileType.

Parameters
  • name – Field name.

  • type – Field type.

>>> QueryTypeField(
...     name="ranking.features.query(title_bert)",
...     type="tensor<float>(x[768])"
... )
QueryTypeField('ranking.features.query(title_bert)', 'tensor<float>(x[768])')

QueryProfileType

class vespa.package.QueryProfileType(fields: Optional[List[vespa.package.QueryTypeField]] = None)
__init__(fields: Optional[List[vespa.package.QueryTypeField]] = None) → None

Create a Vespa Query Profile Type.

Check the Vespa documentation (https://docs.vespa.ai/documentation/query-profiles.html#query-profile-types) for more detailed information about query profile types.

Parameters

fields – A list of QueryTypeField.

>>> QueryProfileType(
...     fields = [
...         QueryTypeField(
...             name="ranking.features.query(tensor_bert)",
...             type="tensor<float>(x[768])"
...         )
...     ]
... )
QueryProfileType([QueryTypeField('ranking.features.query(tensor_bert)', 'tensor<float>(x[768])')])

add_fields(*fields: vespa.package.QueryTypeField) → None

Add QueryTypeField instances to the Query Profile Type.

Parameters

fields – fields to be added

>>> query_profile_type = QueryProfileType()
>>> query_profile_type.add_fields(
...     QueryTypeField(
...         name="age",
...         type="integer"
...     ),
...     QueryTypeField(
...         name="profession",
...         type="string"
...     )
... )

Create a QueryProfile

QueryField

class vespa.package.QueryField(name: str, value: Union[str, int, float])
__init__(name: str, value: Union[str, int, float]) → None

Create a field to be included in a QueryProfile.

Parameters
  • name – Field name.

  • value – Field value.

>>> QueryField(name="maxHits", value=1000)
QueryField('maxHits', 1000)

QueryProfile

class vespa.package.QueryProfile(fields: Optional[List[vespa.package.QueryField]] = None)
__init__(fields: Optional[List[vespa.package.QueryField]] = None) → None

Create a Vespa Query Profile.

Check the Vespa documentation (https://docs.vespa.ai/documentation/query-profiles.html) for more detailed information about query profiles.

Parameters

fields – A list of QueryField.

>>> QueryProfile(fields=[QueryField(name="maxHits", value=1000)])
QueryProfile([QueryField('maxHits', 1000)])

add_fields(*fields: vespa.package.QueryField) → None

Add QueryField instances to the Query Profile.

Parameters

fields – fields to be added

>>> query_profile = QueryProfile()
>>> query_profile.add_fields(QueryField(name="maxHits", value=1000))

Deploy your application

VespaDocker

class vespa.package.VespaDocker(disk_folder: str, port: int = 8080, container_memory: Union[str, int] = 4294967296, output_file: IO = sys.stdout, container: Optional[docker.models.containers.Container] = None)
__init__(disk_folder: str, port: int = 8080, container_memory: Union[str, int] = 4294967296, output_file: IO = sys.stdout, container: Optional[docker.models.containers.Container] = None) → None

Manage Docker deployments.

Parameters
  • disk_folder – Disk folder to save the required Vespa config files.

  • port – Container port.

  • output_file – Output file to write output messages.

  • container_memory – Docker container memory available to the application.

  • container – Used when instantiating VespaDocker from a running container.

deploy(application_package: vespa.package.ApplicationPackage) → vespa.application.Vespa

Deploy the application package into a Vespa container.

Parameters

application_package – ApplicationPackage to be deployed.

Returns

a Vespa connection instance.

deploy_from_disk(application_name: str, application_folder: Optional[str] = None) → vespa.application.Vespa

Deploy disk-based application package into a Vespa container.

Parameters
  • application_name – Name of the application.

  • application_folder – Relative path to the folder inside disk_folder containing the application files. If None, we assume disk_folder to be the application folder.

Returns

a Vespa connection instance.

export_application_package(application_package: vespa.package.ApplicationPackage) → None

Export application package to disk.

Parameters

application_package – Application package to export.

Returns

None. Application package file will be stored on disk_folder.

static from_container_name_or_id(name_or_id: str, output_file: IO = sys.stdout) → vespa.package.VespaDocker

Instantiate VespaDocker from a running container.

Parameters
  • name_or_id – Name or id of the container.

  • output_file – Output file to write output messages.

Returns

VespaDocker instance associated with the running container.

restart_services()

Restart Vespa services.

Returns

None

start_services()

Start Vespa services.

Returns

None

stop_services()

Stop Vespa services.

Returns

None

VespaCloud

class vespa.package.VespaCloud(tenant: str, application: str, application_package: vespa.package.ApplicationPackage, key_location: Optional[str] = None, key_content: Optional[str] = None, output_file: IO = sys.stdout)
__init__(tenant: str, application: str, application_package: vespa.package.ApplicationPackage, key_location: Optional[str] = None, key_content: Optional[str] = None, output_file: IO = sys.stdout) → None

Deploy application to the Vespa Cloud (cloud.vespa.ai).

Parameters
  • tenant – Tenant name registered in the Vespa Cloud.

  • application – Application name registered in the Vespa Cloud.

  • application_package – ApplicationPackage to be deployed.

  • key_location – Location of the private key used for signing HTTP requests to the Vespa Cloud.

  • key_content – Content of the private key used for signing HTTP requests to the Vespa Cloud. Use only when key file is not available.

  • output_file – Output file to write output messages.

delete(instance: str)

Delete the specified instance from the dev environment in the Vespa Cloud.

Parameters

instance – Name of the instance to delete.

deploy(instance: str, disk_folder: str) → vespa.application.Vespa

Deploy the given application package as the given instance in the Vespa Cloud dev environment.

Parameters
  • instance – Name of this instance of the application, in the Vespa Cloud.

  • disk_folder – Disk folder to save the required Vespa config files.

Returns

a Vespa connection instance.

Connect to existing application

Vespa

class vespa.application.Vespa(url: str, port: Optional[int] = None, deployment_message: Optional[List[str]] = None, cert: Optional[str] = None, output_file: IO = sys.stdout)
__init__(url: str, port: Optional[int] = None, deployment_message: Optional[List[str]] = None, cert: Optional[str] = None, output_file: IO = sys.stdout) → None

Establish a connection with a Vespa application.

Parameters
  • url – Vespa instance URL.

  • port – Vespa instance port.

  • deployment_message – Message returned by Vespa engine after deployment. Used internally by deploy methods.

  • cert – Path to certificate and key file.

  • output_file – Output file to write output messages.

>>> Vespa(url = "https://cord19.vespa.ai")  
>>> Vespa(url = "http://localhost", port = 8080)
Vespa(http://localhost, 8080)
>>> Vespa(url = "https://api.vespa-external.aws.oath.cloud", port = 4443, cert = "/path/to/cert-and-key.pem")  

Interact with an existing application

Feed data

feed_batch

Vespa.feed_batch(schema: str, batch: List[Dict], asynchronous=False)

Feed a batch of data to a Vespa app.

Parameters
  • schema – The schema that we are sending data to.

  • batch – A list of dicts, each containing the keys ‘id’ and ‘fields’, to be passed to feed_data_point().

  • asynchronous – Set to True to send data in async mode. Default to False. In async mode, the coroutine is created and executed if there is no active running event loop; otherwise the coroutine is returned and must be awaited by the caller.

Returns

List of HTTP POST responses.
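
The batch shape and the asynchronous behaviour described above can be sketched with the standard library alone. This is an illustration of the pattern, not pyvespa's implementation: fake_feed_point, feed_all, run_or_return, the schema name and the response dicts are all hypothetical stand-ins, and no request is sent.

```python
import asyncio

# Batch in the documented shape: one dict per data point with 'id' and 'fields'.
batch = [
    {"id": "0", "fields": {"title": "first title"}},
    {"id": "1", "fields": {"title": "second title"}},
]


async def fake_feed_point(schema, data_id, fields):
    """Hypothetical stand-in for one asynchronous feed request."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"schema": schema, "id": data_id, "status": 200}  # made-up response


async def feed_all(schema, batch):
    # Send every data point concurrently; gather keeps the input order.
    return await asyncio.gather(
        *(fake_feed_point(schema, d["id"], d["fields"]) for d in batch)
    )


def run_or_return(coro):
    """Execute the coroutine if no event loop is running, else hand it back."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # plain script: run to completion now
    return coro  # already inside a loop (e.g. a notebook): caller must await it


responses = run_or_return(feed_all("article", batch))
print(responses)
```

Run as a script, run_or_return executes the coroutine and returns the list of responses; called from code that is already inside an event loop, it would return the coroutine for the caller to await.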

feed_data_point

Vespa.feed_data_point(schema: str, data_id: str, fields: Dict) → vespa.io.VespaResponse

Feed a data point to a Vespa app.

Parameters
  • schema – The schema that we are sending data to.

  • data_id – Unique id associated with this data point.

  • fields – Dict containing all the fields required by the schema.

Returns

Response of the HTTP POST request.

Get, update and delete data

get_data

Vespa.get_data(schema: str, data_id: str) → requests.models.Response

Get a data point from a Vespa app.

Parameters
  • schema – The schema that we are getting data from.

  • data_id – Unique id associated with this data point.

Returns

Response of the HTTP GET request.

get_batch

Vespa.get_batch(batch: List)

Async get a batch of data from a Vespa app.

Parameters

batch – A list of tuples with ‘schema’ and ‘id’.

Returns

update_data

Vespa.update_data(schema: str, data_id: str, fields: Dict, create: bool = False) → vespa.io.VespaResponse

Update a data point in a Vespa app.

Parameters
  • schema – The schema that we are updating data in.

  • data_id – Unique id associated with this data point.

  • fields – Dict containing all the fields you want to update.

  • create – If true, updates to non-existent documents will create an empty document to update.

Returns

Response of the HTTP PUT request.

update_batch

Vespa.update_batch(batch: List)

Update a batch of data points.

Parameters

batch – A list of tuples with ‘schema’, ‘id’, ‘fields’, and ‘create’

Returns

delete_data

Vespa.delete_data(schema: str, data_id: str) → vespa.io.VespaResponse

Delete a data point from a Vespa app.

Parameters
  • schema – The schema that we are deleting data from.

  • data_id – Unique id associated with this data point.

Returns

Response of the HTTP DELETE request.

delete_batch

Vespa.delete_batch(batch: List)

Async delete a batch of data from a Vespa app.

Parameters

batch – A list of tuples with ‘schema’ and ‘id’

Returns

delete_all_docs

Vespa.delete_all_docs(content_cluster_name: str, schema: str) → requests.models.Response

Delete all documents associated with the schema.

Parameters
  • content_cluster_name – Name of content cluster to GET from, or visit.

  • schema – The schema that we are deleting data from.

Returns

Response of the HTTP DELETE request.
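
The batch arguments of get_batch(), update_batch() and delete_batch() described in this section are lists of tuples. A minimal sketch of how such lists might be assembled is shown below; the schema name, ids and field values are hypothetical, and the tuple element order is assumed to follow the order in which the parameter descriptions list the elements.

```python
schema = "article"  # hypothetical schema name
ids = ["0", "1", "2"]

# get_batch / delete_batch: tuples of (schema, id).
get_batch = [(schema, data_id) for data_id in ids]
delete_batch = list(get_batch)

# update_batch: tuples of (schema, id, fields, create), order assumed from the docs.
new_titles = {"0": "updated title", "2": "another updated title"}
update_batch = [
    (schema, data_id, {"title": title}, False)  # create=False: only touch existing docs
    for data_id, title in new_titles.items()
]

print(get_batch)
print(update_batch)
```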

Query

Vespa.query(body: Optional[Dict] = None, query: Optional[str] = None, query_model: Optional[vespa.query.QueryModel] = None, debug_request: bool = False, recall: Optional[Tuple] = None, **kwargs) → vespa.io.VespaQueryResponse

Send a query request to the Vespa application.

Either send ‘body’ containing all the request parameters or specify ‘query’ and ‘query_model’.

Parameters
  • body – Dict containing all the request parameters.

  • query – Query string.

  • query_model – Query model.

  • debug_request – Return the request body for debugging instead of sending the request.

  • recall – Tuple of size 2 where the first element is the name of the field to use to recall and the second element is a list of the values to be recalled.

  • kwargs – Additional parameters to be sent along the request.

Returns

Either the request body if debug_request is True or the result from the Vespa application.
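
As an illustration of the two input shapes mentioned above, the sketch below assembles a body dict and a recall tuple in plain Python. The helper build_query_body, the query text, the rank profile name and the recalled ids are hypothetical; the keys yql, query, hits and ranking are standard Vespa query API parameters, but which of them a given application needs is an assumption here.

```python
def build_query_body(query, hits=10, ranking="default"):
    """Assemble a request body dict (hypothetical helper, not part of pyvespa)."""
    return {
        "yql": "select * from sources * where userQuery();",
        "query": query,
        "hits": hits,
        "ranking": ranking,
    }


body = build_query_body("is remdesivir an effective treatment", hits=5, ranking="bm25")

# recall: tuple of size 2, (field name used to recall, list of values to recall).
recall = ("id", [1, 5])
recall_field, recall_values = recall

print(body)
print(recall_field, recall_values)
```

A dict like body would be passed as the body argument, while a tuple like recall would accompany the query and query_model arguments.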

Run experiments

evaluate

Vespa.evaluate(labeled_data: Union[List[Dict], pandas.core.frame.DataFrame], eval_metrics: List[vespa.evaluation.EvalMetric], query_model: Union[vespa.query.QueryModel, List[vespa.query.QueryModel]], id_field: str, default_score: int = 0, detailed_metrics=False, per_query=False, aggregators=None, **kwargs) → pandas.core.frame.DataFrame

Evaluate a QueryModel according to a list of EvalMetric.

labeled_data can be a DataFrame or a List of Dict:

>>> labeled_data_df = DataFrame(
...     data={
...         "qid": [0, 0, 1, 1],
...         "query": ["Intrauterine virus infections and congenital heart disease", "Intrauterine virus infections and congenital heart disease", "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus", "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus"],
...         "doc_id": [0, 3, 1, 5],
...         "relevance": [1,1,1,1]
...     }
... )
>>> labeled_data = [
...     {
...         "query_id": 0,
...         "query": "Intrauterine virus infections and congenital heart disease",
...         "relevant_docs": [{"id": 0, "score": 1}, {"id": 3, "score": 1}]
...     },
...     {
...         "query_id": 1,
...         "query": "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus",
...         "relevant_docs": [{"id": 1, "score": 1}, {"id": 5, "score": 1}]
...     }
... ]

Parameters
  • labeled_data – Labelled data containing query, query_id and relevant ids. See details about data format.

  • eval_metrics – A list of evaluation metrics.

  • query_model – Accept a Query model or a list of Query Models.

  • id_field – The Vespa field representing the document id.

  • default_score – Score to assign to the additional documents that are not relevant. Default to 0.

  • detailed_metrics – Return intermediate computations if available.

  • per_query – Set to True to return evaluation metrics per query.

  • aggregators – Used only if per_query=False. List of pandas friendly aggregators to summarize per model metrics. We use [“mean”, “median”, “std”] by default.

  • kwargs – Extra keyword arguments to be included in the Vespa Query.

Returns

DataFrame containing query_id and metrics according to the selected evaluation metrics.
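
The two labeled_data formats shown above appear to carry the same information. The sketch below converts the column-oriented form into the list-of-dict form using plain Python (no pandas); the helper name columns_to_labeled_data is hypothetical, the queries are shortened placeholders, and the column mapping (qid to query_id, doc_id to id, relevance to score) is inferred from the two examples rather than stated by the reference.

```python
def columns_to_labeled_data(columns):
    """Group column-oriented rows by query id (hypothetical helper)."""
    grouped = {}
    rows = zip(
        columns["qid"], columns["query"], columns["doc_id"], columns["relevance"]
    )
    for qid, query, doc_id, relevance in rows:
        # Create the per-query entry on first sight, then append each relevant doc.
        entry = grouped.setdefault(
            qid, {"query_id": qid, "query": query, "relevant_docs": []}
        )
        entry["relevant_docs"].append({"id": doc_id, "score": relevance})
    return list(grouped.values())


columns = {
    "qid": [0, 0, 1, 1],
    "query": ["query a", "query a", "query b", "query b"],  # placeholder queries
    "doc_id": [0, 3, 1, 5],
    "relevance": [1, 1, 1, 1],
}

labeled_data = columns_to_labeled_data(columns)
print(labeled_data)
```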

evaluate_query

Vespa.evaluate_query(eval_metrics: List[vespa.evaluation.EvalMetric], query_model: vespa.query.QueryModel, query_id: str, query: str, id_field: str, relevant_docs: List[Dict], default_score: int = 0, detailed_metrics=False, **kwargs) → Dict

Evaluate a query according to evaluation metrics.

Parameters
  • eval_metrics – A list of evaluation metrics.

  • query_model – Query model.

  • query_id – Query id represented as str.

  • query – Query string.

  • id_field – The Vespa field representing the document id.

  • relevant_docs – A list of dicts where each dict contains a doc id and optionally a doc score.

  • default_score – Score to assign to the additional documents that are not relevant. Default to 0.

  • detailed_metrics – Return intermediate computations if available.

  • kwargs – Extra keyword arguments to be included in the Vespa Query.

Returns

Dict containing query_id and metrics according to the selected evaluation metrics.

Collect training data

collect_training_data

Vespa.collect_training_data(labeled_data: List[Dict], id_field: str, query_model: vespa.query.QueryModel, number_additional_docs: int, relevant_score: int = 1, default_score: int = 0, show_progress: Optional[int] = None, **kwargs) → pandas.core.frame.DataFrame

Collect training data based on a set of labelled data.

Parameters
  • labeled_data – Labelled data containing query, query_id and relevant ids.

  • id_field – The Vespa field representing the document id.

  • query_model – Query model.

  • number_additional_docs – Number of additional documents to retrieve for each relevant document.

  • relevant_score – Score to assign to relevant documents. Default to 1.

  • default_score – Score to assign to the additional documents that are not relevant. Default to 0.

  • show_progress – Prints the current point being collected every show_progress steps. Default to None, in which case progress is not printed.

  • kwargs – Extra keyword arguments to be included in the Vespa Query.

Returns

DataFrame containing document id (document_id), query id (query_id), scores (relevant) and vespa rank features returned by the Query model RankProfile used.

collect_training_data_point

Vespa.collect_training_data_point(query: str, query_id: str, relevant_id: str, id_field: str, query_model: vespa.query.QueryModel, number_additional_docs: int, fields: List[str], relevant_score: int = 1, default_score: int = 0, **kwargs) → List[Dict]

Collect training data based on a single query.

Parameters
  • query – Query string.

  • query_id – Query id represented as str.

  • relevant_id – Relevant id represented as a str.

  • id_field – The Vespa field representing the document id.

  • query_model – Query model.

  • number_additional_docs – Number of additional documents to retrieve for each relevant document.

  • fields – Which fields should be retrieved.

  • relevant_score – Score to assign to relevant documents. Default to 1.

  • default_score – Score to assign to the additional documents that are not relevant. Default to 0.

  • kwargs – Extra keyword arguments to be included in the Vespa Query.

Returns

List of dicts containing the document id (document_id), query id (query_id), scores (relevant) and vespa rank features returned by the Query model RankProfile used.

Query Model

A QueryModel is an abstraction that encapsulates all the relevant information controlling how your app matches and ranks documents. A QueryModel can be used for querying (query()), evaluating (evaluate()) and collecting data (collect_training_data()) from your app.

Create a QueryModel

class vespa.query.QueryModel(name: str = 'default_name', query_properties: Optional[List[vespa.query.QueryProperty]] = None, match_phase: vespa.query.MatchFilter = <vespa.query.AND object>, rank_profile: vespa.query.RankProfile = <vespa.query.RankProfile object>, body_function: Optional[Callable[[str], Dict]] = None)
__init__(name: str = 'default_name', query_properties: Optional[List[vespa.query.QueryProperty]] = None, match_phase: vespa.query.MatchFilter = <vespa.query.AND object>, rank_profile: vespa.query.RankProfile = <vespa.query.RankProfile object>, body_function: Optional[Callable[[str], Dict]] = None) → None

Define a query model.

Parameters
  • name – Name of the query model. Used to tag model related quantities, like evaluation metrics.

  • query_properties – Optional list of QueryProperty.

  • match_phase – Define the match criteria. One of the MatchFilter options available.

  • rank_profile – Define the rank criteria.

  • body_function – Function that takes the query as a parameter and returns the body of a Vespa query.

create_body(query: str) → Dict[str, str]

Create the appropriate request body to be sent to Vespa.

Parameters

query – Query input.

Returns

dict representing the request body.
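
A body_function, as described in the QueryModel parameters above, has the shape Callable[[str], Dict]. The sketch below shows one way such a function might be built with a closure so that the rank profile and hit count are fixed up front; make_body_function, the profile name and the chosen request keys are illustrative assumptions, not something this reference prescribes.

```python
from typing import Callable, Dict


def make_body_function(rank_profile: str, hits: int = 10) -> Callable[[str], Dict]:
    """Return a function mapping a query string to a request body (hypothetical helper)."""

    def body_function(query: str) -> Dict:
        return {
            "yql": "select * from sources * where userQuery();",
            "query": query,
            "type": "any",  # match documents containing any of the query terms
            "ranking": rank_profile,
            "hits": hits,
        }

    return body_function


body_function = make_body_function(rank_profile="bm25", hits=3)
request_body = body_function("this is a test")
print(request_body)
```

The resulting body_function could then be supplied as the body_function argument when constructing a QueryModel.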

Match phase

Union

class vespa.query.Union(*args: vespa.query.MatchFilter)
__init__(*args: vespa.query.MatchFilter) → None

Match documents that belong to the union of many match filters.

Parameters

args – Match filters whose union is taken.

create_match_filter(query: str) → str

Create part of the YQL expression related to the filter.

Parameters

query – Query input.

Returns

Part of the YQL expression related to the filter.

get_query_properties(query: Optional[str] = None) → Dict[str, str]

Relevant request properties associated with the filter.

Parameters

query – Query input.

Returns

dict containing the relevant request properties associated with the filter.

AND

class vespa.query.AND
__init__() → None

Filter that matches documents containing all the query terms.

create_match_filter(query: str)str

Create part of the YQL expression related to the filter.

Parameters

query – Query input.

Returns

Part of the YQL expression related to the filter.

get_query_properties(query: Optional[str] = None)Dict

Relevant request properties associated with the filter.

Parameters

query – Query input.

Returns

dict containing the relevant request properties associated with the filter.

OR

class vespa.query.OR
__init__()None

Filter that matches any document containing at least one query term.

create_match_filter(query: str)str

Create part of the YQL expression related to the filter.

Parameters

query – Query input.

Returns

Part of the YQL expression related to the filter.

get_query_properties(query: Optional[str] = None)Dict

Relevant request properties associated with the filter.

Parameters

query – Query input.

Returns

dict containing the relevant request properties associated with the filter.
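The difference between the AND and OR semantics can be illustrated with a small in-memory matcher. This is a sketch on a toy corpus using whitespace tokenization (both assumptions); it does not call Vespa or the library.

```python
from typing import Dict, List

# Toy corpus: document id -> text (illustrative data only).
docs: Dict[str, str] = {
    "d1": "vespa is a search engine",
    "d2": "search with python",
    "d3": "cooking recipes",
}


def match_and(query: str, text: str) -> bool:
    """True if the text contains all the query terms."""
    tokens = text.split()
    return all(term in tokens for term in query.split())


def match_or(query: str, text: str) -> bool:
    """True if the text contains at least one query term."""
    tokens = text.split()
    return any(term in tokens for term in query.split())


query = "vespa search"
and_hits: List[str] = [doc_id for doc_id, text in docs.items() if match_and(query, text)]
or_hits: List[str] = [doc_id for doc_id, text in docs.items() if match_or(query, text)]
```

Here and_hits contains only d1, while or_hits contains d1 and d2.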

WeakAnd

class vespa.query.WeakAnd(hits: int, field: str = 'default')
__init__(hits: int, field: str = 'default')None

Match documents according to the weakAND algorithm.

Reference: https://docs.vespa.ai/documentation/using-wand-with-vespa.html

Parameters
  • hits – Lower bound on the number of hits to be retrieved.

  • field – Which Vespa field to search.

create_match_filter(query: str)str

Create part of the YQL expression related to the filter.

Parameters

query – Query input.

Returns

Part of the YQL expression related to the filter.

get_query_properties(query: Optional[str] = None)Dict

Relevant request properties associated with the filter.

Parameters

query – Query input.

Returns

dict containing the relevant request properties associated with the filter.
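A sketch of how a weakAnd YQL fragment might be assembled from the query terms. The whitespace tokenization, the helper name and the targetNumHits annotation syntax are assumptions. The annotation syntax in particular depends on the Vespa version, so check the reference above before relying on it.

```python
def weak_and_expression(query: str, hits: int, field: str = "default") -> str:
    """Build a weakAnd fragment with one 'contains' clause per query token."""
    terms = ", ".join(f'{field} contains "{token}"' for token in query.split())
    # Assumed annotation syntax: [{"targetNumHits": N}]weakAnd(...)
    return f'([{{"targetNumHits": {hits}}}]weakAnd({terms}))'


expression = weak_and_expression("what is vespa", hits=10)
```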

ANN

class vespa.query.ANN(doc_vector: str, query_vector: str, hits: int, label: str, approximate: bool = True)
__init__(doc_vector: str, query_vector: str, hits: int, label: str, approximate: bool = True)None

Match documents according to the nearest neighbor operator.

Reference: https://docs.vespa.ai/documentation/reference/query-language-reference.html#nearestneighbor

Parameters
  • doc_vector – Name of the document field to be used in the distance calculation.

  • query_vector – Name of the query field to be used in the distance calculation.

  • hits – Lower bound on the number of hits to return.

  • label – A label to identify this specific operator instance.

  • approximate – True to use approximate nearest neighbor and False to use brute force. Default to True.

create_match_filter(query: str)str

Create part of the YQL expression related to the filter.

Parameters

query – Query input.

Returns

Part of the YQL expression related to the filter.

get_query_properties(query: Optional[str] = None)Dict[str, str]

Relevant request properties associated with the filter.

Parameters

query – Query input.

Returns

dict containing the relevant request properties associated with the filter.
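A sketch of how the ANN parameters could map onto a nearestNeighbor YQL fragment. The helper name, the field names in the example and the annotation syntax are assumptions; the query vector itself would be sent separately as a request property.

```python
import json


def ann_expression(
    doc_vector: str, query_vector: str, hits: int, label: str, approximate: bool = True
) -> str:
    """Build a nearestNeighbor fragment annotated with the operator settings."""
    annotations = json.dumps(
        {"targetNumHits": hits, "label": label, "approximate": approximate}
    )
    # Assumed annotation syntax: [{...}]nearestNeighbor(doc_field, query_field)
    return f"([{annotations}]nearestNeighbor({doc_vector}, {query_vector}))"


# Hypothetical field names and label.
expression = ann_expression("doc_embedding", "query_embedding", hits=10, label="first_ann")
```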

Rank Profile

RankProfile

class vespa.query.RankProfile(name: str = 'default', list_features: bool = False)
__init__(name: str = 'default', list_features: bool = False)None

Define a rank profile.

Parameters
  • name – Name of the rank profile as defined in a Vespa search definition.

  • list_features – Whether the ranking features should be returned. Either True or False.

Query Properties

QueryRankingFeature

class vespa.query.QueryRankingFeature(name: str, mapping: Callable[[str], List[float]])
__init__(name: str, mapping: Callable[[str], List[float]])None

Include ranking.feature.query into a Vespa query.

Parameters
  • name – Name of the feature.

  • mapping – Function mapping a string to a list of floats.

get_query_properties(query: Optional[str] = None)Dict[str, str]

Extract query property syntax.

Parameters

query – Query input.

Returns

dict containing the relevant request properties to be included in the query.
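A sketch of a mapping function and the request property it could produce. The mapping query_stats and the helper ranking_feature_property are hypothetical, and the property key format ranking.features.query(name) is an assumption based on the Vespa query API naming.

```python
from typing import Dict, List


def query_stats(query: str) -> List[float]:
    """Hypothetical mapping: number of tokens and number of characters."""
    tokens = query.split()
    return [float(len(tokens)), float(len(query))]


def ranking_feature_property(name: str, query: str) -> Dict[str, str]:
    """Serialize the mapped query as a single request property."""
    return {f"ranking.features.query({name})": str(query_stats(query))}


props = ranking_feature_property("query_stats", "what is vespa")
```

In practice the mapping would typically be an embedding model that turns the query text into a fixed-length vector.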

Evaluation Metrics

MatchRatio

class vespa.evaluation.MatchRatio
__init__()None

Computes the ratio of documents retrieved by the match phase.

evaluate_query(query_results: vespa.io.VespaQueryResponse, relevant_docs: List[Dict], id_field: str, default_score: int, detailed_metrics=False)Dict

Evaluate query results.

Parameters
  • query_results – Raw query results returned by Vespa.

  • relevant_docs – A list of dicts, where each dict contains a doc id and optionally a doc score.

  • id_field – The Vespa field representing the document id.

  • default_score – Score to assign to the additional documents that are not relevant. Default to 0.

  • detailed_metrics – Return intermediate computations if available.

Returns

Dict containing the number of retrieved docs (_retrieved_docs), the number of docs available in the corpus (_docs_available) and the match ratio.
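A sketch of the match ratio computation on a plain dict shaped like a Vespa JSON result, assuming the matched count is in root.fields.totalCount and the corpus size in root.coverage.documents. The function name and output keys are illustrative.

```python
from typing import Dict


def match_ratio(response: Dict) -> Dict:
    """Ratio of documents matched by the query to documents in the corpus."""
    retrieved = response["root"]["fields"]["totalCount"]
    available = response["root"]["coverage"]["documents"]
    # Guard against an empty corpus.
    ratio = retrieved / available if available > 0 else 0.0
    return {"retrieved_docs": retrieved, "docs_available": available, "match_ratio": ratio}


# Hypothetical response: 250 of 1000 documents matched.
response = {"root": {"fields": {"totalCount": 250}, "coverage": {"documents": 1000}}}
result = match_ratio(response)
```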

Recall

class vespa.evaluation.Recall(at: int)
__init__(at: int)None

Compute the recall at position at.

Parameters

at – Maximum position on the resulting list to look for relevant docs.

evaluate_query(query_results: vespa.io.VespaQueryResponse, relevant_docs: List[Dict], id_field: str, default_score: int, detailed_metrics=False)Dict

Evaluate query results.

Only documents with score > 0 are assumed to be relevant. Recall is zero if no relevant documents with score > 0 are provided.

Parameters
  • query_results – Raw query results returned by Vespa.

  • relevant_docs – A list of dicts, where each dict contains a doc id and optionally a doc score.

  • id_field – The Vespa field representing the document id.

  • default_score – Score to assign to the additional documents that are not relevant. Default to 0.

  • detailed_metrics – Return intermediate computations if available.

Returns

Dict containing the recall value.
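A sketch of recall at a cutoff, working on a ranked list of document ids rather than on a VespaQueryResponse. The "id" and "score" keys and the default score of 1 for docs without an explicit score are assumptions.

```python
from typing import Dict, List


def recall_at(retrieved_ids: List[str], relevant_docs: List[Dict], at: int) -> float:
    """Fraction of relevant docs found in the first `at` retrieved positions."""
    # Only documents with score > 0 count as relevant.
    relevant = {doc["id"] for doc in relevant_docs if doc.get("score", 1) > 0}
    if not relevant:
        return 0.0
    found = relevant.intersection(retrieved_ids[:at])
    return len(found) / len(relevant)


relevant_docs = [{"id": "a", "score": 1}, {"id": "b", "score": 1}, {"id": "c", "score": 0}]
recall = recall_at(["a", "x", "b", "y"], relevant_docs, at=2)
```

With at=2 only doc a is found among the two relevant docs, so recall is 0.5.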

ReciprocalRank

class vespa.evaluation.ReciprocalRank(at: int)
__init__(at: int)

Compute the reciprocal rank at position at.

Parameters

at – Maximum position on the resulting list to look for relevant docs.

evaluate_query(query_results: vespa.io.VespaQueryResponse, relevant_docs: List[Dict], id_field: str, default_score: int, detailed_metrics=False)Dict

Evaluate query results.

Only documents with score > 0 are assumed to be relevant.

Parameters
  • query_results – Raw query results returned by Vespa.

  • relevant_docs – A list of dicts, where each dict contains a doc id and optionally a doc score.

  • id_field – The Vespa field representing the document id.

  • default_score – Score to assign to the additional documents that are not relevant. Default to 0.

  • detailed_metrics – Return intermediate computations if available.

Returns

Dict containing the reciprocal rank value.
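A sketch of the reciprocal rank at a cutoff on a ranked list of document ids. It assumes the value is zero when no relevant document appears in the first at positions; the "id" and "score" keys are illustrative.

```python
from typing import Dict, List


def reciprocal_rank_at(retrieved_ids: List[str], relevant_docs: List[Dict], at: int) -> float:
    """1 / position of the first relevant doc within the first `at` results."""
    relevant = {doc["id"] for doc in relevant_docs if doc.get("score", 1) > 0}
    for position, doc_id in enumerate(retrieved_ids[:at], start=1):
        if doc_id in relevant:
            return 1.0 / position
    # Assumption: no relevant doc within the cutoff yields zero.
    return 0.0


relevant_docs = [{"id": "a", "score": 1}, {"id": "b", "score": 1}]
rr = reciprocal_rank_at(["x", "y", "a", "b"], relevant_docs, at=3)
```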

NormalizedDiscountedCumulativeGain

class vespa.evaluation.NormalizedDiscountedCumulativeGain(at: int)
__init__(at: int)

Compute the normalized discounted cumulative gain at position at.

Parameters

at – Maximum position on the resulting list to look for relevant docs.

evaluate_query(query_results: vespa.io.VespaQueryResponse, relevant_docs: List[Dict], id_field: str, default_score: int, detailed_metrics=False)Dict

Evaluate query results.

Documents returned by the query that are not included in the set of relevant documents are assumed to have a score of zero. Similarly, if the query returns N < at documents, the at - N missing scores are assumed to be zero.

Parameters
  • query_results – Raw query results returned by Vespa.

  • relevant_docs – A list of dicts, where each dict contains a doc id and optionally a doc score.

  • id_field – The Vespa field representing the document id.

  • default_score – Score to assign to the additional documents that are not relevant. Default to 0.

  • detailed_metrics – Return intermediate computations if available.

Returns

Dict containing the ideal discounted cumulative gain (_ideal_dcg), the discounted cumulative gain (_dcg) and the normalized discounted cumulative gain.
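A sketch of the computation under stated assumptions: linear gain, a log2 position discount, and an ideal ordering obtained by sorting the same padded score list. The library may use a different gain or ideal ordering; the function names and output keys are illustrative.

```python
import math
from typing import Dict, List


def dcg(scores: List[float]) -> float:
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(score / math.log2(index + 2) for index, score in enumerate(scores))


def ndcg_at(retrieved_ids: List[str], relevant_docs: List[Dict], at: int) -> Dict:
    score_by_id = {doc["id"]: doc.get("score", 1) for doc in relevant_docs}
    # Retrieved docs outside the relevant set get a score of zero.
    scores = [score_by_id.get(doc_id, 0) for doc_id in retrieved_ids[:at]]
    # Pad the at - N missing positions with zeros.
    scores += [0] * (at - len(scores))
    actual = dcg(scores)
    ideal = dcg(sorted(scores, reverse=True))
    return {"ideal_dcg": ideal, "dcg": actual, "ndcg": actual / ideal if ideal > 0 else 0.0}


relevant_docs = [{"id": "a", "score": 1}, {"id": "c", "score": 3}]
result = ndcg_at(["a", "b", "c"], relevant_docs, at=4)
```

In this example the padded score list is [1, 0, 3, 0], giving a DCG of 1/log2(2) + 3/log2(4) = 2.5.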