Reference API
ApplicationPackage
-
class
vespa.package.
ApplicationPackage
(name: str, schema: Optional[List[vespa.package.Schema]] = None, query_profile: Optional[vespa.package.QueryProfile] = None, query_profile_type: Optional[vespa.package.QueryProfileType] = None, stateless_model_evaluation: bool = False, create_schema_by_default: bool = True, create_query_profile_by_default: bool = True, tasks: Optional[List[vespa.package.Task]] = None, default_query_model: Optional[vespa.query.QueryModel] = None) -
__init__
(name: str, schema: Optional[List[vespa.package.Schema]] = None, query_profile: Optional[vespa.package.QueryProfile] = None, query_profile_type: Optional[vespa.package.QueryProfileType] = None, stateless_model_evaluation: bool = False, create_schema_by_default: bool = True, create_query_profile_by_default: bool = True, tasks: Optional[List[vespa.package.Task]] = None, default_query_model: Optional[vespa.query.QueryModel] = None) → None Create an Application Package. An
ApplicationPackage
instance comes with a default Schema
that contains a default Document.
Parameters: - name – Application name. Cannot contain ‘-’ or ‘_’.
- schema – List of Schema objects of the application. If None, an empty Schema with the same name as the application will be created by default.
- query_profile – QueryProfile of the application. If None, a QueryProfile named default with a QueryProfileType named root will be created by default.
- query_profile_type – QueryProfileType of the application. If None, an empty QueryProfileType named root will be created by default.
- stateless_model_evaluation – Enable stateless model evaluation. Default to False.
- create_schema_by_default – Include a Schema with the same name as the application if no Schema is provided in the schema argument.
- create_query_profile_by_default – Include a default QueryProfile and QueryProfileType in case they are not explicitly defined by the user in the query_profile and query_profile_type parameters.
- tasks – List of tasks to be served.
- default_query_model – Optional QueryModel to be used as default for the application.
The easiest way to get started is to create a default application package:
>>> ApplicationPackage(name="testapp")
ApplicationPackage('testapp', [Schema('testapp', Document(None, None), None, None, [], False, None)], QueryProfile(None), QueryProfileType(None))
It will create a default Schema, QueryProfile and QueryProfileType that you can then populate with the specifics of your application.
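Since the application name cannot contain ‘-’ or ‘_’, it can help to check a candidate name before building the package. The helper below is a hypothetical sketch (is_valid_application_name is not part of pyvespa) and it encodes only the rule stated above; the library may enforce further constraints.

```python
def is_valid_application_name(name: str) -> bool:
    """Hypothetical helper: apply only the documented rule (no '-' or '_')."""
    return "-" not in name and "_" not in name


for candidate in ["testapp", "test_app", "test-app"]:
    # Report which candidate names satisfy the documented rule.
    print(candidate, is_valid_application_name(candidate))
```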
-
add_model_ranking
(model_config: vespa.package.ModelConfig, schema=None, include_model_summary_features=False, document_field_indexing=None, **kwargs) → None Add ranking profile based on a specific model config.
Parameters: - model_config – Model config instance specifying the model to be used on the RankProfile.
- schema – Name of the schema to add model ranking to.
- include_model_summary_features – True to include model specific summary features, such as inputs and outputs that are useful for debugging. Default to False as this requires an extra model evaluation when fetching summary features.
- document_field_indexing – List of indexing attributes for the document fields required by the ranking model.
- kwargs – Further arguments to be passed to RankProfile.
Returns: None
-
add_schema
(*schemas) → None Add Schema objects to the application package.
Parameters: schemas – schemas to be added.
Returns: None
-
to_files
(root: pathlib.Path) → None Export the application package as a directory tree.
Parameters: root – Directory to export files to.
Returns: None
-
to_zip
() → _io.BytesIO Return the application package as zipped bytes, to be used in a subsequent deploy.
Returns: BytesIO buffer
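The general pattern of returning an archive as an in-memory buffer can be sketched with the standard library. This is not pyvespa's implementation; the helper name, file names and file contents below are placeholders chosen for illustration.

```python
import io
import zipfile


def files_to_zip_buffer(files: dict) -> io.BytesIO:
    """Pack a {path: text} mapping into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, content in files.items():
            archive.writestr(path, content)
    buffer.seek(0)  # rewind so callers can read from the start
    return buffer


# Placeholder file names and contents, for illustration only.
buffer = files_to_zip_buffer(
    {"services.xml": "<services/>", "schemas/testapp.sd": "schema testapp {}"}
)
with zipfile.ZipFile(buffer) as archive:
    print(sorted(archive.namelist()))
```

Keeping the archive in memory avoids writing a temporary file when the bytes are only needed for an upload.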
-
to_zipfile
(zfile: pathlib.Path) → None Export the application package as a deployable zipfile. See application packages for deployment options.
Parameters: zfile – Filename to export to.
Returns: None
-
Schema
-
class
vespa.package.
Schema
(name: str, document: vespa.package.Document, fieldsets: Optional[List[vespa.package.FieldSet]] = None, rank_profiles: Optional[List[vespa.package.RankProfile]] = None, models: Optional[List[vespa.package.OnnxModel]] = None, global_document: bool = False, imported_fields: Optional[List[vespa.package.ImportedField]] = None) -
__init__
(name: str, document: vespa.package.Document, fieldsets: Optional[List[vespa.package.FieldSet]] = None, rank_profiles: Optional[List[vespa.package.RankProfile]] = None, models: Optional[List[vespa.package.OnnxModel]] = None, global_document: bool = False, imported_fields: Optional[List[vespa.package.ImportedField]] = None) → None Create a Vespa Schema.
Check the Vespa documentation for more detailed information about schemas.
Parameters: - name – Schema name.
- document – Vespa Document associated with the Schema.
- fieldsets – A list of FieldSet associated with the Schema.
- rank_profiles – A list of RankProfile associated with the Schema.
- models – A list of OnnxModel associated with the Schema.
- global_document – Set to True to copy the documents to all content nodes. Default to False.
- imported_fields – A list of ImportedField defining fields from global documents to be imported.
To create a Schema:
>>> Schema(name="schema_name", document=Document())
Schema('schema_name', Document(None, None), None, None, [], False, None)
-
add_field_set
(field_set: vespa.package.FieldSet) → None Add a FieldSet to the Schema.
Parameters: field_set – field set to be added.
-
add_fields
(*fields) → None Add Field instances to the Schema’s Document.
Parameters: fields – fields to be added.
-
add_imported_field
(imported_field: vespa.package.ImportedField) → None Add an ImportedField to the Schema.
Parameters: imported_field – imported field to be added.
-
add_model
(model: vespa.package.OnnxModel) → None Add an OnnxModel to the Schema.
Parameters: model – model to be added.
Returns: None.
-
add_rank_profile
(rank_profile: vespa.package.RankProfile) → None Add a RankProfile to the Schema.
Parameters: rank_profile – rank profile to be added.
Returns: None.
-
Document
-
class
vespa.package.
Document
(fields: Optional[List[vespa.package.Field]] = None, inherits: Optional[str] = None) -
__init__
(fields: Optional[List[vespa.package.Field]] = None, inherits: Optional[str] = None) → None Create a Vespa Document.
Check the Vespa documentation for more detailed information about documents.
Parameters: fields – A list of Field to include in the document’s schema.
To create a Document:
>>> Document()
Document(None, None)
>>> Document(fields=[Field(name="title", type="string")])
Document([Field('title', 'string', None, None, None, None)], None)
>>> Document(fields=[Field(name="title", type="string")], inherits="context")
Document([Field('title', 'string', None, None, None, None)], context)
-
Field
-
class
vespa.package.
Field
(name: str, type: str, indexing: Optional[List[str]] = None, index: Optional[str] = None, attribute: Optional[List[str]] = None, ann: Optional[vespa.package.HNSW] = None) -
__init__
(name: str, type: str, indexing: Optional[List[str]] = None, index: Optional[str] = None, attribute: Optional[List[str]] = None, ann: Optional[vespa.package.HNSW] = None) → None Create a Vespa field.
Check the Vespa documentation for more detailed information about fields.
Once we have an ApplicationPackage instance containing a Schema and a Document, we usually want to add fields so that we can store our data in a structured manner. We can accomplish that by creating Field instances and adding those to the ApplicationPackage instance via Schema and Document methods.
Parameters: - name – Field name.
- type – Field data type.
- indexing – Configures how to process data of a field during indexing.
- index – Sets index parameters. Content in fields with index are normalized and tokenized by default.
- attribute – Specifies a property of an index structure attribute.
- ann – Add configuration for approximate nearest neighbor.
>>> Field(name = "title", type = "string", indexing = ["index", "summary"], index = "enable-bm25")
Field('title', 'string', ['index', 'summary'], 'enable-bm25', None, None)
>>> Field(
...     name = "abstract",
...     type = "string",
...     indexing = ["attribute"],
...     attribute=["fast-search", "fast-access"]
... )
Field('abstract', 'string', ['attribute'], None, ['fast-search', 'fast-access'], None)
>>> Field(name="tensor_field",
...     type="tensor<float>(x[128])",
...     indexing=["attribute"],
...     ann=HNSW(
...         distance_metric="euclidean",
...         max_links_per_node=16,
...         neighbors_to_explore_at_insert=200,
...     ),
... )
Field('tensor_field', 'tensor<float>(x[128])', ['attribute'], None, None, HNSW('euclidean', 16, 200))
-
FieldSet
-
class
vespa.package.
FieldSet
(name: str, fields: List[str]) -
__init__
(name: str, fields: List[str]) → None Create a Vespa field set.
A fieldset groups fields together for searching. Check the Vespa documentation for more detailed information about field sets.
Parameters: - name – Name of the fieldset
- fields – Field names to be included in the fieldset.
>>> FieldSet(name="default", fields=["title", "body"])
FieldSet('default', ['title', 'body'])
-
RankProfile
-
class
vespa.package.
RankProfile
(name: str, first_phase: str, inherits: Optional[str] = None, constants: Optional[Dict] = None, functions: Optional[List[vespa.package.Function]] = None, summary_features: Optional[List] = None, second_phase: Optional[vespa.package.SecondPhaseRanking] = None) -
__init__
(name: str, first_phase: str, inherits: Optional[str] = None, constants: Optional[Dict] = None, functions: Optional[List[vespa.package.Function]] = None, summary_features: Optional[List] = None, second_phase: Optional[vespa.package.SecondPhaseRanking] = None) → None Create a Vespa rank profile.
Rank profiles are used to specify an alternative ranking of the same data for different purposes, and to experiment with new rank settings. Check the Vespa documentation for more detailed information about rank profiles.
Parameters: - name – Rank profile name.
- first_phase – The config specifying the first phase of ranking. More info about first phase ranking: https://docs.vespa.ai/en/reference/schema-reference.html#firstphase-rank
- inherits – The inherits attribute is optional. If defined, it contains the name of one other rank profile in the same schema. Values not defined in this rank profile will then be inherited.
- constants – Dict of constants available in ranking expressions, resolved and optimized at configuration time. More info about constants: https://docs.vespa.ai/en/reference/schema-reference.html#constants
- functions – Optional list of Function representing rank functions to be included in the rank profile.
- summary_features – List of rank features to be included with each hit. More info about summary features: https://docs.vespa.ai/en/reference/schema-reference.html#summary-features
- second_phase – Optional config specifying the second phase of ranking. See SecondPhaseRanking.
>>> RankProfile(name = "default", first_phase = "nativeRank(title, body)")
RankProfile('default', 'nativeRank(title, body)', None, None, None, None, None)
>>> RankProfile(name = "new", first_phase = "BM25(title)", inherits = "default")
RankProfile('new', 'BM25(title)', 'default', None, None, None, None)
>>> RankProfile(
...     name = "new",
...     first_phase = "BM25(title)",
...     inherits = "default",
...     constants={"TOKEN_NONE": 0, "TOKEN_CLS": 101, "TOKEN_SEP": 102},
...     summary_features=["BM25(title)"]
... )
RankProfile('new', 'BM25(title)', 'default', {'TOKEN_NONE': 0, 'TOKEN_CLS': 101, 'TOKEN_SEP': 102}, None, ['BM25(title)'], None)
>>> RankProfile(
...     name="bert",
...     first_phase="bm25(title) + bm25(body)",
...     second_phase=SecondPhaseRanking(expression="1.25 * bm25(title) + 3.75 * bm25(body)", rerank_count=10),
...     inherits="default",
...     constants={"TOKEN_NONE": 0, "TOKEN_CLS": 101, "TOKEN_SEP": 102},
...     functions=[
...         Function(
...             name="question_length",
...             expression="sum(map(query(query_token_ids), f(a)(a > 0)))"
...         ),
...         Function(
...             name="doc_length",
...             expression="sum(map(attribute(doc_token_ids), f(a)(a > 0)))"
...         )
...     ],
...     summary_features=["question_length", "doc_length"]
... )
RankProfile('bert', 'bm25(title) + bm25(body)', 'default', {'TOKEN_NONE': 0, 'TOKEN_CLS': 101, 'TOKEN_SEP': 102}, [Function('question_length', 'sum(map(query(query_token_ids), f(a)(a > 0)))', None), Function('doc_length', 'sum(map(attribute(doc_token_ids), f(a)(a > 0)))', None)], ['question_length', 'doc_length'], SecondPhaseRanking('1.25 * bm25(title) + 3.75 * bm25(body)', 10))
-
QueryProfile
-
class
vespa.package.
QueryProfile
(fields: Optional[List[vespa.package.QueryField]] = None) -
__init__
(fields: Optional[List[vespa.package.QueryField]] = None) → None Create a Vespa Query Profile.
Check the Vespa documentation for more detailed information about query profiles.
A QueryProfile is a named collection of query request parameters given in the configuration. The query request can specify a query profile whose parameters will be used as parameters of that request. The query profiles may optionally be type checked. Type checking is turned on by referencing a QueryProfileType from the query profile.
Parameters: fields – A list of QueryField.
>>> QueryProfile(fields=[QueryField(name="maxHits", value=1000)])
QueryProfile([QueryField('maxHits', 1000)])
-
add_fields
(*fields) → None Add QueryField instances to the Query Profile.
Parameters: fields – fields to be added.
>>> query_profile = QueryProfile()
>>> query_profile.add_fields(QueryField(name="maxHits", value=1000))
-
QueryField
-
class
vespa.package.
QueryField
(name: str, value: Union[str, int, float]) -
__init__
(name: str, value: Union[str, int, float]) → None Create a field to be included in a QueryProfile.
Parameters: - name – Field name.
- value – Field value.
>>> QueryField(name="maxHits", value=1000)
QueryField('maxHits', 1000)
-
QueryProfileType
-
class
vespa.package.
QueryProfileType
(fields: Optional[List[vespa.package.QueryTypeField]] = None) -
__init__
(fields: Optional[List[vespa.package.QueryTypeField]] = None) → None Create a Vespa Query Profile Type.
Check the Vespa documentation for more detailed information about query profile types.
An ApplicationPackage instance comes with a default QueryProfile named default that is associated with a QueryProfileType named root, meaning that you usually do not need to create those yourself, only add fields to them when required.
Parameters: fields – A list of QueryTypeField.
>>> QueryProfileType(
...     fields = [
...         QueryTypeField(
...             name="ranking.features.query(tensor_bert)",
...             type="tensor<float>(x[768])"
...         )
...     ]
... )
QueryProfileType([QueryTypeField('ranking.features.query(tensor_bert)', 'tensor<float>(x[768])')])
-
add_fields
(*fields) → None Add QueryTypeField instances to the Query Profile Type.
Parameters: fields – fields to be added.
>>> query_profile_type = QueryProfileType()
>>> query_profile_type.add_fields(
...     QueryTypeField(
...         name="age",
...         type="integer"
...     ),
...     QueryTypeField(
...         name="profession",
...         type="string"
...     )
... )
-
QueryTypeField
-
class
vespa.package.
QueryTypeField
(name: str, type: str) -
__init__
(name: str, type: str) → None Create a field to be included in a QueryProfileType.
Parameters: - name – Field name.
- type – Field type.
>>> QueryTypeField(
...     name="ranking.features.query(title_bert)",
...     type="tensor<float>(x[768])"
... )
QueryTypeField('ranking.features.query(title_bert)', 'tensor<float>(x[768])')
-
SequenceClassification
-
class
vespa.ml.
SequenceClassification
(model_id: str, model: str, tokenizer: Optional[str] = None, output_file: IO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) -
__init__
(model_id: str, model: str, tokenizer: Optional[str] = None, output_file: IO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) Sequence Classification task.
It takes a text input and returns an array of floats depending on which model is used to solve the task.
Parameters: - model_id – Id used to identify the model on Vespa applications.
- model – Id of the model as used by the model hub. Alternatively, it can also be the path to the folder containing the model files, as long as the model config is also there.
- tokenizer – Id of the tokenizer as used by the model hub. Alternatively, it can also be the path to the folder containing the tokenizer files, as long as the model config is also there.
- output_file – Output file to write output messages.
-
ModelServer
-
class
vespa.package.
ModelServer
(name: str, tasks: Optional[List[vespa.package.Task]] = None) -
__init__
(name: str, tasks: Optional[List[vespa.package.Task]] = None) Create a Vespa stateless model evaluation server.
A Vespa stateless model evaluation server is a simplified Vespa application without content clusters.
Parameters: - name – Application name.
- tasks – List of tasks to be served.
-
VespaDocker
-
class
vespa.deployment.
VespaDocker
(port: int = 8080, container_memory: Union[str, int] = 4294967296, output_file: IO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, container: Optional[docker.models.containers.Container] = None, container_image: str = 'vespaengine/vespa', cfgsrv_port: int = 19071) -
__init__
(port: int = 8080, container_memory: Union[str, int] = 4294967296, output_file: IO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, container: Optional[docker.models.containers.Container] = None, container_image: str = 'vespaengine/vespa', cfgsrv_port: int = 19071) → None Manage Docker deployments.
Parameters: - port – Container port.
- cfgsrv_port – Config Server port.
- output_file – Output file to write output messages.
- container_memory – Docker container memory available to the application.
- container – Used when instantiating VespaDocker from a running container.
- container_image – Docker container image.
-
deploy
(application_package: vespa.package.ApplicationPackage) → vespa.application.Vespa Deploy the application package into a Vespa container.
Parameters: application_package – ApplicationPackage to be deployed.
Returns: a Vespa connection instance.
-
deploy_from_disk
(application_name: str, application_root: pathlib.Path) → vespa.application.Vespa Deploy from a directory tree. Used when making changes to application package files not supported by pyvespa - this is why this method is not found in the ApplicationPackage class.
Parameters: - application_name – Application package name.
- application_root – Application package directory root
Returns: a Vespa connection instance.
-
static
from_container_name_or_id
(name_or_id: str, output_file: IO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) → vespa.deployment.VespaDocker Instantiate VespaDocker from a running container.
Parameters: - name_or_id – Name or id of the container.
- output_file – Output file to write output messages.
Raises: ValueError – Exception if container not found
Returns: VespaDocker instance associated with the running container.
-
restart_services
() Restart Vespa services.
Returns: None
-
start_services
() Start Vespa services.
Returns: None
-
stop_services
() Stop Vespa services.
Returns: None
-
wait_for_config_server_start
(max_wait) Wait for the Config Server to start inside the Docker image.
Parameters: max_wait – Seconds to wait for the application endpoint.
Raises: RuntimeError – if the config server does not start within max_wait.
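The wait-with-deadline behaviour described here can be sketched as a generic polling loop. The names wait_until, is_ready and poll_interval are illustrative assumptions, and the readiness check is a stand-in; this shows the pattern, not the code VespaDocker runs.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool], max_wait: float, poll_interval: float = 0.5) -> None:
    """Poll is_ready() until it returns True or max_wait seconds have elapsed."""
    deadline = time.monotonic() + max_wait
    while not is_ready():
        if time.monotonic() >= deadline:
            raise RuntimeError(f"Not ready after waiting {max_wait} seconds.")
        time.sleep(poll_interval)


# Stand-in readiness check that succeeds on the third poll.
attempts = {"count": 0}


def fake_config_server_is_up() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


wait_until(fake_config_server_is_up, max_wait=5, poll_interval=0.01)
print(attempts["count"])
```

Using time.monotonic() rather than wall-clock time keeps the deadline unaffected by system clock adjustments.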
-
VespaCloud
-
class
vespa.deployment.
VespaCloud
(tenant: str, application: str, application_package: vespa.package.ApplicationPackage, key_location: Optional[str] = None, key_content: Optional[str] = None, output_file: IO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) -
__init__
(tenant: str, application: str, application_package: vespa.package.ApplicationPackage, key_location: Optional[str] = None, key_content: Optional[str] = None, output_file: IO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) → None Deploy application to the Vespa Cloud (cloud.vespa.ai)
Parameters: - tenant – Tenant name registered in the Vespa Cloud.
- application – Application name registered in the Vespa Cloud.
- application_package – ApplicationPackage to be deployed.
- key_location – Location of the private key used for signing HTTP requests to the Vespa Cloud.
- key_content – Content of the private key used for signing HTTP requests to the Vespa Cloud. Use only when key file is not available.
- output_file – Output file to write output messages.
-
delete
(instance: str) Delete the specified instance from the dev environment in the Vespa Cloud.
Parameters: instance – Name of the instance to delete.
-
deploy
(instance: str, disk_folder: Optional[str] = None) → vespa.application.Vespa Deploy the given application package as the given instance in the Vespa Cloud dev environment.
Parameters: - instance – Name of this instance of the application, in the Vespa Cloud.
- disk_folder – Disk folder to save the required Vespa config files. Default to application name folder within user’s current working directory.
Returns: a Vespa connection instance.
-
Vespa
-
class
vespa.application.
Vespa
(url: str, port: Optional[int] = None, deployment_message: Optional[List[str]] = None, cert: Optional[str] = None, key: Optional[str] = None, output_file: IO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, application_package: Optional[vespa.package.ApplicationPackage] = None) -
__init__
(url: str, port: Optional[int] = None, deployment_message: Optional[List[str]] = None, cert: Optional[str] = None, key: Optional[str] = None, output_file: IO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, application_package: Optional[vespa.package.ApplicationPackage] = None) → None Establish a connection with an existing Vespa application.
Parameters: - url – Vespa instance URL.
- port – Vespa instance port.
- deployment_message – Message returned by Vespa engine after deployment. Used internally by deploy methods.
- cert – Path to a file containing both the certificate and the key when the ‘key’ parameter is None. If ‘key’ is not None, this should be the path of the certificate file.
- key – Path to the key file.
- output_file – Output file to write output messages.
- application_package – Application package definition used to deploy the application.
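The relationship between cert and key described above mirrors the convention of the requests library, where the cert argument is either a single path or a (certificate, key) tuple. The helper below is hypothetical and only illustrates that mapping; it is not taken from pyvespa, and the paths are placeholders.

```python
from typing import Optional, Tuple, Union


def resolve_cert(cert: Optional[str], key: Optional[str] = None) -> Union[None, str, Tuple[str, str]]:
    """Hypothetical helper: build a value shaped like the `cert` argument of `requests`."""
    if cert is None:
        return None  # no client certificate configured
    if key is None:
        return cert  # one file holding both certificate and key
    return (cert, key)  # separate certificate and key files


print(resolve_cert("/path/to/cert-and-key.pem"))
print(resolve_cert("/path/to/cert.pem", "/path/to/key.pem"))
```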
>>> Vespa(url = "https://cord19.vespa.ai") # doctest: +SKIP
>>> Vespa(url = "http://localhost", port = 8080)
Vespa(http://localhost, 8080)
>>> Vespa(url = "https://api.vespa-external.aws.oath.cloud", port = 4443, cert = "/path/to/cert-and-key.pem") # doctest: +SKIP
-
application_package
Get application package definition, if available.
-
asyncio
(connections: Optional[int] = 100, total_timeout: int = 10) → vespa.application.VespaAsync Access Vespa asynchronous connection layer
Parameters: - connections – Number of allowed concurrent connections
- total_timeout – Total timeout in secs.
Returns: Instance of Vespa asynchronous layer.
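The two parameters correspond to a common asyncio pattern: a cap on concurrent operations and a time limit per operation. The sketch below illustrates that pattern with the standard library only. fake_request is a stand-in for an HTTP call, and nothing here reflects how VespaAsync is implemented.

```python
import asyncio


async def fake_request(i: int) -> int:
    """Stand-in for an HTTP call; sleeps briefly and echoes its input."""
    await asyncio.sleep(0.01)
    return i


async def run_all(n_requests: int, connections: int, total_timeout: float) -> list:
    semaphore = asyncio.Semaphore(connections)  # at most `connections` requests in flight

    async def bounded(i: int) -> int:
        async with semaphore:
            # Abort a single request if it exceeds the timeout.
            return await asyncio.wait_for(fake_request(i), timeout=total_timeout)

    return await asyncio.gather(*(bounded(i) for i in range(n_requests)))


results = asyncio.run(run_all(n_requests=20, connections=5, total_timeout=1.0))
print(len(results))
```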
-
collect_training_data
(labeled_data: Union[List[Dict], pandas.core.frame.DataFrame], id_field: str, query_model: vespa.query.QueryModel, number_additional_docs: int, relevant_score: int = 1, default_score: int = 0, show_progress: Optional[int] = None, **kwargs) → pandas.core.frame.DataFrame Collect training data based on a set of labelled data.
labeled_data can be a DataFrame or a List of Dict:
>>> labeled_data_df = DataFrame(
...     data={
...         "qid": [0, 0, 1, 1],
...         "query": ["Intrauterine virus infections and congenital heart disease", "Intrauterine virus infections and congenital heart disease", "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus", "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus"],
...         "doc_id": [0, 3, 1, 5],
...         "relevance": [1,1,1,1]
...     }
... )
>>> labeled_data = [
...     {
...         "query_id": 0,
...         "query": "Intrauterine virus infections and congenital heart disease",
...         "relevant_docs": [{"id": 0, "score": 1}, {"id": 3, "score": 1}]
...     },
...     {
...         "query_id": 1,
...         "query": "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus",
...         "relevant_docs": [{"id": 1, "score": 1}, {"id": 5, "score": 1}]
...     }
... ]
Parameters: - labeled_data – Labelled data containing query, query_id and relevant ids. See details about data format.
- id_field – The Vespa field representing the document id.
- query_model – Query model.
- number_additional_docs – Number of additional documents to retrieve for each relevant document.
- relevant_score – Score to assign to relevant documents. Default to 1.
- default_score – Score to assign to the additional documents that are not relevant. Default to 0.
- show_progress – Prints the current point being collected every show_progress steps. Default to None, in which case progress is not printed.
- kwargs – Extra keyword arguments to be included in the Vespa Query.
Returns: DataFrame containing document id (document_id), query id (query_id), scores (relevant) and vespa rank features returned by the Query model RankProfile used.
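The two labeled_data formats shown above carry the same information. The sketch below converts DataFrame-style rows into the list-of-dict format, assuming the columns qid, query, doc_id and relevance correspond to query_id, query, id and score. That mapping is inferred from the examples, the helper name is hypothetical, and the query strings are shortened placeholders.

```python
from typing import Dict, List


def rows_to_labeled_data(rows: List[Dict]) -> List[Dict]:
    """Group flat rows (qid, query, doc_id, relevance) by query id."""
    grouped: Dict = {}
    for row in rows:
        entry = grouped.setdefault(
            row["qid"],
            {"query_id": row["qid"], "query": row["query"], "relevant_docs": []},
        )
        entry["relevant_docs"].append({"id": row["doc_id"], "score": row["relevance"]})
    return list(grouped.values())  # dicts preserve insertion order


rows = [
    {"qid": 0, "query": "first query", "doc_id": 0, "relevance": 1},
    {"qid": 0, "query": "first query", "doc_id": 3, "relevance": 1},
    {"qid": 1, "query": "second query", "doc_id": 1, "relevance": 1},
    {"qid": 1, "query": "second query", "doc_id": 5, "relevance": 1},
]
labeled_data = rows_to_labeled_data(rows)
print(labeled_data[0]["relevant_docs"])
```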
-
collect_training_data_point
(query: str, query_id: str, relevant_id: str, id_field: str, query_model: vespa.query.QueryModel, number_additional_docs: int, fields: List[str], relevant_score: int = 1, default_score: int = 0, **kwargs) → List[Dict] Collect training data based on a single query
Parameters: - query – Query string.
- query_id – Query id represented as str.
- relevant_id – Relevant id represented as a str.
- id_field – The Vespa field representing the document id.
- query_model – Query model.
- number_additional_docs – Number of additional documents to retrieve for each relevant document.
- fields – Which fields should be retrieved.
- relevant_score – Score to assign to relevant documents. Default to 1.
- default_score – Score to assign to the additional documents that are not relevant. Default to 0.
- kwargs – Extra keyword arguments to be included in the Vespa Query.
Returns: List of dicts containing the document id (document_id), query id (query_id), scores (relevant) and vespa rank features returned by the Query model RankProfile used.
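One way to picture the role of relevant_score and default_score: the relevant document receives relevant_score, and every additional document retrieved for the same query receives default_score. The sketch below labels a list of document ids this way. It illustrates the scoring rule only; the function name and the flat list of ids are assumptions, and the real method also attaches rank features to each record.

```python
from typing import Dict, List


def label_documents(
    query_id: str,
    relevant_id: str,
    additional_ids: List[str],
    relevant_score: int = 1,
    default_score: int = 0,
) -> List[Dict]:
    """Hypothetical helper: one labeled record per document."""
    records = [{"document_id": relevant_id, "query_id": query_id, "relevant": relevant_score}]
    for doc_id in additional_ids:
        if doc_id == relevant_id:
            continue  # do not relabel the relevant document as non-relevant
        records.append({"document_id": doc_id, "query_id": query_id, "relevant": default_score})
    return records


# Made-up ids: "d1" is the relevant document, the rest are additional documents.
for record in label_documents("q1", "d1", ["d2", "d1", "d3"]):
    print(record)
```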
-
delete_all_docs
(content_cluster_name: str, schema: str, namespace: str = None) → requests.models.Response Delete all documents associated with the schema
Parameters: - content_cluster_name – Name of content cluster to GET from, or visit.
- schema – The schema that we are deleting data from.
- namespace – The namespace that we are deleting data from. If no namespace is provided the schema is used.
Returns: Response of the HTTP DELETE request.
-
delete_batch
(batch: List[Dict], schema: Optional[str] = None, asynchronous=True, connections: Optional[int] = 100, total_timeout: int = 100, namespace: Optional[str] = None) Delete a batch of data from a Vespa app.
Parameters: - batch – A list of dict containing the key ‘id’.
- schema – The schema that we are deleting data from. The schema is optional in case it is possible to infer the schema from the application package.
- asynchronous – Set True to delete data in async mode. Default to True.
- connections – Number of allowed concurrent connections, valid only if asynchronous=True.
- total_timeout – Total timeout in secs for each of the concurrent requests when using asynchronous=True.
- namespace – The namespace that we are deleting data from. If no namespace is provided the schema is used.
Returns: List of HTTP DELETE responses
-
delete_data
(schema: str, data_id: str, namespace: str = None) → vespa.io.VespaResponse Delete a data point from a Vespa app.
Parameters: - schema – The schema that we are deleting data from.
- data_id – Unique id associated with this data point.
- namespace – The namespace that we are deleting data from. If no namespace is provided the schema is used.
Returns: Response of the HTTP DELETE request.
-
evaluate
(labeled_data: Union[List[Dict], pandas.core.frame.DataFrame], eval_metrics: List[vespa.evaluation.EvalMetric], query_model: Union[vespa.query.QueryModel, List[vespa.query.QueryModel]], id_field: str, default_score: int = 0, detailed_metrics=False, per_query=False, aggregators=None, **kwargs) → pandas.core.frame.DataFrame Evaluate a
QueryModel according to a list of EvalMetric.
labeled_data can be a DataFrame or a List of Dict:
>>> labeled_data_df = DataFrame(
...     data={
...         "qid": [0, 0, 1, 1],
...         "query": ["Intrauterine virus infections and congenital heart disease", "Intrauterine virus infections and congenital heart disease", "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus", "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus"],
...         "doc_id": [0, 3, 1, 5],
...         "relevance": [1,1,1,1]
...     }
... )
>>> labeled_data = [
...     {
...         "query_id": 0,
...         "query": "Intrauterine virus infections and congenital heart disease",
...         "relevant_docs": [{"id": 0, "score": 1}, {"id": 3, "score": 1}]
...     },
...     {
...         "query_id": 1,
...         "query": "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus",
...         "relevant_docs": [{"id": 1, "score": 1}, {"id": 5, "score": 1}]
...     }
... ]
Parameters: - labeled_data – Labelled data containing query, query_id and relevant ids. See details about data format.
- eval_metrics – A list of evaluation metrics.
- query_model – Accept a Query model or a list of Query Models.
- id_field – The Vespa field representing the document id.
- default_score – Score to assign to the additional documents that are not relevant. Default to 0.
- detailed_metrics – Return intermediate computations if available.
- per_query – Set to True to return evaluation metrics per query.
- aggregators – Used only if per_query=False. List of pandas friendly aggregators to summarize per model metrics. We use [“mean”, “median”, “std”] by default.
- kwargs – Extra keyword arguments to be included in the Vespa Query.
Returns: DataFrame containing query_id and metrics according to the selected evaluation metrics.
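When per_query=False, per-query metric values are summarized by the aggregators. The sketch below reproduces the default set (mean, median, std) with the standard library for a single metric. The metric values are made up, and the use of the sample standard deviation follows the pandas default, which is an assumption about how the library computes std.

```python
import statistics

# Made-up per-query values of one metric, for illustration only.
per_query_metric = [1.0, 0.5, 0.0, 0.5]

summary = {
    "mean": statistics.mean(per_query_metric),
    "median": statistics.median(per_query_metric),
    "std": statistics.stdev(per_query_metric),  # sample standard deviation (ddof=1)
}
print(summary)
```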
-
evaluate_query
(eval_metrics: List[vespa.evaluation.EvalMetric], query_model: vespa.query.QueryModel, query_id: str, query: str, id_field: str, relevant_docs: List[Dict], default_score: int = 0, detailed_metrics=False, **kwargs) → Dict Evaluate a query according to evaluation metrics
Parameters: - eval_metrics – A list of evaluation metrics.
- query_model – Query model.
- query_id – Query id represented as str.
- query – Query string.
- id_field – The Vespa field representing the document id.
- relevant_docs – A list of dicts where each dict contains a doc id and optionally a doc score.
- default_score – Score to assign to the additional documents that are not relevant. Default to 0.
- detailed_metrics – Return intermediate computations if available.
- kwargs – Extra keyword arguments to be included in the Vespa Query.
Returns: Dict containing query_id and metrics according to the selected evaluation metrics.
-
feed_batch
(batch: List[Dict], schema: Optional[str] = None, asynchronous=True, connections: Optional[int] = 100, total_timeout: int = 100, namespace: Optional[str] = None) Feed a batch of data to a Vespa app.
Parameters: - batch – A list of dict containing the keys ‘id’ and ‘fields’ to be used in feed_data_point().
- schema – The schema that we are sending data to. The schema is optional in case it is possible to infer the schema from the application package.
- asynchronous – Set True to send data in async mode. Default to True.
- connections – Number of allowed concurrent connections, valid only if asynchronous=True.
- total_timeout – Total timeout in secs for each of the concurrent requests when using asynchronous=True.
- namespace – The namespace that we are sending data to. If no namespace is provided the schema is used.
Returns: List of HTTP POST responses
-
feed_data_point
(schema: str, data_id: str, fields: Dict, namespace: str = None) → vespa.io.VespaResponse Feed a data point to a Vespa app.
Parameters: - schema – The schema that we are sending data to.
- data_id – Unique id associated with this data point.
- fields – Dict containing all the fields required by the schema.
- namespace – The namespace that we are sending data to.
Returns: Response of the HTTP POST request.
-
feed_df
(df: pandas.core.frame.DataFrame, include_id: bool = True, **kwargs) Feed data contained in a DataFrame.
Parameters: - df – A DataFrame containing a required ‘id’ column and the remaining fields to be fed.
- include_id – Include id on the fields to be fed. Default to True.
- kwargs – Additional parameters are passed to
feed_batch()
.
Returns: List of HTTP POST responses
-
get_application_status
() → Optional[requests.models.Response] Get application status.
Returns:
-
get_batch
(batch: List[Dict], schema: Optional[str] = None, asynchronous=True, connections: Optional[int] = 100, total_timeout: int = 100, namespace: Optional[str] = None) Get a batch of data from a Vespa app.
Parameters: - batch – A list of dict containing the key ‘id’.
- schema – The schema that we are getting data from. The schema is optional in case it is possible to infer the schema from the application package.
- asynchronous – Set True to get data in async mode. Default to True.
- connections – Number of allowed concurrent connections, valid only if asynchronous=True.
- total_timeout – Total timeout in secs for each of the concurrent requests when using asynchronous=True.
- namespace – The namespace that we are getting data from. If no namespace is provided the schema is used.
Returns: List of HTTP GET responses
-
get_data
(schema: str, data_id: str, namespace: str = None) → vespa.io.VespaResponse Get a data point from a Vespa app.
Parameters: - schema – The schema that we are getting data from.
- data_id – Unique id associated with this data point.
- namespace – The namespace that we are getting data from. If no namespace is provided the schema is used.
Returns: Response of the HTTP GET request.
-
get_model_endpoint
(model_id: Optional[str] = None) → Optional[requests.models.Response] Get model evaluation endpoints.
-
get_model_from_application_package
(model_name: str) Get model definition from application package, if available.
-
predict
(x, model_id, function_name='output_0') Obtain a stateless model evaluation.
Parameters: - x – Input where the format depends on the task that the model is serving.
- model_id – The id of the model used to serve the prediction.
- function_name – The name of the output function to be evaluated.
Returns: Model prediction.
-
query
(body: Optional[Dict] = None, query: Optional[str] = None, query_model: Optional[vespa.query.QueryModel] = None, debug_request: bool = False, recall: Optional[Tuple] = None, **kwargs) → vespa.io.VespaQueryResponse Send a query request to the Vespa application.
Either send ‘body’ containing all the request parameters or specify ‘query’ and ‘query_model’.
Parameters: - body – Dict containing all the request parameters.
- query – Query string
- query_model – Query model
- debug_request – return request body for debugging instead of sending the request.
- recall – Tuple of size 2 where the first element is the name of the field to use to recall and the second element is a list of the values to be recalled.
- kwargs – Additional parameters to be sent along the request.
Returns: Either the request body if debug_request is True or the result from the Vespa application
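A request body passed as 'body' might look like the sketch below. The keys yql, query, hits and recall are Vespa query parameters; the helper recall_filter and the way it encodes the (field, values) tuple are assumptions, since this reference does not state how the library builds the recall string.

```python
import json
from typing import Dict, List, Tuple


def recall_filter(recall: Tuple[str, List]) -> str:
    """Encode (field, values) as a recall string (assumed encoding, hypothetical helper)."""
    field, values = recall
    return "+(" + " ".join(f"{field}:{value}" for value in values) + ")"


body: Dict = {
    "yql": "select * from sources * where userQuery();",
    "query": "what is vespa",
    "hits": 5,
    "recall": recall_filter(("id", [1, 5])),
}
payload = json.dumps(body)

# Assuming `app` is an already connected application instance:
# result = app.query(body=body)
```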
-
query_batch
(body_batch: Optional[List[Dict]] = None, query_batch: Optional[List[str]] = None, query_model: Optional[vespa.query.QueryModel] = None, recall: Optional[List[Tuple]] = None, asynchronous=True, connections: Optional[int] = 100, total_timeout: int = 100, **kwargs) Send queries in batch to a Vespa app.
Parameters: - body_batch – A list of dict containing all the request parameters. Set to None if using ‘query_batch’.
- query_batch – A list of query strings. Set to None if using ‘body_batch’.
- query_model – Query model to use when sending query strings. Set to None if using ‘body_batch’.
- recall – List of tuples, one for each query. Tuple of size 2 where the first element is the name of the field to use to recall and the second element is a list of the values to be recalled.
- asynchronous – Set True to send data in async mode. Default to True.
- connections – Number of allowed concurrent connections, valid only if asynchronous=True.
- total_timeout – Total timeout in secs for each of the concurrent requests when using asynchronous=True.
- kwargs – Additional parameters to be sent along the request.
Returns: List of HTTP POST responses
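The connections and total_timeout parameters describe bounded concurrency with a per-request timeout. The following is a minimal sketch of that pattern using asyncio only; send_all and fake_send are hypothetical names, and this is not the library's implementation.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def send_all(
    items: List[Dict],
    send: Callable[[Dict], Awaitable[Dict]],
    connections: int = 100,
    total_timeout: float = 100,
) -> List[Dict]:
    """Send every item with at most `connections` requests in flight."""
    semaphore = asyncio.Semaphore(connections)

    async def bounded(item: Dict) -> Dict:
        async with semaphore:
            # Each request gets its own timeout, in seconds.
            return await asyncio.wait_for(send(item), timeout=total_timeout)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(item) for item in items))


async def fake_send(body: Dict) -> Dict:
    """Stand-in for an HTTP request so the sketch runs without a server."""
    await asyncio.sleep(0)
    return {"status": 200, "query": body["query"]}


responses = asyncio.run(
    send_all([{"query": "a"}, {"query": "b"}], fake_send, connections=1)
)
```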
-
update_batch
(batch: List[Dict], schema: Optional[str] = None, asynchronous=True, connections: Optional[int] = 100, total_timeout: int = 100, namespace: Optional[str] = None) Update a batch of data in a Vespa app.
Parameters: - batch – A list of dict containing the keys ‘id’, ‘fields’ and ‘create’ (create defaults to False).
- schema – The schema whose data we are updating. The schema is optional in case it is possible to infer the schema from the application package.
- asynchronous – Set True to update data in async mode. Default to True.
- connections – Number of allowed concurrent connections, valid only if asynchronous=True.
- total_timeout – Total timeout in secs for each of the concurrent requests when using asynchronous=True.
- namespace – The namespace whose data we are updating. If no namespace is provided the schema is used.
Returns: List of HTTP PUT responses
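The update batch format, including the default for 'create', can be sketched as follows. The helper normalize_update is hypothetical and only mirrors the description above.

```python
from typing import Dict, List


def normalize_update(item: Dict) -> Dict:
    """Fill in the documented default create=False (hypothetical helper)."""
    return {
        "id": item["id"],
        "fields": item["fields"],
        "create": item.get("create", False),
    }


raw_batch: List[Dict] = [
    {"id": "1", "fields": {"title": "new title"}},
    {"id": "2", "fields": {"title": "created if missing"}, "create": True},
]
update_batch_items = [normalize_update(item) for item in raw_batch]
```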
-
update_data
(schema: str, data_id: str, fields: Dict, create: bool = False, namespace: str = None) → vespa.io.VespaResponse Update a data point in a Vespa app.
Parameters: - schema – The schema whose data we are updating.
- data_id – Unique id associated with this data point.
- fields – Dict containing all the fields you want to update.
- create – If True, updates to non-existent documents will create an empty document to update.
- namespace – The namespace whose data we are updating. If no namespace is provided the schema is used.
Returns: Response of the HTTP PUT request.
-
wait_for_application_up
(max_wait) Wait for application ready.
Parameters: max_wait – Seconds to wait for the application endpoint Returns:
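Waiting up to max_wait seconds is typically a polling loop with a deadline. The sketch below shows that general pattern; wait_until_up, is_up and poll_interval are hypothetical names, and what the library does when the deadline passes is not stated in this reference.

```python
import time
from typing import Callable


def wait_until_up(is_up: Callable[[], bool], max_wait: float, poll_interval: float = 1.0) -> bool:
    """Poll is_up() until it returns True or max_wait seconds have elapsed."""
    deadline = time.monotonic() + max_wait
    while True:
        if is_up():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Fake status check that reports ready on the third call.
answers = iter([False, False, True])
ready = wait_until_up(lambda: next(answers), max_wait=5, poll_interval=0)
```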
-
QueryModel
-
class
vespa.query.
QueryModel
(name: str = 'default_name', query_properties: Optional[List[vespa.query.QueryProperty]] = None, match_phase: vespa.query.MatchFilter = <vespa.query.AND object>, rank_profile: vespa.query.RankProfile = <vespa.query.RankProfile object>, body_function: Optional[Callable[[str], Dict]] = None) -
__init__
(name: str = 'default_name', query_properties: Optional[List[vespa.query.QueryProperty]] = None, match_phase: vespa.query.MatchFilter = <vespa.query.AND object>, rank_profile: vespa.query.RankProfile = <vespa.query.RankProfile object>, body_function: Optional[Callable[[str], Dict]] = None) → None Define a query model.
A
QueryModel
is an abstraction that encapsulates all the relevant information controlling how an app matches and ranks documents. A QueryModel can be used for querying (query()
), evaluating (evaluate()
) and collecting data (collect_training_data()
) from an app.
Parameters: - name – Name of the query model. Used to tag model related quantities, like evaluation metrics.
- query_properties – Optional list of QueryProperty.
- match_phase – Define the match criteria. One of the MatchFilter options available.
- rank_profile – Define the rank criteria.
- body_function – Function that takes the query as a parameter and returns the body of a Vespa query.
-
create_body
(query: str) → Dict[str, str] Create the appropriate request body to be sent to Vespa.
Parameters: query – Query input. Returns: dict representing the request body.
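A body_function is a callable from a query string to a request body. A minimal sketch is shown below; the rank profile name "bm25" is hypothetical and the commented constructor call assumes pyvespa is installed.

```python
from typing import Dict


def body_function(query: str) -> Dict:
    """Build a request body for a query string."""
    return {
        "yql": "select * from sources * where userQuery();",
        "query": query,
        "ranking": {"profile": "bm25"},  # hypothetical rank profile name
    }


example_body = body_function("what is vespa")

# Assuming pyvespa is available:
# from vespa.query import QueryModel
# query_model = QueryModel(name="bm25", body_function=body_function)
```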
-
Union
-
class
vespa.query.
Union
(*args) -
__init__
(*args) → None Match documents that belong to the union of many match filters.
Parameters: args – Match filters whose union is taken.
-
create_match_filter
(query: str) → str Create part of the YQL expression related to the filter.
Parameters: query – Query input. Returns: Part of the YQL expression related to the filter.
-
get_query_properties
(query: Optional[str] = None) → Dict[str, str] Relevant request properties associated with the filter.
Parameters: query – Query input. Returns: dict containing the relevant request properties associated with the filter.
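Conceptually, a union of match filters corresponds to combining their YQL fragments with or. The sketch below illustrates the idea with hypothetical stand-in filters; the exact YQL the library emits is not shown in this reference.

```python
from typing import Callable, List


def union_filter(filters: List[Callable[[str], str]], query: str) -> str:
    """Join the YQL fragment of each filter with 'or' (illustrative only)."""
    return " or ".join(match_filter(query) for match_filter in filters)


def title_contains(query: str) -> str:
    # Hypothetical filter on a field named "title".
    return f'(title contains "{query}")'


def body_contains(query: str) -> str:
    # Hypothetical filter on a field named "body".
    return f'(body contains "{query}")'


union_expr = union_filter([title_contains, body_contains], "vespa")
```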
-
AND
-
class
vespa.query.
AND
-
__init__
() → None Filter that matches documents containing all the query terms.
-
create_match_filter
(query: str) → str Create part of the YQL expression related to the filter.
Parameters: query – Query input. Returns: Part of the YQL expression related to the filter.
-
get_query_properties
(query: Optional[str] = None) → Dict Relevant request properties associated with the filter.
Parameters: query – Query input. Returns: dict containing the relevant request properties associated with the filter.
-
OR
-
class
vespa.query.
OR
-
__init__
() → None Filter that matches any document containing at least one query term.
-
create_match_filter
(query: str) → str Create part of the YQL expression related to the filter.
Parameters: query – Query input. Returns: Part of the YQL expression related to the filter.
-
get_query_properties
(query: Optional[str] = None) → Dict Relevant request properties associated with the filter.
Parameters: query – Query input. Returns: dict containing the relevant request properties associated with the filter.
-
WeakAnd
-
class
vespa.query.
WeakAnd
(hits: int, field: str = 'default') -
__init__
(hits: int, field: str = 'default') → None Match documents according to the weakAND algorithm.
Reference: https://docs.vespa.ai/en/using-wand-with-vespa.html
Parameters: - hits – Lower bound on the number of hits to be retrieved.
- field – Which Vespa field to search.
-
create_match_filter
(query: str) → str Create part of the YQL expression related to the filter.
Parameters: query – Query input. Returns: Part of the YQL expression related to the filter.
-
get_query_properties
(query: Optional[str] = None) → Dict Relevant request properties associated with the filter.
Parameters: query – Query input. Returns: dict containing the relevant request properties associated with the filter.
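A weakAnd expression lists one item per query term. The sketch below builds such a fragment by splitting the query on whitespace; weak_and_filter is a hypothetical helper, the annotation syntax has varied across Vespa versions, and the string the library actually produces may differ.

```python
def weak_and_filter(query: str, hits: int, field: str = "default") -> str:
    """Build a weakAnd YQL fragment with one 'contains' item per term."""
    # Note: terms are not escaped here; quotes inside a term would break the YQL.
    terms = ", ".join(f'{field} contains "{term}"' for term in query.split())
    return f"({{targetHits: {hits}}}weakAnd({terms}))"


weak_and_expr = weak_and_filter("hello world", hits=10)
```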
-
ANN
-
class
vespa.query.
ANN
(doc_vector: str, query_vector: str, hits: int, label: str, approximate: bool = True) -
__init__
(doc_vector: str, query_vector: str, hits: int, label: str, approximate: bool = True) → None Match documents according to the nearest neighbor operator.
Reference: https://docs.vespa.ai/en/reference/query-language-reference.html
Parameters: - doc_vector – Name of the document field to be used in the distance calculation.
- query_vector – Name of the query field to be used in the distance calculation.
- hits – Lower bound on the number of hits to return.
- label – A label to identify this specific operator instance.
- approximate – True to use approximate nearest neighbor and False to use brute force. Default to True.
-
create_match_filter
(query: str) → str Create part of the YQL expression related to the filter.
Parameters: query – Query input. Returns: Part of the YQL expression related to the filter.
-
get_query_properties
(query: Optional[str] = None) → Dict[str, str] Relevant request properties associated with the filter.
Parameters: query – Query input. Returns: dict containing the relevant request properties associated with the filter.
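The constructor arguments map naturally onto a nearestNeighbor expression. The following sketch assembles one from those arguments; ann_filter and the field names are hypothetical, and the annotation syntax shown is an assumption that may differ from what the library emits for a given Vespa version.

```python
def ann_filter(doc_vector: str, query_vector: str, hits: int, label: str, approximate: bool = True) -> str:
    """Build a nearestNeighbor YQL fragment (illustrative only)."""
    approx = "true" if approximate else "false"
    return (
        f'({{targetHits: {hits}, label: "{label}", approximate: {approx}}}'
        f"nearestNeighbor({doc_vector}, {query_vector}))"
    )


ann_expr = ann_filter("doc_emb", "query_emb", hits=10, label="ann")
```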
-
RankProfile
-
class
vespa.query.
RankProfile
(name: str = 'default', list_features: bool = False) -
__init__
(name: str = 'default', list_features: bool = False) → None Define a rank profile.
Parameters: - name – Name of the rank profile as defined in a Vespa search definition.
- list_features – Whether the ranking features should be returned. Either True or False. Default to False.
-
QueryRankingFeature
-
class
vespa.query.
QueryRankingFeature
(name: str, mapping: Callable[[str], List[float]]) -
__init__
(name: str, mapping: Callable[[str], List[float]]) → None Include ranking.features.query(name) in a Vespa query.
Parameters: - name – Name of the feature.
- mapping – Function mapping a string to a list of floats.
-
get_query_properties
(query: Optional[str] = None) → Dict[str, str] Extract query property syntax.
Parameters: query – Query input. Returns: dict containing the relevant request properties to be included in the query.
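The mapping argument turns a query string into a list of floats that is sent as a query ranking feature. A minimal sketch of that idea follows; query_feature_properties and length_features are hypothetical, and serializing the list with str() is an assumption.

```python
from typing import Callable, Dict, List


def query_feature_properties(name: str, mapping: Callable[[str], List[float]], query: str) -> Dict[str, str]:
    """Return the request property carrying the mapped query feature."""
    return {f"ranking.features.query({name})": str(mapping(query))}


def length_features(query: str) -> List[float]:
    # Toy mapping: character count and word count of the query.
    return [float(len(query)), float(len(query.split()))]


props = query_feature_properties("query_stats", length_features, "what is vespa")
```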
-
MatchRatio
-
class
vespa.evaluation.
MatchRatio
-
__init__
() → None Computes the ratio of documents retrieved by the match phase.
-
evaluate_query
(query_results: vespa.io.VespaQueryResponse, relevant_docs: List[Dict], id_field: str, default_score: int, detailed_metrics=False) → Dict Evaluate query results.
Parameters: - query_results – Raw query results returned by Vespa.
- relevant_docs – A list of dicts where each dict contains a doc id and optionally a doc score.
- id_field – The Vespa field representing the document id.
- default_score – Score to assign to the additional documents that are not relevant. Default to 0.
- detailed_metrics – Return intermediate computations if available.
Returns: Dict containing the number of retrieved docs (_retrieved_docs), the number of docs available in the corpus (_docs_available) and the match ratio.
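The match ratio is the number of retrieved documents divided by the number of documents available in the corpus. A small sketch is given below; the function name, the dict keys and the handling of an empty corpus are assumptions.

```python
from typing import Dict


def match_ratio(retrieved_docs: int, docs_available: int) -> Dict[str, float]:
    """Compute retrieved / available, returning 0 when the corpus is empty."""
    ratio = retrieved_docs / docs_available if docs_available > 0 else 0.0
    return {
        "retrieved_docs": retrieved_docs,
        "docs_available": docs_available,
        "match_ratio": ratio,
    }


ratio_result = match_ratio(25, 100)
```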
-
Recall
-
class
vespa.evaluation.
Recall
(at: int) -
__init__
(at: int) → None Compute the recall at position at.
Parameters: at – Maximum position on the resulting list to look for relevant docs.
-
evaluate_query
(query_results: vespa.io.VespaQueryResponse, relevant_docs: List[Dict], id_field: str, default_score: int, detailed_metrics=False) → Dict Evaluate query results.
There is an assumption that only documents with score > 0 are relevant. Recall is equal to zero in case no relevant documents with score > 0 are provided.
Parameters: - query_results – Raw query results returned by Vespa.
- relevant_docs – A list of dicts where each dict contains a doc id and optionally a doc score.
- id_field – The Vespa field representing the document id.
- default_score – Score to assign to the additional documents that are not relevant. Default to 0.
- detailed_metrics – Return intermediate computations if available.
Returns: Dict containing the recall value.
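Recall at position at is the fraction of relevant documents that appear in the first at results. The sketch below works on a plain list of ranked ids rather than a VespaQueryResponse; recall_at is a hypothetical helper, and treating a missing score as 1 is an assumption.

```python
from typing import Dict, List


def recall_at(ranked_ids: List[str], relevant_docs: List[Dict], at: int) -> float:
    """Fraction of relevant docs (score > 0) found in the top `at` results."""
    relevant_ids = {doc["id"] for doc in relevant_docs if doc.get("score", 1) > 0}
    if not relevant_ids:
        # No relevant documents with score > 0: recall is defined as zero.
        return 0.0
    found = relevant_ids & set(ranked_ids[:at])
    return len(found) / len(relevant_ids)


judged = [{"id": "a", "score": 1}, {"id": "b", "score": 1}, {"id": "c", "score": 0}]
recall_value = recall_at(["a", "x", "b", "c"], judged, at=2)
```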
-
ReciprocalRank
-
class
vespa.evaluation.
ReciprocalRank
(at: int) -
__init__
(at: int) Compute the reciprocal rank at position at.
Parameters: at – Maximum position on the resulting list to look for relevant docs.
-
evaluate_query
(query_results: vespa.io.VespaQueryResponse, relevant_docs: List[Dict], id_field: str, default_score: int, detailed_metrics=False) → Dict Evaluate query results.
There is an assumption that only documents with score > 0 are relevant.
Parameters: - query_results – Raw query results returned by Vespa.
- relevant_docs – A list of dicts where each dict contains a doc id and optionally a doc score.
- id_field – The Vespa field representing the document id.
- default_score – Score to assign to the additional documents that are not relevant. Default to 0.
- detailed_metrics – Return intermediate computations if available.
Returns: Dict containing the reciprocal rank value.
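The reciprocal rank is 1 divided by the position of the first relevant document within the first at results, and zero if none is found. A stand-alone sketch follows; reciprocal_rank_at is a hypothetical helper operating on a plain list of ids, and a missing score is assumed to mean 1.

```python
from typing import Dict, List


def reciprocal_rank_at(ranked_ids: List[str], relevant_docs: List[Dict], at: int) -> float:
    """Return 1 / position of the first relevant doc in the top `at`, else 0."""
    relevant_ids = {doc["id"] for doc in relevant_docs if doc.get("score", 1) > 0}
    for position, doc_id in enumerate(ranked_ids[:at], start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


rr_value = reciprocal_rank_at(["x", "a", "b"], [{"id": "a", "score": 1}], at=3)
```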
-
NormalizedDiscountedCumulativeGain
-
class
vespa.evaluation.
NormalizedDiscountedCumulativeGain
(at: int) -
__init__
(at: int) Compute the normalized discounted cumulative gain at position at.
Parameters: at – Maximum position on the resulting list to look for relevant docs.
-
evaluate_query
(query_results: vespa.io.VespaQueryResponse, relevant_docs: List[Dict], id_field: str, default_score: int, detailed_metrics=False) → Dict Evaluate query results.
There is an assumption that documents returned by the query that are not included in the set of relevant documents have score equal to zero. Similarly, if the query returns a number N < at documents, we will assume that those at - N missing scores are equal to zero.
Parameters: - query_results – Raw query results returned by Vespa.
- relevant_docs – A list of dicts where each dict contains a doc id and optionally a doc score.
- id_field – The Vespa field representing the document id.
- default_score – Score to assign to the additional documents that are not relevant. Default to 0.
- detailed_metrics – Return intermediate computations if available.
Returns: Dict containing the ideal discounted cumulative gain (_ideal_dcg), the discounted cumulative gain (_dcg) and the normalized discounted cumulative gain.
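The sketch below shows one common way to compute NDCG at a cutoff, including the zero padding for missing results described above. It uses the linear-gain DCG variant and derives the ideal ordering from the scores in relevant_docs; both choices, as well as the function names and dict keys, are assumptions and may differ from the library's computation.

```python
import math
from typing import Dict, List


def dcg(scores: List[float]) -> float:
    """Discounted cumulative gain with linear gain and a log2 discount."""
    return sum(score / math.log2(index + 2) for index, score in enumerate(scores))


def ndcg_at(ranked_ids: List[str], relevant_docs: List[Dict], at: int, default_score: int = 0) -> Dict[str, float]:
    """Compute ideal DCG, DCG and NDCG over the top `at` results."""
    score_by_id = {doc["id"]: doc.get("score", 1) for doc in relevant_docs}
    scores = [score_by_id.get(doc_id, default_score) for doc_id in ranked_ids[:at]]
    # If fewer than `at` documents were returned, pad the missing scores with zero.
    scores += [0] * (at - len(scores))
    ideal_scores = sorted(score_by_id.values(), reverse=True)[:at]
    ideal_dcg = dcg(ideal_scores)
    actual_dcg = dcg(scores)
    ndcg = actual_dcg / ideal_dcg if ideal_dcg > 0 else 0.0
    return {"ideal_dcg": ideal_dcg, "dcg": actual_dcg, "ndcg": ndcg}


graded = [{"id": "a", "score": 2}, {"id": "b", "score": 1}]
perfect = ndcg_at(["a", "b"], graded, at=3)
swapped = ndcg_at(["b", "a"], graded, at=3)
```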
-