Getting started with pyvespa

This notebook starts Vespa, configures the application and tests the document and query APIs.

Running this notebook requires pyvespa, Jupyter and Docker; see the requirements for each before you start.

Create the application package

Define the application package as a subclass of ApplicationPackage with two schemas, context and sentence:

[1]:
from vespa.package import (
    Document,
    Field,
    Schema,
    FieldSet,
    RankProfile,
    HNSW,
    ApplicationPackage,
    QueryProfile,
    QueryProfileType,
    QueryTypeField,
)

class QuestionAnswering(ApplicationPackage):
    def __init__(self, name: str = "qa"):
        context_document = Document(
            fields=[
                Field(
                    name="questions",
                    type="array<int>",
                    indexing=["summary", "attribute"],
                ),
                Field(name="dataset", type="string", indexing=["summary", "attribute"]),
                Field(name="context_id", type="int", indexing=["summary", "attribute"]),
                Field(
                    name="text",
                    type="string",
                    indexing=["summary", "index"],
                    index="enable-bm25",
                ),
            ]
        )
        context_schema = Schema(
            name="context",
            document=context_document,
            fieldsets=[FieldSet(name="default", fields=["text"])],
            rank_profiles=[
                RankProfile(name="bm25", inherits="default", first_phase="bm25(text)"),
                RankProfile(
                    name="nativeRank",
                    inherits="default",
                    first_phase="nativeRank(text)",
                ),
            ],
        )
        sentence_document = Document(
            inherits="context",
            fields=[
                Field(
                    name="sentence_embedding",
                    type="tensor<float>(x[512])",
                    indexing=["attribute", "index"],
                    ann=HNSW(
                        distance_metric="euclidean",
                        max_links_per_node=16,
                        neighbors_to_explore_at_insert=500,
                    ),
                )
            ],
        )
        sentence_schema = Schema(
            name="sentence",
            document=sentence_document,
            fieldsets=[FieldSet(name="default", fields=["text"])],
            rank_profiles=[
                RankProfile(
                    name="semantic-similarity",
                    inherits="default",
                    first_phase="closeness(sentence_embedding)",
                ),
                RankProfile(name="bm25", inherits="default", first_phase="bm25(text)"),
                RankProfile(
                    name="bm25-semantic-similarity",
                    inherits="default",
                    first_phase="bm25(text) + closeness(sentence_embedding)",
                ),
            ],
        )
        super().__init__(
            name=name,
            schema=[context_schema, sentence_schema],
            query_profile=QueryProfile(),
            query_profile_type=QueryProfileType(
                fields=[
                    QueryTypeField(
                        name="ranking.features.query(query_embedding)",
                        type="tensor<float>(x[512])",
                    )
                ]
            ),
        )

app_package = QuestionAnswering()
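
The semantic-similarity and bm25-semantic-similarity profiles above rank with closeness(sentence_embedding). As a rough illustration of what that expression contributes, here is a minimal sketch in plain Python, assuming closeness is 1 / (1 + distance) for the euclidean metric, as the Vespa rank feature reference describes. The helper names, the 3-dimensional vectors and the BM25 score are made up for the example; none of this is pyvespa code.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain euclidean distance between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closeness(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed closeness formula: 1 at zero distance, approaching 0 far away."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


# Hypothetical values: the real schema uses 512-dimensional embeddings.
query_embedding = [0.0, 0.0, 0.0]
doc_embedding = [3.0, 4.0, 0.0]  # euclidean distance to the query is 5.0
bm25_score = 2.5                 # invented stand-in for bm25(text)

semantic = closeness(query_embedding, doc_embedding)
combined = bm25_score + semantic  # mirrors "bm25(text) + closeness(...)"
print(round(semantic, 4), round(combined, 4))
```

Because closeness stays between 0 and 1 while BM25 scores are unbounded, the text score can dominate the sum in the combined profile.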

Deploy the application using Docker

Deploy app_package and wait for the "Finished deployment." message:

[2]:
from vespa.deployment import VespaDocker

vespa_docker = VespaDocker()
app = vespa_docker.deploy(application_package=app_package)
Waiting for configuration server, 0/300 seconds...
Waiting for configuration server, 5/300 seconds...
Waiting for application status, 0/300 seconds...
Waiting for application status, 5/300 seconds...
Waiting for application status, 10/300 seconds...
Waiting for application status, 15/300 seconds...
Waiting for application status, 20/300 seconds...
Waiting for application status, 25/300 seconds...
Waiting for application status, 30/300 seconds...
Waiting for application status, 35/300 seconds...
Finished deployment.

The cell above starts a Vespa instance in a Docker container and deploys the application package to it. The returned app object is used for all feed, query and document operations below.

See deploy-docker for how to export the application package to files, for example to inspect the generated configuration, and how to deploy after modifying the application package.

Download, prepare and feed sample data

[3]:
import json, requests

sentence_data = json.loads(
    requests.get("https://data.vespa.oath.cloud/blog/qa/sample_sentence_data_100.json").text
)
list(sentence_data[0].keys())
[3]:
['text', 'dataset', 'questions', 'context_id', 'sentence_embedding']

Prepare the data as a list of dicts. Each dict has an id key holding a unique id for the data point and a fields key holding a dict with the data fields required by the application:

[4]:
batch_feed = [
    {
        "id": idx,
        "fields": sentence
    }
    for idx, sentence in enumerate(sentence_data)
]
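
The list comprehension above assumes every record already has the right shape. A minimal sketch of the same step with a basic check added, using a hypothetical build_batch helper that is not part of pyvespa. Treating every key printed earlier as required is a choice made for this sketch, and the sample records (including the two-value embeddings) are invented placeholders rather than real data.

```python
from typing import Any, Dict, Iterable, List

# Field names taken from the keys of the sample data shown earlier.
REQUIRED_FIELDS = {"text", "dataset", "questions", "context_id", "sentence_embedding"}


def build_batch(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap each record as {"id": ..., "fields": ...}, failing early on missing fields."""
    batch = []
    for idx, record in enumerate(records):
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"record {idx} is missing fields: {sorted(missing)}")
        batch.append({"id": idx, "fields": record})
    return batch


# Invented records; the real embeddings have 512 values.
records = [
    {"text": "first sentence", "dataset": "demo", "questions": [0],
     "context_id": 0, "sentence_embedding": {"values": [0.1, 0.2]}},
    {"text": "second sentence", "dataset": "demo", "questions": [1],
     "context_id": 0, "sentence_embedding": {"values": [0.3, 0.4]}},
]
batch = build_batch(records)
print([item["id"] for item in batch])
```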

Feed the batch using the sentence schema:

[5]:
response = app.feed_batch(schema="sentence", batch=batch_feed)
Successful documents fed: 100/100.
Batch progress: 1/1.

Run a query

Query the application using the Vespa Query Language:

[6]:
result = app.query(body={
  'yql': 'select text from sources sentence where userQuery();',
  'query': 'What is in front of the Notre Dame Main Building?',
  'type': 'any',
  'hits': 5,
  'ranking.profile': 'bm25'
})
[7]:
result.hits[0]
[7]:
{'id': 'index:qa_content/0/a87ff679ab8603b42a4ffde2',
 'relevance': 11.194862200830393,
 'source': 'qa_content',
 'fields': {'text': 'Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes".'}}
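
Each hit is a plain dict with id, relevance, source and fields keys, as the output above shows. A minimal sketch of turning a list of such hits into (relevance, text) pairs, using a hypothetical summarize_hits helper and a hard-coded copy of the hit above in place of a live result.hits list:

```python
from typing import Any, Dict, List, Tuple


def summarize_hits(hits: List[Dict[str, Any]]) -> List[Tuple[float, str]]:
    """Return (relevance, text) per hit; text is empty if the field is absent."""
    return [
        (hit["relevance"], hit.get("fields", {}).get("text", ""))
        for hit in hits
    ]


# Copy of the hit shown above, standing in for result.hits.
sample_hits = [
    {
        "id": "index:qa_content/0/a87ff679ab8603b42a4ffde2",
        "relevance": 11.194862200830393,
        "source": "qa_content",
        "fields": {
            "text": (
                "Immediately in front of the Main Building and facing it, "
                'is a copper statue of Christ with arms upraised with the '
                'legend "Venite Ad Me Omnes".'
            )
        },
    }
]

for relevance, text in summarize_hits(sample_hits):
    print(f"{relevance:.2f}  {text[:60]}")
```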

Get documents

Get the sentences with ids 0, 1 and 2, then inspect the first response as JSON:

[8]:
batch = [{"id": 0}, {"id": 1}, {"id": 2}]
response = app.get_batch(schema="sentence", batch=batch)
[9]:
response
[9]:
[<vespa.io.VespaResponse at 0x10b0ee790>,
 <vespa.io.VespaResponse at 0x10b0edc10>,
 <vespa.io.VespaResponse at 0x11f0dd9a0>]
[10]:
response[0].json
[10]:
{'pathId': '/document/v1/sentence/sentence/docid/0',
 'id': 'id:sentence:sentence::0',
 'fields': {'text': "Atop the Main Building's gold dome is a golden statue of the Virgin Mary.",
  'dataset': 'squad',
  'sentence_embedding': {'type': 'tensor<float>(x[512])',
   'values': [-0.005731593817472458,
    0.007575507741421461,
    -0.06413306295871735,
    -0.007967847399413586,
    -0.06464996933937073,
    -0.07429644465446472,
    0.005069912411272526,
    -0.019518841058015823,
    -0.021434271708130836,
    -0.06423905491828918,
    0.0652240440249443,
    -0.06434165686368942,
    -0.06569897383451462,
    0.040481165051460266,
    0.024145686998963356,
    0.007362892851233482,
    0.07771800458431244,
    0.00946187973022461,
    -0.028259800747036934,
    0.005866243038326502,
    0.015300747938454151,
    0.021307284012436867,
    -0.05775361880660057,
    -0.017718791961669922,
    0.02191190794110298,
    0.006564970128238201,
    -0.005148318596184254,
    0.048069994896650314,
    -0.00011502391862450168,
    -0.06791272759437561,
    -0.009163320064544678,
    -0.07174629718065262,
    0.017720846459269524,
    0.037869423627853394,
    0.07788487523794174,
    -0.04319801554083824,
    0.013778245076537132,
    -0.05123303085565567,
    -0.07474122196435928,
    0.0666406899690628,
    -0.0534103699028492,
    -0.037872593849897385,
    0.010211014188826084,
    0.008907281793653965,
    -0.05693356692790985,
    0.06317473948001862,
    0.0785011276602745,
    0.026305610314011574,
    -0.04362558200955391,
    0.05106586217880249,
    0.06952179223299026,
    -0.029939323663711548,
    0.008242297917604446,
    -0.023817898705601692,
    0.04810205101966858,
    -0.023464491590857506,
    -0.02842424064874649,
    0.003361016744747758,
    0.0604083277285099,
    -0.00028857155120931566,
    -0.019782839342951775,
    0.04528861120343208,
    -0.052865512669086456,
    0.002612191950902343,
    -0.05548971891403198,
    0.07642041146755219,
    -0.0598042868077755,
    0.03133942186832428,
    -0.007824675180017948,
    0.06480591744184494,
    0.0025388130452483892,
    -0.07243897020816803,
    -0.04542206600308418,
    -0.007309886626899242,
    0.0840393528342247,
    0.052954018115997314,
    -0.04401383921504021,
    -0.06654049456119537,
    0.001990059856325388,
    -0.040212906897068024,
    -0.016776256263256073,
    0.013370862230658531,
    -0.008448323234915733,
    0.0728844702243805,
    0.024126939475536346,
    -0.06769422441720963,
    0.06302408874034882,
    0.07067032158374786,
    -0.038075320422649384,
    -0.015602963045239449,
    -0.027085624635219574,
    0.037075914442539215,
    0.005181755404919386,
    0.06082254648208618,
    0.032451849430799484,
    -0.04409795254468918,
    0.03051387146115303,
    -0.05744433403015137,
    -0.03471038118004799,
    0.03276345506310463,
    0.006043457426130772,
    -0.03554929047822952,
    -0.07440878450870514,
    0.02351159043610096,
    0.03136656433343887,
    0.07823910564184189,
    -0.02737349644303322,
    0.011265130713582039,
    0.03432106599211693,
    0.06546878069639206,
    -0.00023396419419441372,
    -0.011557986959815025,
    0.025620460510253906,
    0.05357709899544716,
    -0.0035238536074757576,
    0.03427300974726677,
    -0.04426465556025505,
    -0.027798393741250038,
    -0.038985710591077805,
    0.0077178506180644035,
    0.046469274908304214,
    -0.0731307864189148,
    -0.06964877992868423,
    0.01783963479101658,
    0.07705194503068924,
    0.0249634962528944,
    -0.043792326003313065,
    0.032148510217666626,
    -0.05472841486334801,
    -0.006402152124792337,
    -0.04752308502793312,
    -0.03934653848409653,
    -0.003358342917636037,
    -0.07451613992452621,
    -0.06583846360445023,
    0.014880238100886345,
    0.058178842067718506,
    -0.0744597315788269,
    -0.04388468712568283,
    -0.006177771836519241,
    0.06300053000450134,
    0.015160247683525085,
    0.08213217556476593,
    -0.028674177825450897,
    0.018802916631102562,
    -0.01958850957453251,
    -0.02518400177359581,
    -0.06280948221683502,
    -0.042627815157175064,
    -0.06520876288414001,
    -0.002140427241101861,
    0.028858941048383713,
    0.04875727742910385,
    -0.04436075687408447,
    -0.012736239470541477,
    -0.032670967280864716,
    0.048136305063962936,
    0.07212428748607635,
    -0.06159426271915436,
    0.026047125458717346,
    -0.00029080818057991564,
    0.011187436990439892,
    0.01240565162152052,
    0.048478174954652786,
    0.012715257704257965,
    0.011652180925011635,
    -0.0013308337656781077,
    0.032670535147190094,
    0.00945266243070364,
    0.014488788321614265,
    -0.029567314311861992,
    -0.032574769109487534,
    -0.021792365238070488,
    -0.020808113738894463,
    -0.03401615098118782,
    0.06680087745189667,
    -0.06250281631946564,
    0.03272148594260216,
    0.01936480775475502,
    -0.026815583929419518,
    0.07208762317895889,
    -0.03973960131406784,
    0.006415614392608404,
    0.0011102595599368215,
    0.05931733176112175,
    -0.07598836719989777,
    -0.026091668754816055,
    0.02341729961335659,
    -0.040755417197942734,
    0.001604373101145029,
    -0.0760861188173294,
    0.058562636375427246,
    -0.002422476653009653,
    0.022913040593266487,
    0.01795356161892414,
    -0.06944388151168823,
    -0.029202986508607864,
    -0.07684884965419769,
    0.06695760041475296,
    0.023546423763036728,
    0.07279899716377258,
    -0.043952248990535736,
    -0.0794493705034256,
    0.03422931954264641,
    0.06903161853551865,
    0.0034821887966245413,
    0.028148461133241653,
    0.021342268213629723,
    0.01619466207921505,
    0.037691012024879456,
    0.006760886870324612,
    0.013903879560530186,
    -0.02044975571334362,
    -0.04147280380129814,
    0.0027645062655210495,
    -0.015008649788796902,
    -0.01891951821744442,
    -0.03634532168507576,
    -0.05369746685028076,
    -0.009175799787044525,
    -0.0006909870426170528,
    -0.01112089492380619,
    0.01724017970263958,
    0.010581694543361664,
    -0.032602451741695404,
    -0.025953255593776703,
    0.020451324060559273,
    0.001531854155473411,
    -0.025993842631578445,
    -0.0008596995030529797,
    0.018400054425001144,
    0.025778459385037422,
    0.0212404727935791,
    0.06938228011131287,
    -0.042927153408527374,
    0.03705870732665062,
    0.03841257467865944,
    -0.024157313629984856,
    0.0269605815410614,
    0.0248009841889143,
    -0.03208054602146149,
    0.021425766870379448,
    -0.007572891656309366,
    0.017943117767572403,
    -0.03873080760240555,
    0.03102504275739193,
    0.04988511651754379,
    -0.0019366168417036533,
    0.04786434397101402,
    0.014679402112960815,
    -0.03031056746840477,
    -0.019375672563910484,
    -0.06264447420835495,
    0.06096327304840088,
    -0.04018431156873703,
    -0.06047431007027626,
    0.008236786350607872,
    -0.053653497248888016,
    0.07289472222328186,
    0.07068130373954773,
    -0.032801978290081024,
    -0.0745716392993927,
    -0.005813592113554478,
    0.05783914402127266,
    -0.03435840085148811,
    0.009527141228318214,
    0.018685584887862206,
    -0.06751752644777298,
    -0.06509213149547577,
    -0.08372518420219421,
    -0.05446317046880722,
    0.053907327353954315,
    -0.05474940314888954,
    0.014350458979606628,
    -0.07023079693317413,
    -0.011159663088619709,
    0.039929576218128204,
    -0.015988843515515327,
    0.004941158927977085,
    0.003037756308913231,
    -0.02879311889410019,
    0.01633497327566147,
    -0.041140444576740265,
    -0.07165240496397018,
    0.011640839278697968,
    -0.052849967032670975,
    0.009558329358696938,
    -0.04208458587527275,
    -0.06771799176931381,
    0.06408269703388214,
    -0.03123757801949978,
    -0.041630007326602936,
    -0.060997989028692245,
    -0.03564658388495445,
    -0.023803366348147392,
    -0.06091669574379921,
    -0.02365335449576378,
    -0.07259650528430939,
    -0.025628216564655304,
    0.04468527063727379,
    -0.04823016747832298,
    -0.03465230390429497,
    0.07370327413082123,
    -0.02722766250371933,
    -0.041227713227272034,
    -0.06710455566644669,
    -0.0018554985290393233,
    0.005912774242460728,
    0.02462003566324711,
    0.00448566023260355,
    -0.017967991530895233,
    -0.05529441684484482,
    -0.0009463604656048119,
    -0.006434826646000147,
    -0.048404306173324585,
    0.03753988444805145,
    0.06669825315475464,
    0.025847231969237328,
    -0.03900757059454918,
    0.06327608972787857,
    0.06903517246246338,
    0.02192314900457859,
    -0.008090319111943245,
    0.08822548389434814,
    0.03539515286684036,
    0.07631707191467285,
    0.002548432210460305,
    0.020233100280165672,
    -0.02968069165945053,
    -0.03596833720803261,
    0.04826529324054718,
    0.05092062056064606,
    0.05145549401640892,
    -0.05773278698325157,
    -0.05899902805685997,
    0.05344465747475624,
    0.04117647185921669,
    0.08456218242645264,
    -0.0005618633585982025,
    -0.05918756499886513,
    -0.014348267577588558,
    0.047307293862104416,
    -0.036367423832416534,
    0.07344646006822586,
    0.03181513398885727,
    0.08482533693313599,
    -0.021844688802957535,
    0.02116752788424492,
    0.01678370125591755,
    0.009503713808953762,
    0.05740010365843773,
    0.04721197858452797,
    -0.003934822976589203,
    -0.00770507100969553,
    -0.0020017882343381643,
    -0.0712333470582962,
    0.020015541464090347,
    0.07017334550619125,
    -0.07388505339622498,
    0.07016339898109436,
    -0.003999287728220224,
    -0.04195589944720268,
    -0.0726010650396347,
    -0.041846826672554016,
    -0.07102943956851959,
    0.06908860057592392,
    0.0383109450340271,
    0.05483092740178108,
    0.05271876975893974,
    0.02798583172261715,
    0.05721564218401909,
    -0.07676921039819717,
    -0.02675732411444187,
    -0.07017832249403,
    0.05919730290770531,
    0.035828378051519394,
    -0.05044249817728996,
    0.051987338811159134,
    -0.00882241502404213,
    0.01890091598033905,
    0.03677179664373398,
    -0.07210833579301834,
    -0.04706022888422012,
    0.006130976136773825,
    0.01891031488776207,
    -0.06158973649144173,
    0.05529525876045227,
    -0.0246284119784832,
    -0.03294168785214424,
    0.06761419028043747,
    -0.046971697360277176,
    0.06714886426925659,
    0.050442636013031006,
    0.03374863788485527,
    0.020866308361291885,
    -0.03665538132190704,
    0.06490837782621384,
    -0.012313500046730042,
    -0.05381961539387703,
    -0.012739498168230057,
    0.05014671012759209,
    -0.07939334213733673,
    -0.065677210688591,
    -0.02065417543053627,
    0.014188568107783794,
    0.025993280112743378,
    -0.015794135630130768,
    0.08905654400587082,
    -0.0700952485203743,
    0.006811636500060558,
    0.012133410200476646,
    -0.03085549734532833,
    -0.009217693470418453,
    -0.054441213607788086,
    -0.05287398770451546,
    -0.038465626537799835,
    0.023438384756445885,
    0.03366180509328842,
    0.013422048650681973,
    -0.01985223777592182,
    -0.014284738339483738,
    0.059922955930233,
    0.028297947719693184,
    -0.02536710724234581,
    -0.07693523168563843,
    -0.08256203681230545,
    0.024241937324404716,
    -0.07366729527711868,
    0.057095978409051895,
    0.05186446011066437,
    -0.013500767759978771,
    -0.05222296714782715,
    0.05155124515295029,
    0.0519084706902504,
    0.029741182923316956,
    0.014399838633835316,
    0.02402777597308159,
    0.03734354302287102,
    -0.0008057677769102156,
    -0.06729704886674881,
    0.05468644201755524,
    -0.04701060801744461,
    0.02575099840760231,
    -0.018379278481006622,
    0.024666225537657738,
    -0.06913184374570847,
    -0.003778835292905569,
    0.02831369824707508,
    0.0012387082679197192,
    0.012194967828691006,
    0.05198746919631958,
    -0.04047307372093201,
    0.06428617984056473,
    -0.06766947358846664,
    -0.0054612732492387295,
    0.007396876811981201,
    -0.05219375714659691,
    0.08037269860506058,
    -0.024306727573275566,
    -0.03241264447569847,
    0.05088223144412041,
    -0.010638539679348469,
    -0.027046995237469673,
    -0.03496580943465233,
    0.05657656863331795,
    0.03490531072020531,
    0.043829623609781265,
    0.0055516925640404224,
    0.06429943442344666,
    0.01735875941812992,
    -0.0634138211607933,
    0.0782628059387207,
    0.04917420819401741,
    0.019163435325026512,
    0.03917761519551277,
    -0.04597480595111847,
    -0.054439157247543335,
    0.004626117646694183,
    -0.02272822894155979,
    -0.06397286802530289,
    0.03678639233112335,
    0.004487334750592709,
    0.030818555504083633,
    0.051590774208307266,
    -0.05228615179657936,
    -0.03993840888142586,
    0.004463767632842064,
    0.04185232147574425,
    -0.02518554776906967,
    -0.01816924475133419,
    0.03335050866007805,
    0.00811048038303852,
    0.034591205418109894,
    -0.08239106088876724,
    -0.016733549535274506,
    0.07935647666454315,
    0.03704648092389107,
    0.013241861946880817,
    -0.007764177396893501,
    0.04041573405265808,
    0.007542093750089407,
    0.0336596816778183,
    -0.03812943026423454,
    -0.0482783168554306,
    0.08552124351263046,
    0.05144829675555229,
    -0.04466550424695015,
    0.05642062425613403,
    0.06042104959487915,
    -0.01285067480057478,
    -0.047095801681280136,
    -0.019968846812844276]},
  'context_id': 0,
  'questions': [4]}}
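
The id and pathId values in the response follow Vespa's document id layout, id:<namespace>:<document-type>:<key/value-pairs>:<user-specified>. A minimal sketch of how the two relate, using hypothetical helpers that are not part of pyvespa. It handles only the plain docid form seen above and ignores any key/value modifiers.

```python
from typing import NamedTuple


class DocumentId(NamedTuple):
    namespace: str
    document_type: str
    key_values: str      # empty in the response above
    user_specified: str  # the id passed when feeding


def parse_document_id(doc_id: str) -> DocumentId:
    """Split an id string into its parts; the last part may itself contain colons."""
    parts = doc_id.split(":", 4)
    if len(parts) != 5 or parts[0] != "id":
        raise ValueError(f"not a Vespa document id: {doc_id!r}")
    return DocumentId(*parts[1:])


def document_v1_path(doc: DocumentId) -> str:
    """Build the /document/v1 path for the plain docid form only."""
    return f"/document/v1/{doc.namespace}/{doc.document_type}/docid/{doc.user_specified}"


parsed = parse_document_id("id:sentence:sentence::0")
print(parsed)
print(document_v1_path(parsed))
```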

Update a document

Update a data point by id. Optionally, create the data point if it does not exist:

[11]:
batch_update = [
    {
        "id": 0,                               # id of the data point to update
        "fields": {"text": "this is a test"},  # fields to be updated
        "create": False,                       # optional: create the data point if it does not exist (default False)
    }
]
[12]:
response = app.update_batch(schema="sentence", batch=batch_update)
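
To make the effect of the create flag concrete, here is a toy in-memory model of the update semantics described above: only the listed fields change, and a missing data point is created only when create is True. It is an illustration under those assumptions, not how Vespa or pyvespa implement updates. The apply_update helper and the store dict are hypothetical.

```python
from typing import Any, Dict


def apply_update(
    store: Dict[int, Dict[str, Any]],
    doc_id: int,
    fields: Dict[str, Any],
    create: bool = False,
) -> bool:
    """Update only the given fields; return False if nothing was changed."""
    if doc_id not in store:
        if not create:
            return False      # missing data point and create is off: no-op
        store[doc_id] = {}    # create an empty data point first
    store[doc_id].update(fields)
    return True


# Invented store standing in for the fed documents.
store = {0: {"text": "original text", "dataset": "squad"}}

print(apply_update(store, 0, {"text": "this is a test"}))         # existing id
print(apply_update(store, 99, {"text": "new"}))                   # missing, create off
print(apply_update(store, 99, {"text": "new"}, create=True))      # missing, create on
print(store)
```

Note that the dataset field of data point 0 is left untouched, since the update names only the text field.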

Delete documents

Delete the sentences with ids 0, 1 and 2:

[13]:
batch = [{"id": 0}, {"id": 1}, {"id": 2}]
response = app.delete_batch(schema="sentence", batch=batch)

Cleanup

Stop and remove the Vespa Docker container:

[14]:
vespa_docker.container.stop(timeout=600)
vespa_docker.container.remove()