Query Modes

HetaDB exposes five retrieval strategies via the query_mode field of POST /api/v1/hetadb/chat. This page describes each mode, explains when to use it, and provides a ready-to-run curl example for each.


Overview

query_mode   Strategy                                               Typical latency
-----------  -----------------------------------------------------  ---------------
naive        Parallel vector + KG retrieval, weighted scoring       Fastest
rerank       BM25 + vector RRF fusion → cross-encoder rerank        Medium
rewriter     LLM generates 3 query variants, parallel retrieval     Medium–slow
multihop     ReAct reasoning loop (max 3 rounds)                    Slowest
direct       LLM only, no retrieval                                 Very fast
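
For callers working from Python rather than the shell, the request shape used by the curl examples on this page can be sketched as follows. This is a minimal sketch, not an official client: build_chat_request and send are hypothetical helper names, the base URL is the localhost address from the curl examples, and the response is assumed to be JSON.

```python
import json
import urllib.request

# The five mode names listed in the overview table.
QUERY_MODES = ("naive", "rerank", "rewriter", "multihop", "direct")

# Assumption: a local server, as in the curl examples on this page.
BASE_URL = "http://localhost:8000"


def build_chat_request(query, kb_id, user_id, query_mode):
    """Build (but do not send) a POST request for /api/v1/hetadb/chat."""
    if query_mode not in QUERY_MODES:
        raise ValueError(f"unknown query_mode: {query_mode!r}")
    payload = {
        "query": query,
        "kb_id": kb_id,
        "user_id": user_id,
        "query_mode": query_mode,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/hetadb/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON body (assumed JSON)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Rejecting unknown mode names on the client side catches typos before a network round trip; how the server itself handles an unrecognized query_mode is not covered on this page.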

naive

Strategy: Runs vector retrieval and knowledge-graph retrieval in parallel, then combines results with weighted scoring. No re-ranking step.

When to use: The default choice for most queries. Use it when you want fast responses and the question is clearly stated.

curl -X POST http://localhost:8000/api/v1/hetadb/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query":      "What is the abstract of this paper?",
    "kb_id":      "research_kb",
    "user_id":    "agent",
    "query_mode": "naive"
  }'
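
To make the naive strategy concrete, here is a minimal sketch of parallel retrieval followed by weighted scoring. Everything in it is illustrative: vector_search and kg_search are stand-ins with made-up scores, the 0.6/0.4 weights are hypothetical, and both retrievers are assumed to return scores on the same 0 to 1 scale. HetaDB's actual weights and scoring formula are not documented on this page.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical weights; the values HetaDB uses are not stated on this page.
VECTOR_WEIGHT = 0.6
KG_WEIGHT = 0.4


def vector_search(query):
    """Stand-in for dense vector retrieval: returns {chunk_id: score}."""
    return {"c1": 0.9, "c2": 0.5}


def kg_search(query):
    """Stand-in for knowledge-graph retrieval: returns {chunk_id: score}."""
    return {"c2": 0.8, "c3": 0.7}


def naive_retrieve(query, top_k=3):
    # Run both retrievers concurrently and wait for both results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        vec_future = pool.submit(vector_search, query)
        kg_future = pool.submit(kg_search, query)
        vec_hits, kg_hits = vec_future.result(), kg_future.result()

    # Weighted sum; a chunk missing from one source contributes 0 for it.
    combined = {}
    for chunk_id in vec_hits.keys() | kg_hits.keys():
        combined[chunk_id] = (
            VECTOR_WEIGHT * vec_hits.get(chunk_id, 0.0)
            + KG_WEIGHT * kg_hits.get(chunk_id, 0.0)
        )

    # Highest score first; ties broken by chunk id for determinism.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

In this toy data, a chunk found by both retrievers (c2) outranks one that scores highly in only a single source, which is the practical effect of combining the two signals.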

rerank

Strategy: Combines BM25 keyword retrieval and dense vector retrieval using Reciprocal Rank Fusion (RRF), then re-ranks the fused candidates with a cross-encoder model. Produces the highest-precision results.

When to use: Factual questions, technical queries, or any case where citation accuracy matters most.

curl -X POST http://localhost:8000/api/v1/hetadb/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query":      "What loss function does the model use?",
    "kb_id":      "research_kb",
    "user_id":    "agent",
    "query_mode": "rerank"
  }'

Tip

rerank is the recommended mode for production agents that need reliable citations. The extra latency is usually worth the precision gain.

Cross-encoder reranking requires hetadb.query_defaults.reranker_url to be configured. Without it, the mode falls back to pure RRF ordering, which is still better than naive but lacks cross-encoder scoring.
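
The fusion step and the fallback behaviour can be sketched as follows. This is a sketch under stated assumptions rather than HetaDB's implementation: rrf_fuse uses the standard Reciprocal Rank Fusion formula with k = 60, a constant commonly used for RRF (the value HetaDB uses is not stated here), and cross_encoder is a hypothetical callable standing in for the reranker service.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).

    `rankings` is a list of ranked id lists, best match first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(query, bm25_ranking, vector_ranking, cross_encoder=None, top_k=3):
    """Fuse two rankings, then optionally rescore with a cross-encoder.

    `cross_encoder` is a hypothetical callable (query, doc_id) -> float.
    When it is None, the fused RRF order is returned unchanged, mirroring
    the fallback described in the tip above.
    """
    fused = rrf_fuse([bm25_ranking, vector_ranking])
    if cross_encoder is None:
        return fused[:top_k]
    rescored = sorted(fused, key=lambda d: -cross_encoder(query, d))
    return rescored[:top_k]
```

Because RRF works on ranks rather than raw scores, BM25 and vector similarity values never need to be normalized against each other before fusion.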


rewriter

Strategy: An LLM generates three paraphrased variants of the original query. Each variant triggers independent retrieval; results are merged and deduplicated before synthesis.

When to use: Ambiguous or under-specified queries where a single phrasing may miss relevant chunks (e.g., jargon-heavy or informal user input).

curl -X POST http://localhost:8000/api/v1/hetadb/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query":      "how does the thing handle sequences",
    "kb_id":      "research_kb",
    "user_id":    "agent",
    "query_mode": "rewriter"
  }'
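
The merge-and-deduplicate step of the rewriter strategy can be illustrated with a short sketch. The names here are hypothetical: generate_variants stands in for the LLM paraphrasing call and retrieve for whichever retriever runs per variant. Keeping the best score per chunk is one plausible deduplication rule; the rule HetaDB applies is not specified on this page.

```python
from concurrent.futures import ThreadPoolExecutor


def rewriter_retrieve(query, generate_variants, retrieve, top_k=5):
    """Retrieve once per query variant, then merge and deduplicate.

    generate_variants(query) -> list of paraphrased queries (LLM stand-in)
    retrieve(variant)        -> list of (chunk_id, score) pairs
    """
    variants = generate_variants(query)

    # One independent retrieval per variant, run in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(variants))) as pool:
        result_lists = list(pool.map(retrieve, variants))

    # Deduplicate by chunk id, keeping the best score seen for each chunk.
    best = {}
    for hits in result_lists:
        for chunk_id, score in hits:
            if chunk_id not in best or score > best[chunk_id]:
                best[chunk_id] = score

    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

A chunk that only one phrasing manages to surface still reaches the merged list, which is the reason to pay for the extra LLM call on vague input.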

multihop

Strategy: Implements a ReAct (Reason + Act) loop that runs up to three retrieval rounds. After each round, the LLM decides whether the accumulated context is sufficient to answer or whether another retrieval step is needed.

When to use: Complex questions that require chaining multiple facts together (e.g., "Compare the approaches used in sections 3 and 5 and explain which performs better on dataset X").

curl -X POST http://localhost:8000/api/v1/hetadb/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query":      "How does the proposed method compare to the baseline across all three datasets?",
    "kb_id":      "research_kb",
    "user_id":    "agent",
    "query_mode": "multihop"
  }'

Warning

multihop is the slowest mode. Avoid it for simple factual lookups where rerank or naive will suffice.
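
The control flow of the bounded loop described in this section can be sketched as follows. The three callables are hypothetical stand-ins: retrieve for a retrieval round, decide for the LLM's sufficiency check, and synthesize for the final answer generation. Only the three-round cap comes from this page; the rest is an assumed shape, not HetaDB's implementation.

```python
MAX_ROUNDS = 3  # this page states a maximum of three retrieval rounds


def multihop_answer(question, retrieve, decide, synthesize):
    """Run a bounded retrieve-then-decide loop and return (answer, rounds).

    retrieve(search_query)      -> list of context items
    decide(question, context)   -> (is_sufficient, next_search_query)
    synthesize(question, ctx)   -> final answer
    """
    context = []
    search_query = question
    rounds = 0
    for rounds in range(1, MAX_ROUNDS + 1):
        # Act: fetch more evidence for the current search query.
        context.extend(retrieve(search_query))
        # Reason: is the accumulated context enough to answer?
        sufficient, next_query = decide(question, context)
        if sufficient:
            break
        search_query = next_query
    # After the cap is reached, answer with whatever was gathered.
    return synthesize(question, context), rounds
```

Each extra round adds at least one retrieval and one LLM decision, which is why latency grows with question complexity and why the cap matters.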


direct

Strategy: The LLM answers entirely from its parametric (pre-trained) knowledge. No retrieval is performed. The data[] and citations[] fields in the response will always be empty.

When to use: Quick LLM opinions, general knowledge questions unrelated to your documents, or when you want to bypass the retrieval stack entirely for testing purposes.

curl -X POST http://localhost:8000/api/v1/hetadb/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query":      "What is the capital of France?",
    "kb_id":      "research_kb",
    "user_id":    "agent",
    "query_mode": "direct"
  }'

Warning

direct mode does not read your knowledge base at all. Any facts specific to your uploaded documents will not appear in the answer.