API Reference#
Documentation for the main modules and functions of the dbutils_batch_query package, which provides a unified interface for batch processing of model queries and prompt management. This page gives an overview of the core API components, with links to detailed module documentation and key functions.
Modules Overview#
Model Query: The `model_query` module contains functions to handle model queries in batch. Key functions include:

- `with_default_return`: Handles setting or retrieving default return values.
- `extract_json_items`: Extracts JSON items from query responses.
- `batch_model_query`: Executes and manages batch queries.
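The exact behavior of `extract_json_items` is documented in the module reference, but conceptually, pulling JSON out of a fenced codeblock in a model response can be sketched as follows. This is a standalone illustration, not the library's implementation, and the helper name `parse_json_codeblock` is hypothetical:

```python
import json
import re


def parse_json_codeblock(response_text: str):
    """Extract and parse the first ```json codeblock in a model response.

    Standalone sketch of the kind of post-processing a helper like
    extract_json_items performs; not the library's implementation.
    """
    match = re.search(r"```json\s*(.*?)```", response_text, re.DOTALL)
    if match is None:
        return None  # No JSON codeblock found in the response
    return json.loads(match.group(1))


raw = 'Here are the items:\n```json\n[{"name": "alpha"}, {"name": "beta"}]\n```'
items = parse_json_codeblock(raw)
print(items)  # [{'name': 'alpha'}, {'name': 'beta'}]
```

Asking the model to "Respond in a JSON codeblock" (as the quick start prompts below do) is what makes this kind of extraction reliable.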
Prompts: The `prompts` module manages prompt rendering and execution. For example:

- `load_prompt`: Loads a single prompt with Jinja rendering.
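As the quick start below shows, `load_prompt` takes a template path plus keyword arguments that fill the template's variables. A simplified stand-in conveys the idea; it substitutes `{{ var }}` placeholders with the standard library rather than full Jinja (a real implementation would use `jinja2.Template(...).render(...)`), and `render_template_text` is a hypothetical helper, not the package's API:

```python
import re


def render_template_text(template_text: str, **variables) -> str:
    """Substitute {{ name }} placeholders in a template string.

    Simplified stand-in for Jinja rendering, illustration only.
    """
    def replace(match: re.Match) -> str:
        return str(variables[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template_text)


# A prompt template file might contain:
template = "Analyze the following text:\n\n{{ input_text }}"
prompt = render_template_text(template, input_text="Some text to analyze.")
print(prompt)
```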
File Utilities: The `file_utils` module provides utilities for interacting with Databricks file systems. Key functions include:

- `download_from_databricks`: Downloads files or directories from Databricks volumes.
- `upload_to_databricks`: Uploads files or directories to Databricks volumes.
- `delete_from_databricks`: Deletes files or directories from Databricks volumes.
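Databricks exposes volume files over its Files REST API (`/api/2.0/fs/files`), which is the kind of endpoint helpers like these typically wrap; whether `file_utils` uses it internally is an assumption. A minimal sketch of what a single-file download involves (the workspace host and volume path are placeholders):

```python
def files_api_url(host: str, volume_path: str) -> str:
    """Build a Databricks Files API URL for a path in a Unity Catalog volume.

    Illustrative only; the package's helpers handle this (plus directory
    recursion and error handling) for you.
    """
    return f"https://{host}/api/2.0/fs/files{volume_path}"


def auth_headers(token: str) -> dict:
    """Bearer-token headers used by Databricks REST calls."""
    return {"Authorization": f"Bearer {token}"}


url = files_api_url("adb-1234.azuredatabricks.net", "/Volumes/main/default/myvol/data.csv")
print(url)
# A download is then roughly:
#   response = requests.get(url, headers=auth_headers(token))
#   response.content holds the file bytes; PUT uploads and DELETE removes a path.
```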
For more detail on each module, see its dedicated documentation page.
Quick Start Examples#
The `batch_model_query` function allows you to send multiple prompts to a specified model asynchronously.

Running in a Notebook#
```python
from dbutils_batch_query.model_query import batch_model_query, extract_json_items
from dbutils_batch_query.prompts import load_prompt

user_prompt = load_prompt(
    "path/to/your/prompt_template.md",
    input_text="Some text to analyze.",
)

# Example prompt information (replace with your actual prompts)
prompt_info = [
    {
        "system": "You are an assistant that extracts key information. Respond in a JSON codeblock.",
        "user": user_prompt,
        "id": "query_1",  # Optional: Add identifiers or other metadata
    },
    {
        "system": "You are an assistant that summarizes text. Respond in a JSON codeblock.",
        "user": load_prompt(
            "path/to/your/summary_template.md",
            document="Another document to summarize.",
        ),
        "id": "query_2",
    },
]

results = await batch_model_query(
    prompt_info=prompt_info,
    model="databricks-llama-4-maverick",  # Specify your Databricks model endpoint
    process_func=extract_json_items,  # Optional: function to process raw text response
    batch_size=5,  # Optional: batch size before optional save
    max_concurrent_requests=3,  # Optional: max concurrent requests
    rate_limit=(2, 1),  # Optional: number of requests per second
    results_path="output_results/",  # Optional: path to save results
    run_name="my_batch_run",  # Optional: identifier for the run
    # token and host are automatically fetched from environment or dbutils if not provided
)

# Process results
for result in results:
    if result["error"]:
        print(f"Error processing prompt {result.get('id', 'N/A')}: {result['error']}")
    else:
        print(f"Result for prompt {result.get('id', 'N/A')}:")
        # Access raw message or processed response
        # print(result["message"])
        print(result["processed_response"])
```

Running in a Python File#
```python
import asyncio

from dbutils_batch_query.model_query import batch_model_query, extract_json_items
from dbutils_batch_query.prompts import load_prompt

user_prompt = load_prompt(
    "path/to/your/prompt_template.md",
    input_text="Some text to analyze.",
)

# Example prompt information (replace with your actual prompts)
prompt_info = [
    {
        "system": "You are an assistant that extracts key information. Respond in a JSON codeblock.",
        "user": user_prompt,
        "id": "query_1",  # Optional: Add identifiers or other metadata
    },
    {
        "system": "You are an assistant that summarizes text. Respond in a JSON codeblock.",
        "user": load_prompt(
            "path/to/your/summary_template.md",
            document="Another document to summarize.",
        ),
        "id": "query_2",
    },
]

results = asyncio.run(
    batch_model_query(
        prompt_info=prompt_info,
        model="databricks-llama-4-maverick",  # Specify your Databricks model endpoint
        process_func=extract_json_items,  # Optional: function to process raw text response
        batch_size=5,  # Optional: batch size before optional save
        max_concurrent_requests=3,  # Optional: max concurrent requests
        rate_limit=(2, 1),  # Optional: number of requests per second
        results_path="output_results/",  # Optional: path to save results
        run_name="my_batch_run",  # Optional: identifier for the run
        # token and host are automatically fetched from environment or dbutils if not provided
    )
)

# Process results
for result in results:
    if result["error"]:
        print(f"Error processing prompt {result.get('id', 'N/A')}: {result['error']}")
    else:
        print(f"Result for prompt {result.get('id', 'N/A')}:")
        # Access raw message or processed response
        # print(result["message"])
        print(result["processed_response"])
```
Note
Ensure that your environment is correctly configured before executing batch queries. Refer to each module’s documentation for detailed parameter descriptions and further examples.
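The quick start comments note that the token and host are fetched from the environment or `dbutils` when not passed explicitly. A quick pre-flight check can catch missing configuration early; `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are the conventional Databricks variable names, so confirm the exact names this package reads in its documentation:

```python
import os

# Hypothetical workspace and placeholder token for illustration only,
# never hardcode real credentials in source code.
os.environ.setdefault("DATABRICKS_HOST", "adb-1234.azuredatabricks.net")
os.environ.setdefault("DATABRICKS_TOKEN", "dapi-REDACTED")

missing = [
    name
    for name in ("DATABRICKS_HOST", "DATABRICKS_TOKEN")
    if not os.environ.get(name)
]
assert not missing, f"Missing required Databricks configuration: {missing}"
print("Environment configured for batch queries.")
```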