API Reference#
Documentation for the main modules and functions of the dbutils_batch_query package, which provides a unified interface for batch processing of model queries and prompt management. This page gives an overview of the core API components, with links to detailed module documentation and key functions.
Modules Overview#
Model Query: The `model_query` module contains functions to handle model queries in batch. Key functions include:

- `with_default_return`: Handles setting or retrieving default return values.
- `extract_json_items`: Extracts JSON items from query responses.
- `batch_model_query`: Executes and manages batch queries.
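The exact behavior of `extract_json_items` is documented in the module reference, but conceptually, pulling JSON out of a fenced codeblock in a model response can be sketched as follows. This is a standalone illustration, not the library's implementation, and the helper name `parse_json_codeblock` is hypothetical:

```python
import json
import re


def parse_json_codeblock(response_text: str):
    """Extract and parse the first ```json codeblock in a model response.

    Standalone sketch of the kind of post-processing a helper like
    extract_json_items performs; not the library's implementation.
    """
    match = re.search(r"```json\s*(.*?)```", response_text, re.DOTALL)
    if match is None:
        return None  # No JSON codeblock found in the response
    return json.loads(match.group(1))


raw = 'Here are the items:\n```json\n[{"name": "alpha"}, {"name": "beta"}]\n```'
items = parse_json_codeblock(raw)
print(items)  # [{'name': 'alpha'}, {'name': 'beta'}]
```

Asking the model to "Respond in a JSON codeblock" (as the quick start prompts below do) is what makes this kind of extraction reliable.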
Prompts: The `prompts` module manages prompt rendering and execution. For example:

- `load_prompt`: Loads a single prompt with Jinja rendering.
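As the quick start below shows, `load_prompt` takes a template path plus keyword arguments that fill the template's variables. A simplified stand-in conveys the idea; it substitutes `{{ var }}` placeholders with the standard library rather than full Jinja (a real implementation would use `jinja2.Template(...).render(...)`), and `render_template_text` is a hypothetical helper, not the package's API:

```python
import re


def render_template_text(template_text: str, **variables) -> str:
    """Substitute {{ name }} placeholders in a template string.

    Simplified stand-in for Jinja rendering, illustration only.
    """
    def replace(match: re.Match) -> str:
        return str(variables[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template_text)


# A prompt template file might contain:
template = "Analyze the following text:\n\n{{ input_text }}"
prompt = render_template_text(template, input_text="Some text to analyze.")
print(prompt)
```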
File Utilities: The `file_utils` module provides utilities for interacting with Databricks file systems. Key functions include:

- `download_from_databricks`: Downloads files or directories from Databricks volumes.
- `upload_to_databricks`: Uploads files or directories to Databricks volumes.
- `delete_from_databricks`: Deletes files or directories from Databricks volumes.
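Databricks exposes volume files over its Files REST API (`/api/2.0/fs/files`), which is the kind of endpoint helpers like these typically wrap; whether `file_utils` uses it internally is an assumption. A minimal sketch of what a single-file download involves (the workspace host and volume path are placeholders):

```python
def files_api_url(host: str, volume_path: str) -> str:
    """Build a Databricks Files API URL for a path in a Unity Catalog volume.

    Illustrative only; the package's helpers handle this (plus directory
    recursion and error handling) for you.
    """
    return f"https://{host}/api/2.0/fs/files{volume_path}"


def auth_headers(token: str) -> dict:
    """Bearer-token headers used by Databricks REST calls."""
    return {"Authorization": f"Bearer {token}"}


url = files_api_url("adb-1234.azuredatabricks.net", "/Volumes/main/default/myvol/data.csv")
print(url)
# A download is then roughly:
#   response = requests.get(url, headers=auth_headers(token))
#   response.content holds the file bytes; PUT uploads and DELETE removes a path.
```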
For more detail on each module, see its dedicated documentation page.
Quick Start Examples#
The `batch_model_query` function allows you to send multiple prompts to a specified model asynchronously.

Running in a Notebook#
```python
from dbutils_batch_query.model_query import batch_model_query, extract_json_items
from dbutils_batch_query.prompts import load_prompt

user_prompt = load_prompt(
    "path/to/your/prompt_template.md",
    input_text="Some text to analyze.",
)

# Example prompt information (replace with your actual prompts)
prompt_info = [
    {
        "system": "You are an assistant that extracts key information. Respond in a JSON codeblock.",
        "user": user_prompt,
        "id": "query_1",  # Optional: Add identifiers or other metadata
    },
    {
        "system": "You are an assistant that summarizes text. Respond in a JSON codeblock.",
        "user": load_prompt(
            "path/to/your/summary_template.md",
            document="Another document to summarize.",
        ),
        "id": "query_2",
    },
]

results = await batch_model_query(
    prompt_info=prompt_info,
    model="databricks-llama-4-maverick",  # Specify your Databricks model endpoint
    process_func=extract_json_items,  # Optional: function to process raw text response
    batch_size=5,  # Optional: batch size before optional save
    max_concurrent_requests=3,  # Optional: max concurrent requests
    rate_limit=(2, 1),  # Optional: number of requests per second
    results_path="output_results/",  # Optional: path to save results
    run_name="my_batch_run",  # Optional: identifier for the run
    # token and host are automatically fetched from environment or dbutils if not provided
)

# Process results
for result in results:
    if result["error"]:
        print(f"Error processing prompt {result.get('id', 'N/A')}: {result['error']}")
    else:
        print(f"Result for prompt {result.get('id', 'N/A')}:")
        # Access raw message or processed response
        # print(result["message"])
        print(result["processed_response"])
```

Running in a Python File#
```python
import asyncio

from dbutils_batch_query.model_query import batch_model_query, extract_json_items
from dbutils_batch_query.prompts import load_prompt

user_prompt = load_prompt(
    "path/to/your/prompt_template.md",
    input_text="Some text to analyze.",
)

# Example prompt information (replace with your actual prompts)
prompt_info = [
    {
        "system": "You are an assistant that extracts key information. Respond in a JSON codeblock.",
        "user": user_prompt,
        "id": "query_1",  # Optional: Add identifiers or other metadata
    },
    {
        "system": "You are an assistant that summarizes text. Respond in a JSON codeblock.",
        "user": load_prompt(
            "path/to/your/summary_template.md",
            document="Another document to summarize.",
        ),
        "id": "query_2",
    },
]

results = asyncio.run(
    batch_model_query(
        prompt_info=prompt_info,
        model="databricks-llama-4-maverick",  # Specify your Databricks model endpoint
        process_func=extract_json_items,  # Optional: function to process raw text response
        batch_size=5,  # Optional: batch size before optional save
        max_concurrent_requests=3,  # Optional: max concurrent requests
        rate_limit=(2, 1),  # Optional: number of requests per second
        results_path="output_results/",  # Optional: path to save results
        run_name="my_batch_run",  # Optional: identifier for the run
        # token and host are automatically fetched from environment or dbutils if not provided
    )
)

# Process results
for result in results:
    if result["error"]:
        print(f"Error processing prompt {result.get('id', 'N/A')}: {result['error']}")
    else:
        print(f"Result for prompt {result.get('id', 'N/A')}:")
        # Access raw message or processed response
        # print(result["message"])
        print(result["processed_response"])
```
Note
Ensure that your environment is correctly configured before executing batch queries. Refer to each module’s documentation for detailed parameter descriptions and further examples.
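The quick start comments note that the token and host are fetched from the environment or `dbutils` when not passed explicitly. A quick pre-flight check can catch missing configuration early; `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are the conventional Databricks variable names, so confirm the exact names this package reads in its documentation:

```python
import os

# Hypothetical workspace and placeholder token for illustration only,
# never hardcode real credentials in source code.
os.environ.setdefault("DATABRICKS_HOST", "adb-1234.azuredatabricks.net")
os.environ.setdefault("DATABRICKS_TOKEN", "dapi-REDACTED")

missing = [
    name
    for name in ("DATABRICKS_HOST", "DATABRICKS_TOKEN")
    if not os.environ.get(name)
]
assert not missing, f"Missing required Databricks configuration: {missing}"
print("Environment configured for batch queries.")
```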