AI Grounding

Overview

The Brave AI Grounding API provides AI-generated answers backed by verifiable sources from the web. Grounding responses in real-time search results improves their accuracy, relevance, and trustworthiness. The same service powers Brave’s Answer with AI feature, which serves tens of millions of answers every day. Brave’s grounded answers perform well across a wide range of queries, from simple trivia questions to complex research inquiries. Notably, Brave achieves state-of-the-art (SOTA) performance on the SimpleQA benchmark without being specifically optimized for it; the performance emerges naturally from the system’s design.

Access to AI Grounding requires a subscription to the AI Grounding plan.

Key Features

Web-Grounded Answers

AI responses backed by real-time web search with verifiable citations

OpenAI SDK Compatible

Use the familiar OpenAI SDK for seamless integration

SOTA Performance

State-of-the-art results on SimpleQA benchmark

Streaming Support

Stream answers in real-time with progressive citations

Research Mode

Enable multi-search for thorough, research-grade answers

Rich Response Data

Get entities, citations, and structured data with answers

API Reference

AI Grounding API Documentation

View the complete API reference, including parameters and response schemas

Use Cases

AI Grounding is perfect for:

AI Assistants & Chatbots: Build intelligent conversational interfaces with factual, cited responses
Research Applications: Conduct thorough research with multi-search capabilities
Question Answering Systems: Provide accurate answers with source attribution
Knowledge Applications: Create tools that need up-to-date, verifiable information
Content Generation: Generate well-researched content with citations

Endpoint

AI Grounding uses a single, OpenAI-compatible endpoint:

https://api.search.brave.com/res/v1/chat/completions

Quick Start

Basic Example with OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_BRAVE_SEARCH_API_KEY>",
    base_url="https://api.search.brave.com/res/v1",
)

completions = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What are the best things to do in Paris with kids?",
        }
    ],
    model="brave",
    stream=False,
)

print(completions.choices[0].message.content)

Streaming Example

For real-time responses, enable streaming with AsyncOpenAI:

from openai import AsyncOpenAI
import asyncio

client = AsyncOpenAI(
    api_key="<YOUR_BRAVE_SEARCH_API_KEY>",
    base_url="https://api.search.brave.com/res/v1",
)

async def stream_answer():
    async for chunk in await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Explain quantum computing",
            }
        ],
        model="brave",
        stream=True,
    ):
        # Guard against chunks that carry no choices before indexing.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_answer())

Using curl

While the OpenAI SDK is recommended, you can also use curl:

curl -X POST -s --compressed "https://api.search.brave.com/res/v1/chat/completions" \
  -H "accept: application/json" \
  -H "Accept-Encoding: gzip" \
  -H "Content-Type: application/json" \
  -d '{"stream": false, "messages": [{"role": "user", "content": "What is the second highest mountain?"}]}' \
  -H "x-subscription-token: <YOUR_BRAVE_SEARCH_API_KEY>"
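
Where neither the OpenAI SDK nor curl is available, the same request could be sketched with Python's standard library. This is a minimal sketch, not an official client: the helper names `build_grounding_request` and `ask` are hypothetical, the URL, header name, and body fields are taken from the curl command above, and the shape of the JSON reply is assumed to follow the OpenAI chat completion format. The `Accept-Encoding: gzip` header is left out because `urllib` does not decompress responses automatically.

```python
import json
import urllib.request

API_URL = "https://api.search.brave.com/res/v1/chat/completions"


def build_grounding_request(api_key: str, question: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = {
        "stream": False,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "x-subscription-token": api_key,
        },
        method="POST",
    )


def ask(api_key: str, question: str) -> str:
    """Send the request; needs a valid key and network access."""
    request = build_grounding_request(api_key, question)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: the reply mirrors the OpenAI chat completion shape.
    return payload["choices"][0]["message"]["content"]
```

Calling `ask("<YOUR_BRAVE_SEARCH_API_KEY>", "What is the second highest mountain?")` would then return the answer text, provided the reply has the assumed shape.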

Single vs Multiple Searches

The decision between single-search and multi-search significantly influences both cost efficiency and response time.

Single Search (Default)

Speed: Answers stream in under 4.5 seconds on average
Cost: Lower cost with minimal computational overhead
Use Case: Ideal for real-time applications and most queries
Performance: The median SimpleQA benchmark question is answered with a single search

Multiple Searches (Research Mode)

Thoroughness: Model iteratively refines strategy with sequential searches
Cost: Higher due to multiple API calls and larger context processing
Time: Response times can extend to minutes
Use Case: Best for background tasks prioritizing thoroughness over speed

Enable research mode by adding enable_research: true:

completions = client.chat.completions.create(
    messages=[{"role": "user", "content": "History of quantum mechanics"}],
    model="brave",
    stream=False,
    extra_body={
        "enable_research": True,
    }
)

Performance note: On the SimpleQA benchmark, questions at the 99th percentile (p99) required 53 queries and the analysis of 1,000 pages over roughly 300 seconds. However, reasonable limits are in place based on real-world use cases.

Advanced Parameters

When using the OpenAI SDK, pass additional parameters via extra_body:

completions = client.chat.completions.create(
    messages=[{"role": "user", "content": "History of Rome"}],
    model="brave",
    stream=False,
    extra_body={
        "country": "IT",
        "language": "it",
        "enable_entities": True,
        "enable_citations": True,
        "enable_research": False,
    }
)

Available Parameters

country (string): Target country for search results (default: us)
language (string): Response language (default: en)
enable_entities (bool): Include entity information in responses (default: false)
enable_citations (bool): Include inline citations (default: false)
enable_research (bool): Enable multi-search research mode (default: false)
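
Because `extra_body` is passed through as a plain dictionary, a typo in a parameter name may go unnoticed on the client side. One way to catch this early is a small builder that starts from the defaults listed above and rejects unknown names or wrong types. The helper `build_extra_body` is hypothetical and not part of the API or the SDK; the parameter names and defaults are those from the list above.

```python
from typing import Any

# Defaults as listed under "Available Parameters".
DEFAULTS: dict[str, Any] = {
    "country": "us",
    "language": "en",
    "enable_entities": False,
    "enable_citations": False,
    "enable_research": False,
}


def build_extra_body(**overrides: Any) -> dict[str, Any]:
    """Return an extra_body dict, rejecting unknown or mistyped parameters."""
    body = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in DEFAULTS:
            raise ValueError(f"unknown parameter: {name}")
        expected = type(DEFAULTS[name])
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {expected.__name__}")
        body[name] = value
    return body
```

The result can be passed directly, for example `extra_body=build_extra_body(country="IT", language="it", enable_citations=True)`.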

Response Format

Because AI Grounding uses custom messages with richer data than standard OpenAI responses, messages are stringified with special tags. When streaming, you’ll receive:

Standard Text

Regular answer content streamed as text.

Citations

<citation>{"start_index": 0, "end_index": 10, "number": 1, "url": "https://...", "favicon": "...", "snippet": "..."}</citation>

Entity Items

<enum_item>{"uuid": "...", "name": "...", "href": "...", "original_tokens": "...", "citations": [...]}</enum_item>

Usage Metadata

<usage>{ "X-Request-Requests": 1, "X-Request-Queries": 2, "X-Request-Tokens-In": 1234, "X-Request-Tokens-Out": 300, "X-Request-Requests-Cost": 0, "X-Request-Queries-Cost": 0.008, "X-Request-Tokens-In-Cost": 0.00617, "X-Request-Tokens-Out-Cost": 0.0015, "X-Request-Total-Cost": 0.01567 }</usage>

Complete Streaming Example

Here’s a full example that handles all message types:

#!/usr/bin/env python

import asyncio
import json
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="<YOUR_BRAVE_SEARCH_API_KEY>",
    base_url="https://api.search.brave.com/res/v1",
)

async def main():
    async for data in await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "albums from lady gaga",
            }
        ],
        model="brave",
        stream=True,
        extra_body={
            "country": "us",
            "language": "en",
            "enable_citations": True,
            "enable_research": False,
        },
    ):
        if choices := data.choices:
            if delta := choices[0].delta.content:
                if delta.startswith("<citation>") and delta.endswith("</citation>"):
                    # Parse citation
                    citation = json.loads(
                        delta.removeprefix("<citation>").removesuffix("</citation>")
                    )
                    print(f"[{citation['number']}]({citation['url']})", end="", flush=True)

                elif delta.startswith("<enum_item>") and delta.endswith("</enum_item>"):
                    # Parse entity item
                    item = json.loads(
                        delta.removeprefix("<enum_item>").removesuffix("</enum_item>")
                    )
                    print("*", item["original_tokens"], end="", flush=True)

                elif delta.startswith("<usage>") and delta.endswith("</usage>"):
                    # Parse usage metadata
                    usage = json.loads(
                        delta.removeprefix("<usage>").removesuffix("</usage>")
                    )
                    print("\n\nUsage:", usage)

                else:
                    # Regular text content
                    print(delta, end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())
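
The tag checks in the example above could also be factored into a pure function, which makes the parsing logic testable without a network call. The sketch below is one possible shape; `classify_delta` is a hypothetical helper name, and it makes the same assumption as the example above, namely that each tagged message arrives whole in a single delta. Content that carries a tag but does not contain valid JSON is returned as plain text rather than raising.

```python
import json
from typing import Any

# Tag names documented under "Response Format".
TAGS = ("citation", "enum_item", "usage")


def classify_delta(delta: str) -> tuple[str, Any]:
    """Return (kind, payload), where kind is a tag name or "text"."""
    for tag in TAGS:
        open_tag, close_tag = f"<{tag}>", f"</{tag}>"
        if delta.startswith(open_tag) and delta.endswith(close_tag):
            raw = delta[len(open_tag):-len(close_tag)]
            try:
                return tag, json.loads(raw)
            except json.JSONDecodeError:
                break  # Not valid JSON: fall back to plain text.
    return "text", delta
```

Inside the streaming loop, `kind, payload = classify_delta(delta)` would replace the chain of `startswith` and `endswith` checks, leaving only the display logic per kind.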

Pricing & Spending Limits

AI Grounding uses a usage-based pricing model.

Cost Calculation:

cost = (searches × $4/1000) + (input_tokens × $5/1000000) + (output_tokens × $5/1000000)

Example:

2 searches
1,234 input tokens
300 output tokens

Cost = 2 × (4/1000) + (5/1000000) × 1234 + (5/1000000) × 300
     = $0.01567
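
The formula above translates directly into a small estimator, which can be handy for budgeting before sending requests. The function name `estimate_cost` and the constant names are hypothetical; the rates are the ones quoted in the formula above.

```python
# Rates taken from the cost formula above, in US dollars.
SEARCH_PRICE = 4 / 1000              # per search
INPUT_TOKEN_PRICE = 5 / 1_000_000    # per input token
OUTPUT_TOKEN_PRICE = 5 / 1_000_000   # per output token


def estimate_cost(searches: int, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one answer in US dollars."""
    return (
        searches * SEARCH_PRICE
        + input_tokens * INPUT_TOKEN_PRICE
        + output_tokens * OUTPUT_TOKEN_PRICE
    )


# The worked example above: 2 searches, 1,234 input tokens, 300 output tokens.
print(f"${estimate_cost(2, 1234, 300):.5f}")  # prints $0.01567
```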

Usage Metadata

With each answer, you’ll receive metadata on resource usage:

{
  "X-Request-Requests": 1,
  "X-Request-Queries": 2,
  "X-Request-Tokens-In": 1234,
  "X-Request-Tokens-Out": 300,
  "X-Request-Requests-Cost": 0,
  "X-Request-Queries-Cost": 0.008,
  "X-Request-Tokens-In-Cost": 0.00617,
  "X-Request-Tokens-Out-Cost": 0.0015,
  "X-Request-Total-Cost": 0.01567
}

When streaming, this metadata arrives as the last message. For non-streaming requests, the keys above are included in the response headers.
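
For requests that return the metadata in response headers, the usage values can be collected with a small filter over the header mapping. This is a sketch under two assumptions: the header values are numeric strings matching the JSON shown above, and the headers are available as a mapping (how to obtain them depends on the HTTP client). The helper name `usage_from_headers` is hypothetical. Header names are compared case-insensitively, since HTTP header names are not case-sensitive.

```python
from typing import Mapping

USAGE_PREFIX = "x-request-"


def usage_from_headers(headers: Mapping[str, str]) -> dict[str, float]:
    """Collect X-Request-* headers into a dict keyed by lower-case name."""
    usage: dict[str, float] = {}
    for name, value in headers.items():
        key = name.lower()
        if not key.startswith(USAGE_PREFIX):
            continue
        try:
            usage[key] = float(value)
        except ValueError:
            # Assumption violated: skip values that are not numeric.
            continue
    return usage
```

For example, `usage_from_headers(headers)["x-request-total-cost"]` would give the total cost of the request when that header is present.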

Setting Limits

Control your spending by setting monthly credit limits in your account settings.

Limit behavior: Limits are checked before answering. If limits aren’t exceeded when a question starts, it will be answered in full even if it exceeds limits during processing. You’ll only be charged up to your imposed limit.

Rate Limits

Default: 2 requests per second
Need more? Contact searchapi-support@brave.com
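
To stay under the default limit, a client can space out its own requests. The following is a minimal sketch of a client-side throttle, assuming a single-threaded, synchronous caller; the class name `MinIntervalThrottle` is hypothetical, and the clock and sleep functions are injectable only to make the behavior testable. Async code would need an equivalent built on `asyncio.sleep`.

```python
import time
from typing import Callable


class MinIntervalThrottle:
    """Keep successive calls at least 1 / rate_per_second seconds apart."""

    def __init__(
        self,
        rate_per_second: float = 2.0,  # default limit stated above
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # first call never waits

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve a slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval
```

Calling `throttle.wait()` immediately before each `client.chat.completions.create(...)` call spaces requests at least 0.5 seconds apart with the default rate.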

AI Grounding vs Summarizer Search

Brave offers two complementary approaches for AI-powered search:

AI Grounding

Direct AI answers using an OpenAI-compatible endpoint. Best for building chat interfaces and applications that need instant, grounded AI responses.

Summarizer Search

Two-step workflow that first retrieves search results, then generates summaries. Best when you need control over search results or want to use specialized summarizer endpoints.

When to use AI Grounding:

Building conversational AI applications
Need OpenAI SDK compatibility
Want simple, single-endpoint integration
Require research mode for thorough answers

When to use Summarizer Search:

Need access to underlying search results
Want to use specialized endpoints (title, enrichments, followups, etc.)
Building applications with custom search result processing
Prefer the traditional web search + summarization flow

Learn more about Summarizer Search.

Best Practices

Message Handling

Always handle special message tags (<citation>, <enum_item>, <usage>)
Parse JSON content within tags to extract structured data
Display citations inline for better user trust

Streaming

Use AsyncOpenAI for streaming responses
Display content progressively for better UX
Handle usage metadata at the end of the stream

Research Mode

Enable only when thoroughness is more important than speed
Best for background processing or complex research queries
Monitor usage as it can incur higher costs

Error Handling

Implement retry logic for transient failures
Check spending limits before critical operations
Handle rate limit errors gracefully
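
Retry logic for transient failures is commonly implemented as exponential backoff with jitter. The sketch below is generic and makes no claim about which errors the API returns; `call_with_retries` is a hypothetical helper, and the caller decides what counts as retryable through the `is_retryable` predicate. The OpenAI Python SDK also has its own `max_retries` client option, which may be sufficient on its own.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying retryable failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Delays of 0.5 s, 1 s, 2 s, ... plus up to 25% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay * (1 + random.uniform(0, 0.25)))
    raise AssertionError("unreachable")
```

With the OpenAI SDK, the predicate might be `lambda exc: isinstance(exc, (openai.RateLimitError, openai.APIConnectionError))`, on the assumption that rate limit and connection failures surface as those exception types.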

Performance

Use single-search mode (default) for most queries
Cache responses when appropriate to minimize API calls
Monitor usage metadata to optimize costs
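
Caching can be as simple as an in-memory map keyed by the question and its parameters, with a time-to-live so that answers do not go stale. The sketch below is one possible approach, not a recommendation from the API; the class name `AnswerCache` and the 300-second default are hypothetical, and a suitable lifetime depends on how time-sensitive the questions are.

```python
import json
import time
from typing import Any, Callable, Optional


class AnswerCache:
    """In-memory cache keyed by question and parameters, with a TTL."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(question: str, extra_body: Optional[dict[str, Any]]) -> str:
        # sort_keys makes the key independent of parameter order.
        return json.dumps([question.strip(), extra_body or {}], sort_keys=True)

    def get(
        self, question: str, extra_body: Optional[dict[str, Any]] = None
    ) -> Optional[str]:
        """Return a cached answer, or None if missing or expired."""
        key = self._key(question, extra_body)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, answer = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]
            return None
        return answer

    def put(
        self,
        question: str,
        answer: str,
        extra_body: Optional[dict[str, Any]] = None,
    ) -> None:
        """Store an answer together with the current time."""
        self._entries[self._key(question, extra_body)] = (self._clock(), answer)
```

A caller would check `cache.get(question, extra_body)` first and only call the API, followed by `cache.put(...)`, when the result is `None`.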

Changelog

This changelog outlines all significant changes to the Brave AI Grounding API in chronological order.

2025-08-05

Launch Brave AI Grounding API resource
OpenAI SDK compatibility
Support for single and multi-search modes
SOTA performance on SimpleQA benchmark
