
Overview

Brave AI Grounding API provides state-of-the-art AI-generated answers backed by verifiable sources from the web. This technology improves the accuracy, relevance, and trustworthiness of AI responses by grounding them in real-time search results. Under the hood, this same service powers Brave’s Answer with AI feature, which serves tens of millions of answers every day. Brave’s grounded answers demonstrate strong performance across a wide range of queries, from simple trivia questions to complex research inquiries. Notably, Brave achieves state-of-the-art (SOTA) performance on the SimpleQA benchmark without being specifically optimized for it—the performance emerges naturally from the system’s design.
Access to AI Grounding is available through the AI Grounding plan. Subscribe to AI Grounding to unlock these capabilities.

Key Features

  • Web-Grounded Answers: AI responses backed by real-time web search with verifiable citations
  • OpenAI SDK Compatible: Use the familiar OpenAI SDK for seamless integration
  • SOTA Performance: State-of-the-art results on the SimpleQA benchmark
  • Streaming Support: Stream answers in real time with progressive citations
  • Research Mode: Enable multi-search for thorough, research-grade answers
  • Rich Response Data: Get entities, citations, and structured data with answers

API Reference

See the AI Grounding API documentation for the complete API reference, including parameters and response schemas.

Use Cases

AI Grounding is perfect for:
  • AI Assistants & Chatbots: Build intelligent conversational interfaces with factual, cited responses
  • Research Applications: Conduct thorough research with multi-search capabilities
  • Question Answering Systems: Provide accurate answers with source attribution
  • Knowledge Applications: Create tools that need up-to-date, verifiable information
  • Content Generation: Generate well-researched content with citations

Endpoint

AI Grounding uses a single, OpenAI-compatible endpoint:
https://api.search.brave.com/res/v1/chat/completions

Quick Start

Basic Example with OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_BRAVE_SEARCH_API_KEY>",
    base_url="https://api.search.brave.com/res/v1",
)

completions = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What are the best things to do in Paris with kids?",
        }
    ],
    model="brave",
    stream=False,
)

print(completions.choices[0].message.content)

Streaming Example

For real-time responses, enable streaming with AsyncOpenAI:
from openai import AsyncOpenAI
import asyncio

client = AsyncOpenAI(
    api_key="<YOUR_BRAVE_SEARCH_API_KEY>",
    base_url="https://api.search.brave.com/res/v1",
)

async def stream_answer():
    async for chunk in await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Explain quantum computing",
            }
        ],
        model="brave",
        stream=True,
    ):
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_answer())

Using curl

While the OpenAI SDK is recommended, you can also use curl:
curl -X POST -s --compressed "https://api.search.brave.com/res/v1/chat/completions" \
  -H "accept: application/json" \
  -H "Accept-Encoding: gzip" \
  -H "Content-Type: application/json" \
  -d '{"stream": false, "messages": [{"role": "user", "content": "What is the second highest mountain?"}]}' \
  -H "x-subscription-token: <YOUR_BRAVE_SEARCH_API_KEY>"

Single vs Multiple Searches

The decision between single-search and multi-search significantly influences both cost efficiency and response time.

Single Search (Default)

  • Speed: Answers stream in under 4.5 seconds on average
  • Cost: Lower cost with minimal computational overhead
  • Use Case: Ideal for real-time applications and most queries
  • Performance: The median SimpleQA benchmark question is answered with a single search

Multiple Searches (Research Mode)

  • Thoroughness: The model iteratively refines its search strategy with sequential searches
  • Cost: Higher due to multiple API calls and larger context processing
  • Time: Response times can extend to minutes
  • Use Case: Best for background tasks prioritizing thoroughness over speed
Enable research mode by adding enable_research: true:
completions = client.chat.completions.create(
    messages=[{"role": "user", "content": "History of quantum mechanics"}],
    model="brave",
    stream=False,
    extra_body={
        "enable_research": True,
    }
)
Performance note: On the SimpleQA benchmark, the 99th-percentile (p99) question required 53 queries analyzing 1,000 pages over ~300 seconds. However, reasonable limits are in place based on real-world use cases.

Advanced Parameters

When using the OpenAI SDK, pass additional parameters via extra_body:
completions = client.chat.completions.create(
    messages=[{"role": "user", "content": "History of Rome"}],
    model="brave",
    stream=False,
    extra_body={
        "country": "IT",
        "language": "it",
        "enable_entities": True,
        "enable_citations": True,
        "enable_research": False,
    }
)

Available Parameters

  • country (string): Target country for search results (default: us)
  • language (string): Response language (default: en)
  • enable_entities (bool): Include entity information in responses (default: false)
  • enable_citations (bool): Include inline citations (default: false)
  • enable_research (bool): Enable multi-search research mode (default: false)

Response Format

Because AI Grounding uses custom messages with richer data than standard OpenAI responses, messages are stringified with special tags. When streaming, you’ll receive:

Standard Text

Regular answer content streamed as text.

Citations

<citation>{"start_index": 0, "end_index": 10, "number": 1, "url": "https://...", "favicon": "...", "snippet": "..."}</citation>

Entity Items

<enum_item>{"uuid": "...", "name": "...", "href": "...", "original_tokens": "...", "citations": [...]}</enum_item>

Usage Metadata

<usage>{ "X-Request-Requests": 1, "X-Request-Queries": 2, "X-Request-Tokens-In": 1234, "X-Request-Tokens-Out": 300, "X-Request-Requests-Cost": 0, "X-Request-Queries-Cost": 0.008, "X-Request-Tokens-In-Cost": 0.00617, "X-Request-Tokens-Out-Cost": 0.0015, "X-Request-Total-Cost": 0.01567 }</usage>

Complete Streaming Example

Here’s a full example that handles all message types:
#!/usr/bin/env python

import asyncio
import json
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="<YOUR_BRAVE_SEARCH_API_KEY>",
    base_url="https://api.search.brave.com/res/v1",
)

async def main():
    async for data in await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "albums from lady gaga",
            }
        ],
        model="brave",
        stream=True,
        extra_body={
            "country": "us",
            "language": "en",
            "enable_citations": True,
            "enable_research": False,
        },
    ):
        if choices := data.choices:
            if delta := choices[0].delta.content:
                if delta.startswith("<citation>") and delta.endswith("</citation>"):
                    # Parse citation
                    citation = json.loads(
                        delta.removeprefix("<citation>").removesuffix("</citation>")
                    )
                    print(f"[{citation['number']}]({citation['url']})", end="", flush=True)

                elif delta.startswith("<enum_item>") and delta.endswith("</enum_item>"):
                    # Parse entity item
                    item = json.loads(
                        delta.removeprefix("<enum_item>").removesuffix("</enum_item>")
                    )
                    print("*", item["original_tokens"], end="", flush=True)

                elif delta.startswith("<usage>") and delta.endswith("</usage>"):
                    # Parse usage metadata
                    usage = json.loads(
                        delta.removeprefix("<usage>").removesuffix("</usage>")
                    )
                    print("\n\nUsage:", usage)

                else:
                    # Regular text content
                    print(delta, end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())
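
As an alternative to handling each delta as it arrives, you can accumulate the raw streamed text into a single string and extract the tags afterwards. The following is a minimal, regex-based sketch; it assumes tags always appear as complete, non-nested <citation>, <enum_item>, and <usage> blocks, as documented above:
import json
import re

# Matches any documented tag and captures the tag name plus its JSON payload.
TAG_RE = re.compile(r"<(citation|enum_item|usage)>(.*?)</\1>", re.DOTALL)

def split_answer(raw: str):
    """Split an assembled answer string into plain text and parsed tag payloads."""
    text_parts, tags = [], []
    last = 0
    for match in TAG_RE.finditer(raw):
        text_parts.append(raw[last:match.start()])  # text before the tag
        tags.append({"type": match.group(1), "data": json.loads(match.group(2))})
        last = match.end()
    text_parts.append(raw[last:])
    return "".join(text_parts), tags

# Example: a fragment with one citation tag embedded in the text.
answer = 'Paris is great for kids<citation>{"start_index": 0, "end_index": 10, "number": 1, "url": "https://example.com", "favicon": "", "snippet": ""}</citation>.'
text, tags = split_answer(answer)
print(text)
print(tags[0]["type"], tags[0]["data"]["url"])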

Pricing & Spending Limits

AI Grounding uses a usage-based pricing model.

Cost calculation:
cost = (searches × $4/1000) + (input_tokens × $5/1000000) + (output_tokens × $5/1000000)
Example:
  • 2 searches
  • 1,234 input tokens
  • 300 output tokens
Cost = 2 × (4/1000) + (5/1000000) × 1234 + (5/1000000) × 300
     = $0.01567
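
As a quick sanity check, the same calculation can be written as a small helper. The per-unit rates below are taken from the formula above; confirm current pricing in your account before relying on them:
def grounding_cost(searches: int, tokens_in: int, tokens_out: int) -> float:
    """Estimate request cost from the per-unit rates shown above."""
    search_cost = searches * 4 / 1_000          # $4 per 1,000 searches
    input_cost = tokens_in * 5 / 1_000_000      # $5 per 1M input tokens
    output_cost = tokens_out * 5 / 1_000_000    # $5 per 1M output tokens
    return search_cost + input_cost + output_cost

# Reproduces the worked example: 2 searches, 1,234 input tokens, 300 output tokens.
print(f"${grounding_cost(2, 1234, 300):.5f}")  # $0.01567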

Usage Metadata

With each answer, you’ll receive metadata on resource usage:
{
  "X-Request-Requests": 1,
  "X-Request-Queries": 2,
  "X-Request-Tokens-In": 1234,
  "X-Request-Tokens-Out": 300,
  "X-Request-Requests-Cost": 0,
  "X-Request-Queries-Cost": 0.008,
  "X-Request-Tokens-In-Cost": 0.00617,
  "X-Request-Tokens-Out-Cost": 0.0015,
  "X-Request-Total-Cost": 0.01567
}
When streaming, this metadata comes as the last message. For synchronous requests, the keys above are included in response headers.
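
For non-streaming calls, one way to read these headers with the OpenAI SDK is its with_raw_response helper, which exposes the raw HTTP response alongside the parsed completion. This is a minimal sketch; it assumes that helper is available in your SDK version and that the header names match the keys shown above:
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_BRAVE_SEARCH_API_KEY>",
    base_url="https://api.search.brave.com/res/v1",
)

raw = client.chat.completions.with_raw_response.create(
    messages=[{"role": "user", "content": "What is the second highest mountain?"}],
    model="brave",
    stream=False,
)

completion = raw.parse()  # the usual ChatCompletion object
print(completion.choices[0].message.content)

# Cost-related headers accompany the response (header lookup is case-insensitive).
print("Total cost:", raw.headers.get("X-Request-Total-Cost"))
print("Queries:", raw.headers.get("X-Request-Queries"))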

Setting Limits

Control your spending by setting monthly credit limits in your account settings.
Limit behavior: Limits are checked before a question is answered. If your limit has not been exceeded when a question starts, it will be answered in full even if processing pushes usage past the limit. You'll only be charged up to your imposed limit.

AI Grounding vs Summarizer Search

Brave offers two complementary approaches for AI-powered search.

When to use AI Grounding:
  • Building conversational AI applications
  • Need OpenAI SDK compatibility
  • Want simple, single-endpoint integration
  • Require research mode for thorough answers
When to use Summarizer Search:
  • Need access to underlying search results
  • Want to use specialized endpoints (title, enrichments, followups, etc.)
  • Building applications with custom search result processing
  • Prefer the traditional web search + summarization flow
Learn more about Summarizer Search.

Best Practices

Message Handling

  • Always handle special message tags (<citation>, <enum_item>, <usage>)
  • Parse JSON content within tags to extract structured data
  • Display citations inline for better user trust

Streaming

  • Use AsyncOpenAI for streaming responses
  • Display content progressively for better UX
  • Handle usage metadata at the end of the stream

Research Mode

  • Enable only when thoroughness is more important than speed
  • Best for background processing or complex research queries
  • Monitor usage as it can incur higher costs

Error Handling

  • Implement retry logic for transient failures (see the sketch after this list)
  • Check spending limits before critical operations
  • Handle rate limit errors gracefully
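
A lightweight retry wrapper along these lines is one option; it assumes the OpenAI SDK's RateLimitError and APIConnectionError exception types and uses simple exponential backoff, so adjust the retryable set and delays for your workload:
import time
from openai import OpenAI, RateLimitError, APIConnectionError

client = OpenAI(
    api_key="<YOUR_BRAVE_SEARCH_API_KEY>",
    base_url="https://api.search.brave.com/res/v1",
)

def ask_with_retries(question: str, max_attempts: int = 3) -> str:
    """Retry transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            completion = client.chat.completions.create(
                messages=[{"role": "user", "content": question}],
                model="brave",
                stream=False,
            )
            return completion.choices[0].message.content
        except (RateLimitError, APIConnectionError):
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # wait 2s, 4s, ... before the next attempt

print(ask_with_retries("What is the second highest mountain?"))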

Performance

  • Use single-search mode (default) for most queries
  • Cache responses when appropriate to minimize API calls (see the sketch after this list)
  • Monitor usage metadata to optimize costs
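
Caching can be as simple as keying full answers on the exact question text. The sketch below is a deliberately minimal in-memory cache (a dict keyed by the user message); a production setup would likely add TTLs and persistent storage:
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_BRAVE_SEARCH_API_KEY>",
    base_url="https://api.search.brave.com/res/v1",
)

_answer_cache: dict[str, str] = {}

def cached_answer(question: str) -> str:
    """Return a cached answer if this exact question was already asked."""
    if question in _answer_cache:
        return _answer_cache[question]
    completion = client.chat.completions.create(
        messages=[{"role": "user", "content": question}],
        model="brave",
        stream=False,
    )
    answer = completion.choices[0].message.content
    _answer_cache[question] = answer
    return answer

print(cached_answer("What are the best things to do in Paris with kids?"))
print(cached_answer("What are the best things to do in Paris with kids?"))  # served from cache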

Changelog

This changelog outlines all significant changes to the Brave AI Grounding API in chronological order.

2025-08-05

  • Launch Brave AI Grounding API resource
  • OpenAI SDK compatibility
  • Support for single and multi-search modes
  • SOTA performance on SimpleQA benchmark