brightdata/ai-data-enrichment-agent

Data Enrichment Agent with LangGraph + Bright Data

Build an AI-powered data enrichment agent that automatically researches and extracts structured data from the web.

What It Does

  1. Takes a research topic and a JSON schema as input
  2. Searches the web using Bright Data SERP API
  3. Scrapes websites using Bright Data Web Unlocker
  4. Uses an LLM to extract and structure the data
  5. Returns structured JSON matching your schema

Prerequisites

Python with pip, a Bright Data API key, and an Anthropic API key.

Quick Start

1. Install dependencies

pip install -r requirements.txt

2. Set environment variables

export BRIGHT_DATA_API_KEY="your-bright-data-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"

Or copy .env.example to .env and fill in your keys.
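Before running the agent, it can help to confirm that both keys are visible to Python. The following is a minimal sketch, not part of the repository: the helper name `missing_keys` is made up for illustration, and only the two variable names come from the export lines above.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_KEYS = ("BRIGHT_DATA_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to the process environment; pass a dict to check other values.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


# Example with explicit values instead of the real environment:
print(missing_keys({"BRIGHT_DATA_API_KEY": "abc"}))
```

Calling `missing_keys()` with no argument checks the real environment. Values kept only in a .env file are visible here once that file has been loaded into the process environment.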

3. Run the agent

python enrichment_agent.py

Expected output:

{
  "company_name": "Stripe",
  "industry": "Financial Technology / Payments",
  "headquarters": "San Francisco, California",
  "founded": "2010",
  "key_products": [
    "Stripe Payments",
    "Stripe Billing",
    "Stripe Connect",
    "Stripe Atlas"
  ]
}
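The output above is shaped by the schema passed to the agent. As a rough illustration, here is a schema that would describe that output, plus a small type check. Both `COMPANY_SCHEMA` and `type_errors` are assumptions written for this example; the repository's actual default schema may differ, and the check covers only `type`, `properties`, and `items`, not the full JSON Schema specification.

```python
import json

# Hypothetical schema matching the sample output shown above.
COMPANY_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "industry": {"type": "string"},
        "headquarters": {"type": "string"},
        "founded": {"type": "string"},
        "key_products": {"type": "array", "items": {"type": "string"}},
    },
}

_PY_TYPES = {"string": str, "array": list, "object": dict}


def type_errors(value, schema, path="$"):
    """Return a list of type mismatches between `value` and `schema`."""
    expected = _PY_TYPES.get(schema.get("type"))
    if expected is not None and not isinstance(value, expected):
        return [f"{path}: expected {schema['type']}, got {type(value).__name__}"]
    errors = []
    if isinstance(value, dict):
        # Check only the keys that are present; nothing is treated as required.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors += type_errors(value[key], sub_schema, f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors += type_errors(item, schema["items"], f"{path}[{index}]")
    return errors


sample = json.loads("""
{
  "company_name": "Stripe",
  "industry": "Financial Technology / Payments",
  "headquarters": "San Francisco, California",
  "founded": "2010",
  "key_products": ["Stripe Payments", "Stripe Billing", "Stripe Connect", "Stripe Atlas"]
}
""")

print(type_errors(sample, COMPANY_SCHEMA))
```

An empty list means every field present has the declared type. For complete validation, a dedicated library such as jsonschema is the usual choice.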

How It Works

┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Input   │────▶│  Search  │────▶│  Scrape  │────▶│ Extract  │
│ (topic + │     │ (SERP    │     │ (Web     │     │ (LLM     │
│  schema) │     │  API)    │     │ Unlocker)│     │ output)  │
└──────────┘     └──────────┘     └──────────┘     └──────────┘

The agent uses a LangGraph loop: the LLM decides whether to search, scrape a page, or submit the final structured result.
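The control flow of that loop can be sketched in plain Python. This is not the repository's LangGraph code: `AgentState`, `run_loop`, the stub tools, and the scripted decision function are all hypothetical stand-ins, with the scripted function playing the role of the LLM and no network calls made.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class AgentState:
    topic: str
    notes: List[str] = field(default_factory=list)  # tool outputs gathered so far
    result: Optional[Dict[str, Any]] = None          # set once the agent submits


def run_loop(state, decide, tools, max_steps=10):
    """Ask `decide` for the next action until it submits or the step budget runs out."""
    for _ in range(max_steps):
        action, arg = decide(state)
        if action == "submit":
            state.result = arg
            return state
        # Any other action names a tool; its output is added to the notes.
        state.notes.append(tools[action](arg))
    raise RuntimeError("agent did not submit a result within max_steps")


# Stand-ins for the SERP and Web Unlocker tools.
STUB_TOOLS = {
    "search": lambda query: f"search results for {query!r}",
    "scrape": lambda url: f"page text from {url}",
}


def scripted_decide(state):
    """Hypothetical policy in place of the LLM: search once, scrape once, submit."""
    if not state.notes:
        return "search", state.topic
    if len(state.notes) == 1:
        return "scrape", "https://example.com"
    return "submit", {"sources_used": len(state.notes)}


final = run_loop(AgentState(topic="Stripe"), scripted_decide, STUB_TOOLS)
print(final.result)
```

The `max_steps` cap is a design choice of this sketch: it stops the loop if the decision function never submits.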

Customization

Different schema

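import asyncio

from enrichment_agent import enrich

# Each competitor is an object with three string fields.
schema = {
    "type": "object",
    "properties": {
        "competitors": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "market_position": {"type": "string"},
                    "key_differentiator": {"type": "string"}
                }
            }
        }
    }
}

# enrich() is awaited, so a plain script has to drive it with asyncio.run().
# Inside an async function or a notebook, use `await enrich(...)` directly.
result = asyncio.run(enrich("Stripe competitors in payment processing", schema))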
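Schemas of this shape (one list of objects with string fields) are repetitive to write by hand. A small builder can generate them. This is a sketch; `list_of_objects_schema` is a hypothetical helper, not something the repository provides.

```python
def list_of_objects_schema(list_name, field_names):
    """Build a schema for one array of objects whose fields are all strings."""
    return {
        "type": "object",
        "properties": {
            list_name: {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {name: {"type": "string"} for name in field_names},
                },
            }
        },
    }


# Rebuilds the competitors schema from the example above.
competitors_schema = list_of_objects_schema(
    "competitors", ["name", "market_position", "key_differentiator"]
)
print(sorted(competitors_schema["properties"]["competitors"]["items"]["properties"]))
```

The returned dictionary can be passed as the second argument to `enrich`.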

Geo-targeted search

Modify the serp_tool in enrichment_agent.py:

serp_tool = BrightDataSERP(
    search_engine="google",
    country="de",      # Germany
    language="de",     # German
    results_count=10,
)
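If the target market changes often, the locale can be read from the environment instead of being edited in the file each time. This is a sketch under assumptions: the variable names `SERP_COUNTRY`, `SERP_LANGUAGE`, and `SERP_RESULTS_COUNT` and the `us`/`en` fallbacks are invented for this example; only the four keyword arguments come from the snippet above.

```python
import os


def serp_settings(env=None):
    """Build the keyword arguments for the SERP tool from environment variables."""
    env = os.environ if env is None else env
    return {
        "search_engine": "google",
        # Hypothetical variable names; fallbacks are assumptions of this sketch.
        "country": env.get("SERP_COUNTRY", "us").lower(),
        "language": env.get("SERP_LANGUAGE", "en").lower(),
        "results_count": int(env.get("SERP_RESULTS_COUNT", "10")),
    }


# In enrichment_agent.py this would become:
#   serp_tool = BrightDataSERP(**serp_settings())
print(serp_settings({"SERP_COUNTRY": "DE", "SERP_LANGUAGE": "de"}))
```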

Use OpenAI instead of Anthropic

from langchain_openai import ChatOpenAI

# Replace the LLM initialization in create_agent().
# ChatOpenAI reads OPENAI_API_KEY from the environment, so set that key as well.
llm = ChatOpenAI(model="gpt-4o")
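To switch providers without editing code, the choice can follow whichever API key is set. This is a sketch, not the repository's approach: `pick_provider` and `make_llm` are hypothetical helpers, the use of `ChatAnthropic` for the default path is an assumption, and the model name is left to the caller. The LangChain imports are deferred so that only the selected provider's package needs to be installed.

```python
import os


def pick_provider(env=None):
    """Return "anthropic" or "openai" depending on which API key is set.

    Anthropic wins when both keys are present, matching the default setup.
    """
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("Set ANTHROPIC_API_KEY or OPENAI_API_KEY")


def make_llm(provider, model):
    """Create the chat model for `provider`; imports happen only when called."""
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model=model)
    if provider == "anthropic":
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model=model)
    raise ValueError(f"unknown provider: {provider}")


print(pick_provider({"OPENAI_API_KEY": "sk-example"}))
```

Inside `create_agent()`, the initialization could then read `llm = make_llm(pick_provider(), model)` with a model name of your choice.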

Links

License

MIT
