Build an AI-powered data enrichment agent that automatically researches and extracts structured data from the web.
- Takes a research topic and a JSON schema as input
- Searches the web using the Bright Data SERP API
- Scrapes websites using Bright Data Web Unlocker
- Uses an LLM to extract and structure the data
- Returns structured JSON matching your schema

Requirements:
- Python 3.10+
- Bright Data account with API key
- Anthropic API key

pip install -r requirements.txt

export BRIGHT_DATA_API_KEY="your-bright-data-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"

Or copy .env.example to .env and fill in your keys.
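
Before running the agent, it can help to confirm that both keys are visible to the Python process. The snippet below is a minimal sketch of such a check; `missing_keys` and `REQUIRED_KEYS` are hypothetical names introduced here, not part of enrichment_agent.py, and the check only reads the process environment (it does not parse a .env file).

```python
import os

# Variable names taken from the export commands above.
REQUIRED_KEYS = ("BRIGHT_DATA_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise without touching your real environment.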
python enrichment_agent.py

Expected output:
{
  "company_name": "Stripe",
  "industry": "Financial Technology / Payments",
  "headquarters": "San Francisco, California",
  "founded": "2010",
  "key_products": [
    "Stripe Payments",
    "Stripe Billing",
    "Stripe Connect",
    "Stripe Atlas"
  ]
}

┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Input   │────▶│  Search  │────▶│  Scrape  │────▶│ Extract  │
│ (topic + │     │  (SERP   │     │  (Web    │     │  (LLM    │
│ schema)  │     │   API)   │     │ Unlocker)│     │ output)  │
└──────────┘     └──────────┘     └──────────┘     └──────────┘

The agent uses a LangGraph loop: the LLM decides whether to search, scrape a page, or submit the final structured result.
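
The control flow of that loop can be sketched in plain Python. This is an illustration only, not the LangGraph implementation in enrichment_agent.py: the LLM is replaced by a scripted `decide` function, and `fake_search` and `fake_scrape` are hypothetical stand-ins for the SERP API and Web Unlocker calls.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    topic: str
    search_results: list = field(default_factory=list)
    pages: dict = field(default_factory=dict)


def fake_search(topic):
    # Stand-in for the SERP API call; returns hard-coded URLs.
    return ["https://example.com/a", "https://example.com/b"]


def fake_scrape(url):
    # Stand-in for the Web Unlocker call; returns placeholder text.
    return f"page text from {url}"


def decide(state):
    """Scripted stand-in for the LLM's choice of the next action."""
    if not state.search_results:
        return ("search", state.topic)
    unscraped = [u for u in state.search_results if u not in state.pages]
    if unscraped:
        return ("scrape", unscraped[0])
    return ("submit", {"sources": sorted(state.pages)})


def run(topic, max_steps=10):
    """Loop until the decision step submits a result or the step cap is hit."""
    state = State(topic)
    for _ in range(max_steps):
        action, arg = decide(state)
        if action == "search":
            state.search_results = fake_search(arg)
        elif action == "scrape":
            state.pages[arg] = fake_scrape(arg)
        else:  # "submit" ends the loop with the final result
            return arg
    raise RuntimeError("step limit reached without a submitted result")
```

The step cap is a design choice of this sketch: it keeps a loop driven by model decisions from running indefinitely if the model never chooses to submit.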

Example with a custom schema:

import asyncio

from enrichment_agent import enrich

# Describe the shape of the data you want back.
schema = {
    "type": "object",
    "properties": {
        "competitors": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "market_position": {"type": "string"},
                    "key_differentiator": {"type": "string"},
                },
            },
        }
    },
}

# enrich() is awaited in the original snippet, so run it inside an event loop.
result = asyncio.run(enrich("Stripe competitors in payment processing", schema))

Modify the serp_tool in enrichment_agent.py:
serp_tool = BrightDataSERP(
    search_engine="google",
    country="de",  # Germany
    language="de",  # German
    results_count=10,
)

To use a different LLM:

from langchain_openai import ChatOpenAI

# Replace the LLM initialization in create_agent()
llm = ChatOpenAI(model="gpt-4o")

License: MIT