akm-esco/Agent-Builder-FinDemo


Wealth Management Data Platform - Elasticsearch Integration

License: MIT | Python 3.8+ | Elasticsearch 8.x | PRs Welcome

This project generates comprehensive synthetic wealth management data and uploads it to Elasticsearch for analysis, searching, and visualization. It includes three interconnected datasets that simulate a complete wealth management platform.

⚠️ Important: All data generated by this project is 100% synthetic and for demonstration purposes only. No real personal or financial information is included.

Overview

The platform includes three core data sources:

  1. Account Information - Customer profiles with portfolio details
  2. Trade Data - Historical trading activity and transactions
  3. Wealth Management Emails - Client-advisor communication threads

All data is linked via account_id for comprehensive cross-dataset analysis.
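Because every dataset carries account_id, the generated CSV files can also be combined locally before anything reaches Elasticsearch. The sketch below is a hypothetical helper: the names load_by_account and customer_360 are not part of this repository. It uses only Python's standard csv module and assumes each CSV has an account_id header, as described in the Data Schema section.

```python
import csv
from collections import defaultdict


def load_by_account(csv_lines, key="account_id"):
    """Group the rows of one CSV (a file object or list of lines) by account_id."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_lines):
        grouped[row[key]].append(row)
    return grouped


def customer_360(account_id, accounts, trades, emails):
    """Combine one account's profile with its trades and email threads."""
    # dict.get does not trigger defaultdict's factory, so unknown IDs add nothing
    profile = accounts.get(account_id, [None])[0]
    return {
        "profile": profile,
        "trades": trades.get(account_id, []),
        "emails": emails.get(account_id, []),
    }
```

For example, `with open("trade_data.csv", newline="") as f: trades = load_by_account(f)` builds the trade lookup; the accounts and email files are loaded the same way.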

Files

Generation Scripts

  • generate_account_info.py - Generates customer account profiles
  • generate_trade_data.py - Generates trade transactions
  • generate_10k_emails.py - Generates email communication threads

Upload Scripts

  • upload_accounts.py - Uploads account data to Elasticsearch
  • upload_trades.py - Uploads trade data to Elasticsearch
  • upload_to_elasticsearch.py - Uploads email threads to Elasticsearch

Data Files

  • account_information.csv - Generated customer accounts (~7,000 accounts)
  • trade_data.csv - Generated trade transactions (~269,000 trades)
  • wealth_management_emails.csv - Generated email threads (5,000 threads)

Configuration & Documentation

  • requirements.txt - Python dependencies
  • QUICKSTART.md - Quick start guide
  • ELASTIC_CLOUD_SETUP.md - Detailed Elastic Cloud setup
  • DATA_OVERVIEW.md - Comprehensive data schema reference

Setup

1. Clone the Repository

git clone https://github.com/akm-esco/Agent-Builder-FinDemo.git
cd Agent-Builder-FinDemo

2. Install Dependencies

pip install -r requirements.txt

3. Configure Elasticsearch Credentials

Important: Never commit real credentials to the repository!

# Copy the example configuration
cp config_example.py config.py

# Edit config.py with your actual Elasticsearch credentials
# config.py is in .gitignore and won't be committed

Edit config.py and update:

  • ES_ENDPOINT_URL - Your Elasticsearch endpoint
  • ES_API_KEY - Your API key (recommended) or username/password

Alternative: Use environment variables:

export ES_ENDPOINT_URL="https://your-deployment:443"
export ES_API_KEY="your-api-key"
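The exact order in which the upload scripts consult config.py and environment variables is not documented here. As a sketch of one reasonable policy, the hypothetical helper below prefers environment variables and falls back to a config.py module that is assumed to define ES_ENDPOINT_URL and ES_API_KEY as module attributes. It covers only the API-key option, not username/password.

```python
import importlib
import os


def load_es_settings():
    """Return (endpoint_url, api_key), preferring environment variables.

    Falls back to a local config.py (assumed to define ES_ENDPOINT_URL and
    ES_API_KEY); raises RuntimeError if a value is missing from both.
    """
    try:
        config = importlib.import_module("config")
    except ImportError:
        config = None  # no config.py on the import path

    def pick(name):
        value = os.environ.get(name)
        if value is None and config is not None:
            value = getattr(config, name, None)
        if not value:
            raise RuntimeError(f"{name} is not set in the environment or config.py")
        return value

    return pick("ES_ENDPOINT_URL"), pick("ES_API_KEY")
```

Letting the environment win makes it easy to point the same checkout at a different cluster without editing config.py.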

4. Choose Your Elasticsearch Deployment

Option A: Elastic Cloud (Recommended for Production)

Quick Start:

  1. Sign up at https://cloud.elastic.co (14-day free trial)
  2. Create a deployment
  3. Copy your Cloud ID and create an API key
  4. Follow the detailed guide: ELASTIC_CLOUD_SETUP.md

Benefits:

  • Fully managed, no infrastructure to maintain
  • Automatic backups and updates
  • Built-in security and monitoring
  • Kibana included
  • Free trial available

Option B: Local Elasticsearch (For Development)

Run Elasticsearch on your own machine using one of the methods in step 5.

5. Start Elasticsearch (if running locally)

Option A: Using Docker

docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0

Option B: Using Homebrew (macOS)

brew install elasticsearch
brew services start elasticsearch

Option C: Download and Run

Note: With config.py set up (Step 3), the upload scripts will automatically use your credentials. No manual script editing needed!

Usage

Quick Start - Generate and Upload All Data

To generate and upload all three datasets at once:

# 1. Generate all data (run in sequence, accounts must be generated first)
python3 generate_account_info.py    # ~7,000 customer accounts
python3 generate_trade_data.py      # ~269,000 trades
python3 generate_10k_emails.py      # 5,000 email threads

# 2. Upload all data to Elasticsearch
python3 upload_accounts.py          # Upload accounts
python3 upload_trades.py            # Upload trades  
python3 upload_to_elasticsearch.py  # Upload emails
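Because trades and emails reference account IDs from account_information.csv, the six commands only make sense in order, and an upload should not run if its generator failed. The hypothetical wrapper below (run_pipeline is not part of the repository) runs each script with the current Python interpreter and stops at the first non-zero exit code.

```python
import subprocess
import sys

# Order matters: accounts first, then the datasets that reference them.
PIPELINE = [
    "generate_account_info.py",
    "generate_trade_data.py",
    "generate_10k_emails.py",
    "upload_accounts.py",
    "upload_trades.py",
    "upload_to_elasticsearch.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order and stop at the first failure.

    Returns the list of scripts that completed successfully.
    """
    completed = []
    for script in scripts:
        result = runner([sys.executable, script])
        if result.returncode != 0:
            print(f"{script} failed with exit code {result.returncode}; stopping")
            break
        completed.append(script)
    return completed
```

Run it from the repository root with `run_pipeline()`; the runner parameter exists only so the loop can be tested without launching real scripts.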

Dataset 1: Account Information

Generate Account Data:

python3 generate_account_info.py

This creates account_information.csv with:

  • ~7,000 customer account profiles
  • Complete demographics (name, age, address, contact info)
  • Account details (type, status, opened date, risk profile)
  • Portfolio metrics (value, cash balance, unrealized gains, YTD returns)
  • Financial information (income, net worth, employment status)
  • Compliance fields (KYC status, tax ID verification, accredited investor)

Upload to Elasticsearch:

python3 upload_accounts.py

Creates index: account-information with semantic search on investment objectives.

Dataset 2: Trade Data

Generate Trade Data:

python3 generate_trade_data.py

This creates trade_data.csv with:

  • ~269,000 trade transactions across all accounts
  • Buy/sell transactions for stocks, ETFs, bonds
  • Realistic pricing, quantities, and commissions
  • Trade execution details (date, time, exchange, status)
  • Settlement dates and total amounts

Upload to Elasticsearch:

python3 upload_trades.py

Creates index: trade-data with optimized mappings for trade analysis.

Dataset 3: Wealth Management Emails

Generate Email Data:

Basic generation (core fields only - no enrichment):

python3 generate_10k_emails.py

With enrichment fields (for trend analysis):

python3 generate_10k_emails.py --include-enrichment

This creates wealth_management_emails.csv with:

  • 5,000 email thread examples
  • Account IDs linked to generated accounts
  • Varied topics: market news, investment strategies, specific investments, portfolio reviews
  • Realistic client questions and advisor responses

Enrichment fields (only when using --include-enrichment flag):

  • topic_type - Category of the conversation
  • assets_mentioned - Assets discussed for trend detection
  • industries_mentioned - Industries referenced
  • market_event - Market event or news mentioned

Core fields (always included):

  • thread_id, account_id, date
  • client_message, advisor_response

Note: By default, emails are generated without enrichment fields. This keeps the data clean and allows Elasticsearch to perform its own entity extraction and analysis. Use --include-enrichment if you want pre-tagged data for trend analysis.

Upload to Elasticsearch:

python3 upload_to_elasticsearch.py

Creates index: wealth-management-emails with ELSER semantic search on messages.

Upload Process Details

Each upload script will:

  1. Connect to your Elasticsearch cluster
  2. Create an index with appropriate mappings
  3. Upload all documents in batches
  4. Verify the upload with statistics
  5. Show example queries and aggregations
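Step 3 ("upload all documents in batches") is typically done with the bulk helpers of the official Python client. The sketch below shows only the action-building half under stated assumptions: csv_to_actions is a hypothetical name, and using a CSV column as the document _id is an optional choice the bundled scripts may not make.

```python
import csv


def csv_to_actions(csv_lines, index, id_field=None):
    """Yield one bulk action per CSV row in the dict format helpers.bulk accepts.

    If id_field is given, that column also becomes the document _id
    (an assumption; Elasticsearch assigns IDs when it is omitted).
    """
    for row in csv.DictReader(csv_lines):
        action = {"_index": index, "_source": row}
        if id_field is not None:
            action["_id"] = row[id_field]
        yield action
```

Passing the generator to the client, e.g. `helpers.bulk(es, csv_to_actions(f, "trade-data", id_field="trade_id"), chunk_size=1000)`, lets `elasticsearch.helpers.bulk` split the stream into batches. CSV values arrive as strings; numeric fields coerce numeric strings by default and date fields parse date strings, so the mapping still determines the stored types.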

Data Schema

1. Account Information Schema

Demographics:

  • account_id - Unique account identifier (e.g., "ACC00001-5074")
  • account_holder_name - Full name
  • first_name, last_name - Name components
  • email - Email address
  • phone - Phone number
  • street_address, city, state, zip_code - Address information
  • age - Account holder age

Account Details:

  • account_type - Individual Brokerage, Roth IRA, Traditional IRA, SEP IRA, etc.
  • account_opened_date - Date account was opened
  • account_status - Active, Inactive, Closed
  • risk_profile - Conservative, Moderate, Aggressive, etc.
  • investment_objective - Growth, Income, Preservation, Balanced

Portfolio & Financial:

  • portfolio_value - Total portfolio value
  • cash_balance - Cash available
  • market_value - Current market value of holdings
  • cost_basis - Original cost of holdings
  • unrealized_gain_loss - Unrealized gains/losses
  • unrealized_gain_loss_pct - Unrealized G/L percentage
  • ytd_return_pct - Year-to-date return percentage
  • annual_income - Annual income
  • net_worth - Total net worth
  • employment_status - Employed, Retired, Self-employed, etc.

Compliance & Management:

  • advisor_assigned - Assigned financial advisor
  • kyc_status - KYC compliance status
  • tax_id_verified - Tax ID verification status
  • accredited_investor - Accredited investor status
  • last_activity_date - Last account activity
  • created_at - Record creation timestamp

Elasticsearch Index: account-information with semantic search on investment_objective

2. Trade Data Schema

Trade Identification:

  • trade_id - Unique trade identifier (e.g., "TRD000001-b6a2")
  • account_id - Associated account (links to accounts)

Trade Details:

  • symbol - Stock/ETF/Bond ticker symbol
  • action - Buy or Sell
  • quantity - Number of shares/units
  • price - Execution price per share
  • trade_value - Quantity × Price
  • commission - Trading commission
  • total_amount - Trade value + commission

Execution Details:

  • trade_date - Date trade was executed
  • settlement_date - Date trade settles (T+2)
  • time_executed - Time of execution (HH:MM:SS)
  • status - Executed, Cancelled, Pending, Partially Filled
  • trade_type - Market, Limit, Stop Loss, Market on Close
  • exchange - NYSE, NASDAQ, BATS, IEX

Metadata:

  • created_at - Record creation timestamp

Elasticsearch Index: trade-data with optimized mappings for trade analysis
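Three trade fields are derived from others: trade_value = quantity × price, total_amount = trade_value + commission, and settlement_date = trade_date + 2 business days. The sketch below reproduces those rules as this schema states them; it is not the generator's own code. It uses Decimal (pass prices as strings) to avoid binary floating-point rounding, and it skips weekends but ignores market holidays.

```python
from datetime import date, timedelta
from decimal import Decimal


def add_business_days(start, days):
    """Step forward `days` weekdays (Mon-Fri); market holidays are ignored."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days -= 1
    return current


def derive_trade_fields(quantity, price, commission, trade_date):
    """Compute trade_value, total_amount and a T+2 settlement_date."""
    trade_value = Decimal(quantity) * Decimal(price)
    return {
        "trade_value": trade_value,
        "total_amount": trade_value + Decimal(commission),
        "settlement_date": add_business_days(trade_date, 2),
    }
```

For a Friday trade, e.g. 100 shares at "150.25" with "4.95" commission on 2024-01-05, the trade value is 15,025.00, the total amount 15,029.95, and settlement falls on Tuesday 2024-01-09.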

3. Wealth Management Emails Schema

Core Fields (always included):

  • thread_id - Unique thread identifier
  • account_id - Customer account ID (links to accounts)
  • date - Email date (YYYY-MM-DD format)
  • client_message - Client's email message
  • advisor_response - Advisor's response message

Enrichment Fields (only with --include-enrichment flag):

  • topic_type - Category: market_news, investment_strategy, specific_investment, portfolio_review
  • assets_mentioned - Assets discussed (comma-separated)
  • industries_mentioned - Industries referenced (comma-separated)
  • market_event - Market event or news mentioned

Elasticsearch Index: wealth-management-emails with ELSER semantic search on messages
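assets_mentioned and industries_mentioned are comma-separated strings. A keyword field can hold an array of values, and a terms aggregation counts each array element separately, so splitting these strings before indexing makes per-asset trend counts possible. Whether upload_to_elasticsearch.py already does this is not stated; the helpers below are a hypothetical sketch.

```python
def split_list_field(value):
    """Turn 'AAPL, NVDA,  MSFT' into ['AAPL', 'NVDA', 'MSFT'] (blanks dropped)."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def normalize_email_row(row):
    """Return a copy of an email row with list-valued enrichment fields."""
    doc = dict(row)
    for field in ("assets_mentioned", "industries_mentioned"):
        if field in doc:
            doc[field] = split_list_field(doc[field])
    return doc
```

Rows generated without --include-enrichment pass through unchanged, because the loop only touches fields that are present.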

Elasticsearch Mappings

All upload scripts create optimized mappings with:

  • Keyword fields for exact matches and aggregations (account_id, symbols, statuses)
  • Text fields for full-text search (messages, names, descriptions)
  • Semantic text fields for AI-powered semantic search using ELSER model
  • Numeric fields for calculations and range queries (prices, values, quantities)
  • Date fields for time-based queries and visualizations
  • Lookup mode for efficient document retrieval in serverless deployments
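To make the field-type categories concrete, here is a hypothetical mapping for the trade-data index. The field names come from the Trade Data Schema, but the types are illustrative choices, not the mapping upload_trades.py actually sends. Date fields assume ISO-8601 strings such as 2024-01-05, and time_executed stays a keyword because it is a bare HH:MM:SS value.

```python
# Illustrative mapping: keyword for exact match/aggregations, double for math,
# date for time-based queries. Types are assumptions, not the script's mapping.
TRADE_MAPPINGS = {
    "properties": {
        "trade_id": {"type": "keyword"},
        "account_id": {"type": "keyword"},
        "symbol": {"type": "keyword"},
        "action": {"type": "keyword"},
        "status": {"type": "keyword"},
        "trade_type": {"type": "keyword"},
        "exchange": {"type": "keyword"},
        "quantity": {"type": "double"},
        "price": {"type": "double"},
        "trade_value": {"type": "double"},
        "commission": {"type": "double"},
        "total_amount": {"type": "double"},
        "trade_date": {"type": "date"},
        "settlement_date": {"type": "date"},
        "time_executed": {"type": "keyword"},
        "created_at": {"type": "date"},
    }
}


def create_trade_index(es, index="trade-data"):
    """Create the index with the mappings above unless it already exists."""
    if es.indices.exists(index=index):
        return False
    es.indices.create(index=index, mappings=TRADE_MAPPINGS)
    return True
```

With the 8.x Python client, `create_trade_index(Elasticsearch(...))` either creates the index or leaves an existing one untouched.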

Example Queries

Account Information Queries

Find high-value accounts:

GET account-information/_search
{
  "query": {
    "range": {
      "portfolio_value": { "gte": 10000000 }
    }
  }
}

Aggregate by risk profile:

GET account-information/_search
{
  "size": 0,
  "aggs": {
    "by_risk": {
      "terms": { "field": "risk_profile" }
    }
  }
}

Find accounts with specific advisor:

GET account-information/_search
{
  "query": {
    "term": { "advisor_assigned": "Alice Johnson" }
  }
}

Calculate total AUM:

GET account-information/_search
{
  "size": 0,
  "aggs": {
    "total_aum": {
      "sum": { "field": "portfolio_value" }
    },
    "avg_portfolio": {
      "avg": { "field": "portfolio_value" }
    }
  }
}

Trade Data Queries

Find all trades for a specific account:

GET trade-data/_search
{
  "query": {
    "term": { "account_id": "ACC00001-5074" }
  },
  "sort": [{ "trade_date": "desc" }]
}

Most traded symbols:

GET trade-data/_search
{
  "size": 0,
  "aggs": {
    "top_symbols": {
      "terms": { "field": "symbol", "size": 20 }
    }
  }
}

Total trade volume by action:

GET trade-data/_search
{
  "size": 0,
  "aggs": {
    "by_action": {
      "terms": { "field": "action" },
      "aggs": {
        "total_value": {
          "sum": { "field": "trade_value" }
        }
      }
    }
  }
}
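The same aggregation can be run from Python with the 8.x client, which accepts aggs as a keyword argument. The sketch below (function names are hypothetical) sends the request and flattens the nested buckets into a plain {action: total} dictionary.

```python
# Same request body as the "Total trade volume by action" query above
VOLUME_BY_ACTION_AGGS = {
    "by_action": {
        "terms": {"field": "action"},
        "aggs": {"total_value": {"sum": {"field": "trade_value"}}},
    }
}


def parse_volume_by_action(response):
    """Flatten the by_action buckets into {action: summed trade_value}."""
    buckets = response["aggregations"]["by_action"]["buckets"]
    return {bucket["key"]: bucket["total_value"]["value"] for bucket in buckets}


def volume_by_action(es, index="trade-data"):
    """Run the aggregation (size=0: no hits, only aggregations) and parse it."""
    response = es.search(index=index, size=0, aggs=VOLUME_BY_ACTION_AGGS)
    return parse_volume_by_action(response)
```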

Find large trades (>$100k):

GET trade-data/_search
{
  "query": {
    "range": {
      "trade_value": { "gte": 100000 }
    }
  }
}

Trades in a date range:

GET trade-data/_search
{
  "query": {
    "range": {
      "trade_date": {
        "gte": "2024-01-01",
        "lte": "2024-12-31"
      }
    }
  }
}

Email Thread Queries

Search client messages:

GET wealth-management-emails/_search
{
  "query": {
    "match": {
      "client_message": "should I invest"
    }
  }
}

Semantic search (ELSER-powered):

GET wealth-management-emails/_search
{
  "query": {
    "semantic": {
      "field": "client_message_semantic",
      "query": "worried about market volatility"
    }
  }
}

Find all emails for an account:

GET wealth-management-emails/_search
{
  "query": {
    "term": { "account_id": "ACC00001-5074" }
  },
  "sort": [{ "date": "desc" }]
}

Cross-Index Queries

Find account with trades and emails:

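import os

from elasticsearch import Elasticsearch

# Reuse the settings from step 3 (placeholders if the variables are unset)
es = Elasticsearch(
    os.environ.get("ES_ENDPOINT_URL", "https://your-endpoint:443"),
    api_key=os.environ.get("ES_API_KEY", "your-api-key"),
)

account_id = "ACC00001-5074"

# Get account details with a term query, so this works whether or not
# account_id was also used as the document _id
account_hits = es.search(
    index="account-information",
    query={"term": {"account_id": account_id}},
    size=1,
)["hits"]["hits"]
if not account_hits:
    raise SystemExit(f"No account found for {account_id}")
account = account_hits[0]["_source"]

# Get the 10 most recent trades
trades = es.search(
    index="trade-data",
    query={"term": {"account_id": account_id}},
    sort=[{"trade_date": "desc"}],
    size=10,
)

# Get the 10 most recent email threads
emails = es.search(
    index="wealth-management-emails",
    query={"term": {"account_id": account_id}},
    sort=[{"date": "desc"}],
    size=10,
)

# hits.total counts every matching document, not only the 10 returned
print(f"Account: {account['account_holder_name']}")
print(f"Portfolio Value: ${float(account['portfolio_value']):,.2f}")
print(f"Total Trades: {trades['hits']['total']['value']}")
print(f"Email Threads: {emails['hits']['total']['value']}")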

Kibana Visualization

Access Kibana at:

Create Data Views

  1. Go to Management → Stack Management → Data Views
  2. Create data views for each index:
    • account-information
    • trade-data
    • wealth-management-emails
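Data views can also be created through Kibana's data views HTTP API (POST /api/data_views/data_view with a kbn-xsrf header). The sketch below only builds the request with the standard library; the Kibana URL, the API key, and the time field chosen for each index are assumptions to adjust for your deployment.

```python
import json
import urllib.request

# Time fields per index are assumptions based on the schemas in this README.
DATA_VIEWS = {
    "account-information": "account_opened_date",
    "trade-data": "trade_date",
    "wealth-management-emails": "date",
}


def data_view_request(kibana_url, api_key, index, time_field):
    """Build (but do not send) a POST request that creates one data view."""
    body = {"data_view": {"title": index, "timeFieldName": time_field}}
    return urllib.request.Request(
        url=f"{kibana_url.rstrip('/')}/api/data_views/data_view",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "kbn-xsrf": "true",
            "Authorization": f"ApiKey {api_key}",
        },
        method="POST",
    )
```

Sending is one more step per index, e.g. `urllib.request.urlopen(data_view_request(KIBANA_URL, API_KEY, name, field))` for each entry in DATA_VIEWS, where KIBANA_URL and API_KEY are your own values.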

Explore Your Data

In Discover:

  • Explore each dataset with the corresponding data view
  • Filter by date ranges, account IDs, symbols, etc.
  • Create saved searches for common queries

Dashboard Ideas

Portfolio Overview Dashboard:

  • Total AUM (metric visualization)
  • Risk profile distribution (pie chart)
  • Top advisors by AUM (bar chart)
  • Account growth over time (line chart)
  • Geographic distribution by state (map)

Trading Activity Dashboard:

  • Daily trade volume (time series)
  • Buy vs Sell ratio (gauge)
  • Most traded symbols (bar chart)
  • Trade status distribution (pie chart)
  • Average trade size by symbol (table)
  • Trading activity heatmap by day/hour

Client Communication Dashboard:

  • Email volume over time (line chart)
  • Most active accounts (table)
  • Common topics word cloud (if using enrichment)
  • Response time analysis
  • Client sentiment tracking

Cross-Dataset Dashboard:

  • Account profile with recent trades and emails
  • Portfolio performance correlated with trading activity
  • Client engagement vs portfolio value
  • Risk profile vs trading behavior

Troubleshooting

Connection Refused

  • Make sure Elasticsearch is running: curl http://localhost:9200
  • Check the ES_ENDPOINT_URL setting in config.py (or the ES_ENDPOINT_URL environment variable)
  • Verify firewall settings

Authentication Failed

  • Check ES_USERNAME and ES_PASSWORD
  • For Elastic Cloud, use the credentials from your deployment
  • Verify API key if using ES_API_KEY

Index Already Exists

  • The script will prompt to delete and recreate
  • Or manually delete: curl -X DELETE "localhost:9200/wealth-management-emails"

Dataset Statistics

Overall Platform

  • Total Documents: ~281,000 across 3 indices (~7,000 accounts + ~269,000 trades + 5,000 email threads)
  • Total Accounts: ~7,000 customer accounts
  • Date Range: Last 365 days of activity
  • Total AUM: $26.7+ Billion
  • Total Trade Volume: $70.5+ Billion

Account Information

  • Documents: ~7,000 customer accounts
  • Average Portfolio Value: $5.0 Million
  • Risk Profiles: Moderate (33%), Moderately Conservative (30%), Conservative (20%)
  • Top Account Types: Roth IRA, Trust Account, Managed Account, Joint Brokerage
  • Employment Status: Mix of Employed, Retired, Self-employed
  • Age Range: 25-75 years
  • Geographic Coverage: All 50 US states

Trade Data

  • Documents: ~269,000 trade transactions
  • Buy Transactions: ~205,000 (76%)
  • Sell Transactions: ~64,000 (24%)
  • Average Trade Size: $262,000
  • Most Traded Symbols: QQQ, MSFT, NVDA, TSLA, AAPL, SPY
  • Execution Rate: 93.9% (252,000+ executed)
  • Trade Types: Market, Limit, Stop Loss, Market on Close
  • Exchanges: NYSE, NASDAQ, BATS, IEX
  • Date Range: Last 12 months
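The headline trade figures can be recomputed from trade_data.csv rows; as a sanity check, 269,000 trades × $262,000 average is about $70.5 billion, matching the total trade volume above. The hypothetical helper below assumes the action values Buy/Sell and the status value Executed exactly as listed in the Trade Data Schema, and averages over all trades regardless of status.

```python
def trade_summary(rows):
    """Compute headline trade statistics from trade_data.csv rows (dicts)."""
    total = len(rows)
    if total == 0:
        return {"documents": 0}
    buys = sum(1 for r in rows if r["action"] == "Buy")
    sells = sum(1 for r in rows if r["action"] == "Sell")
    executed = sum(1 for r in rows if r["status"] == "Executed")
    volume = sum(float(r["trade_value"]) for r in rows)  # CSV values are strings
    return {
        "documents": total,
        "buy_pct": 100 * buys / total,
        "sell_pct": 100 * sells / total,
        "execution_rate_pct": 100 * executed / total,
        "avg_trade_size": volume / total,
        "total_volume": volume,
    }
```

Usage: `with open("trade_data.csv", newline="") as f: print(trade_summary(list(csv.DictReader(f))))` after importing csv.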

Wealth Management Emails

  • Documents: 5,000 email threads
  • Unique Accounts: ~3,400 accounts (some with multiple threads)
  • Average Threads per Account: 1.46
  • Topic Distribution (if enriched):
    • Market News (~25%)
    • Investment Strategy (~25%)
    • Specific Investments (~25%)
    • Portfolio Review (~25%)
  • Date Range: Last 365 days

Cross-Dataset Relationships

  • All data linked via account_id
  • ~40 trades per account on average
  • ~1.5 email threads per account among the ~3,400 accounts that have email (about 0.7 across all ~7,000 accounts)
  • High-value accounts tend to have more trades and communication
  • Enables comprehensive customer 360° view
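Per-account averages depend on the denominator: 269,000 trades / 7,000 accounts ≈ 38.4 (the "~40" above), while 5,000 threads / ~3,400 emailing accounts ≈ 1.47 but 5,000 / 7,000 ≈ 0.71 over every account. The hypothetical helper below computes both kinds of averages from lists of account_id values taken from the three datasets.

```python
def per_account_averages(account_ids, trade_account_ids, email_account_ids):
    """Average trades and email threads per account.

    Threads are averaged two ways: over every account, and over only the
    accounts that have at least one thread.
    """
    n_accounts = len(set(account_ids))
    n_trades = len(list(trade_account_ids))
    email_ids = list(email_account_ids)
    emailing_accounts = len(set(email_ids))
    return {
        "trades_per_account": n_trades / n_accounts,
        "threads_per_account": len(email_ids) / n_accounts,
        "threads_per_emailing_account": (
            len(email_ids) / emailing_accounts if emailing_accounts else 0.0
        ),
    }
```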

Use Cases

This platform demonstrates several powerful Elasticsearch capabilities:

1. Customer 360° View

  • Combine account details, trading history, and communication in one view
  • Track customer lifecycle from account opening to ongoing activity
  • Identify high-value relationships and engagement patterns

2. Semantic Search with ELSER

  • Natural language search across email communications
  • Find similar client questions and advisor responses
  • Understand customer intent without keyword matching
  • Example: "worried about market crash" finds related concerns

3. Portfolio Analytics

  • Real-time AUM calculations and aggregations
  • Portfolio performance tracking and risk analysis
  • Identify trends in asset allocation and investment strategies
  • Advisor performance metrics and client distribution

4. Trading Activity Analysis

  • Trade volume analysis by symbol, action, and time period
  • Pattern detection in trading behavior
  • Commission and transaction cost analysis
  • Identify most active traders and popular securities

5. Client Communication Intelligence

  • Track communication frequency and response times
  • Identify common client concerns and questions
  • Correlate communication with trading activity
  • Measure advisor engagement and workload

6. Compliance & Reporting

  • KYC status tracking across all accounts
  • Audit trail for trades and communications
  • Risk profile adherence monitoring
  • Regulatory reporting capabilities

7. ML & Advanced Analytics

  • Predict churn based on communication and trading patterns
  • Anomaly detection in trading behavior
  • Client segmentation for personalized service
  • Portfolio optimization recommendations

Next Steps

After uploading your data:

  1. Explore in Kibana Discover - Get familiar with each dataset
  2. Create Data Views - Set up data views for each index
  3. Build Dashboards - Create visualizations for key metrics
  4. Try Semantic Search - Test ELSER capabilities on email threads
  5. Cross-Dataset Queries - Join data across accounts, trades, and emails
  6. Set Up Alerts - Monitor for high-value trades, portfolio changes, etc.
  7. Machine Learning - Use Elasticsearch ML for anomaly detection

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Ways to Contribute

  • 🐛 Report bugs
  • 💡 Suggest features
  • 📝 Improve documentation
  • 🔧 Submit pull requests
  • ⭐ Star the repository

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Note: All data generated by this project is synthetic and for testing and development purposes only.

About

Assets related to creating a finance demo for Elastic Agent Builder
