This project generates comprehensive synthetic wealth management data and uploads it to Elasticsearch for analysis, searching, and visualization. It includes three interconnected datasets that simulate a complete wealth management platform.
⚠️ Important: All data generated by this project is 100% synthetic and for demonstration purposes only. No real personal or financial information is included.
The platform includes three core data sources:
- Account Information - Customer profiles with portfolio details
- Trade Data - Historical trading activity and transactions
- Wealth Management Emails - Client-advisor communication threads
All data is linked via account_id for comprehensive cross-dataset analysis.
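As a rough illustration of what this linkage allows, the sketch below groups trade and email rows under their account using only the standard library. The helper names `load_rows` and `build_account_view` are hypothetical and not part of this project; the only assumption about the data is that every row carries an `account_id` column, as described above.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def build_account_view(accounts, trades, emails):
    """Group trades and email threads under the account they belong to.

    Hypothetical helper: each argument is a list of dicts that all
    carry an "account_id" key, as the generated CSV rows do.
    """
    trades_by_account = defaultdict(list)
    for trade in trades:
        trades_by_account[trade["account_id"]].append(trade)

    emails_by_account = defaultdict(list)
    for email in emails:
        emails_by_account[email["account_id"]].append(email)

    return {
        account["account_id"]: {
            "account": account,
            "trades": trades_by_account.get(account["account_id"], []),
            "emails": emails_by_account.get(account["account_id"], []),
        }
        for account in accounts
    }
```

Once the CSV files exist, something like `build_account_view(load_rows("account_information.csv"), load_rows("trade_data.csv"), load_rows("wealth_management_emails.csv"))` would give one dictionary entry per account.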
- generate_account_info.py - Generates customer account profiles
- generate_trade_data.py - Generates trade transactions
- generate_10k_emails.py - Generates email communication threads
- upload_accounts.py - Uploads account data to Elasticsearch
- upload_trades.py - Uploads trade data to Elasticsearch
- upload_to_elasticsearch.py - Uploads email threads to Elasticsearch
- account_information.csv - Generated customer accounts (~7,000 accounts)
- trade_data.csv - Generated trade transactions (~269,000 trades)
- wealth_management_emails.csv - Generated email threads (5,000 threads)
- requirements.txt - Python dependencies
- QUICKSTART.md - Quick start guide
- ELASTIC_CLOUD_SETUP.md - Detailed Elastic Cloud setup
- DATA_OVERVIEW.md - Comprehensive data schema reference
git clone https://github.com/YOUR-USERNAME/Finance_Demo.git
cd Finance_Demo
pip install -r requirements.txt
Important: Never commit real credentials to the repository!
# Copy the example configuration
cp config_example.py config.py
# Edit config.py with your actual Elasticsearch credentials
# config.py is in .gitignore and won't be committed
Edit config.py and update:
- ES_ENDPOINT_URL - Your Elasticsearch endpoint
- ES_API_KEY - Your API key (recommended) or username/password
Alternative: Use environment variables:
export ES_ENDPOINT_URL="https://your-deployment:443"
export ES_API_KEY="your-api-key"
Quick Start:
- Sign up at https://cloud.elastic.co (14-day free trial)
- Create a deployment
- Copy your Cloud ID and create an API key
- Follow the detailed guide: ELASTIC_CLOUD_SETUP.md
Benefits:
- Fully managed, no infrastructure to maintain
- Automatic backups and updates
- Built-in security and monitoring
- Kibana included
- Free trial available
Option A: Using Docker
docker run -d --name elasticsearch \
-p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:8.11.0
Option B: Using Homebrew (macOS)
brew install elasticsearch
brew services start elasticsearch
Option C: Download and Run
- Download from https://www.elastic.co/downloads/elasticsearch
- Extract and run:
./bin/elasticsearch
Note: With config.py set up (Step 3), the upload scripts will automatically use your credentials. No manual script editing needed!
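One way a script could pick up these credentials is sketched below. It is an assumption about the lookup order, not a description of how the upload scripts actually behave: environment variables are checked first, then values that would come from config.py. The function name `resolve_es_settings` is hypothetical.

```python
import os


def resolve_es_settings(environ=None, config=None):
    """Return (endpoint_url, api_key), preferring environment variables.

    `config` stands in for values that would be read from config.py;
    the lookup order here is an assumption made for illustration.
    """
    environ = os.environ if environ is None else environ
    config = config or {}

    endpoint = environ.get("ES_ENDPOINT_URL") or config.get("ES_ENDPOINT_URL")
    api_key = environ.get("ES_API_KEY") or config.get("ES_API_KEY")

    # Fail early with a clear message instead of a connection error later.
    missing = [
        name
        for name, value in (("ES_ENDPOINT_URL", endpoint), ("ES_API_KEY", api_key))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing Elasticsearch settings: " + ", ".join(missing))
    return endpoint, api_key
```

Failing early on a missing setting keeps a misconfigured run from getting as far as the first upload request.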
To generate and upload all three datasets at once:
# 1. Generate all data (run in sequence, accounts must be generated first)
python3 generate_account_info.py # ~7,000 customer accounts
python3 generate_trade_data.py # ~269,000 trades
python3 generate_10k_emails.py # 5,000 email threads
# 2. Upload all data to Elasticsearch
python3 upload_accounts.py # Upload accounts
python3 upload_trades.py # Upload trades
python3 upload_to_elasticsearch.py # Upload emails
Generate Account Data:
python3 generate_account_info.py
This creates account_information.csv with:
- ~7,000 customer account profiles
- Complete demographics (name, age, address, contact info)
- Account details (type, status, opened date, risk profile)
- Portfolio metrics (value, cash balance, unrealized gains, YTD returns)
- Financial information (income, net worth, employment status)
- Compliance fields (KYC status, tax ID verification, accredited investor)
Upload to Elasticsearch:
python3 upload_accounts.py
Creates index: account-information with semantic search on investment objectives.
Generate Trade Data:
python3 generate_trade_data.py
This creates trade_data.csv with:
- ~269,000 trade transactions across all accounts
- Buy/sell transactions for stocks, ETFs, bonds
- Realistic pricing, quantities, and commissions
- Trade execution details (date, time, exchange, status)
- Settlement dates and total amounts
Upload to Elasticsearch:
python3 upload_trades.py
Creates index: trade-data with optimized mappings for trade analysis.
Generate Email Data:
Basic generation (core fields only - no enrichment):
python3 generate_10k_emails.py
With enrichment fields (for trend analysis):
python3 generate_10k_emails.py --include-enrichment
This creates wealth_management_emails.csv with:
- 5,000 email thread examples
- Account IDs linked to generated accounts
- Varied topics: market news, investment strategies, specific investments, portfolio reviews
- Realistic client questions and advisor responses
Enrichment fields (only when using --include-enrichment flag):
- topic_type - Category of the conversation
- assets_mentioned - Assets discussed for trend detection
- industries_mentioned - Industries referenced
- market_event - Market event or news mentioned
Core fields (always included):
- thread_id, account_id, date
- client_message, advisor_response
Note: By default, emails are generated without enrichment fields. This keeps the data clean and allows Elasticsearch to perform its own entity extraction and analysis. Use --include-enrichment if you want pre-tagged data for trend analysis.
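The flag behaviour described above could be wired up roughly as follows. This is a minimal sketch using `argparse`, not the actual contents of generate_10k_emails.py; the function names `build_parser` and `csv_columns` are hypothetical, while the column names come from the field lists in this README.

```python
import argparse

# Column names taken from the core and enrichment field lists above.
CORE_FIELDS = ["thread_id", "account_id", "date", "client_message", "advisor_response"]
ENRICHMENT_FIELDS = ["topic_type", "assets_mentioned", "industries_mentioned", "market_event"]


def build_parser():
    """Build a parser that exposes the opt-in enrichment flag."""
    parser = argparse.ArgumentParser(description="Generate synthetic email threads")
    parser.add_argument(
        "--include-enrichment",
        action="store_true",
        help="also write the pre-tagged enrichment columns",
    )
    return parser


def csv_columns(include_enrichment):
    """Return the CSV header for the chosen mode."""
    if include_enrichment:
        return CORE_FIELDS + ENRICHMENT_FIELDS
    return list(CORE_FIELDS)
```

With `action="store_true"` the flag defaults to off, which matches the default of generating core fields only.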
Upload to Elasticsearch:
python3 upload_to_elasticsearch.py
Creates index: wealth-management-emails with ELSER semantic search on messages.
Each upload script will:
- Connect to your Elasticsearch cluster
- Create an index with appropriate mappings
- Upload all documents in batches
- Verify the upload with statistics
- Show example queries and aggregations
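The batched upload step can be sketched with the bulk helper from the official Python client. This is an illustration under stated assumptions rather than the code of the upload scripts: the function names `iter_actions` and `upload_csv`, the batch size of 1,000, and the choice of which column becomes the document ID are all hypothetical.

```python
import csv


def iter_actions(rows, index_name, id_field):
    """Yield one bulk action per CSV row, using id_field as the document ID."""
    for row in rows:
        yield {"_index": index_name, "_id": row[id_field], "_source": row}


def upload_csv(es, path, index_name, id_field, batch_size=1000):
    """Stream a CSV file into Elasticsearch in batches of batch_size documents.

    `es` is an elasticsearch.Elasticsearch client. Returns the number of
    successfully indexed documents and a list of per-document errors.
    """
    # Imported here so iter_actions can be used without the client installed.
    from elasticsearch import helpers

    with open(path, newline="", encoding="utf-8") as handle:
        rows = csv.DictReader(handle)
        success, errors = helpers.bulk(
            es,
            iter_actions(rows, index_name, id_field),
            chunk_size=batch_size,
            raise_on_error=False,
        )
    return success, errors
```

Because the rows are streamed through a generator, the full CSV never has to sit in memory, which matters for the trade file at roughly 269,000 rows.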
Demographics:
- account_id - Unique account identifier (e.g., "ACC00001-5074")
- account_holder_name - Full name
- first_name, last_name - Name components
- email - Email address
- phone - Phone number
- street_address, city, state, zip_code - Address information
- age - Account holder age
Account Details:
- account_type - Individual Brokerage, Roth IRA, Traditional IRA, SEP IRA, etc.
- account_opened_date - Date account was opened
- account_status - Active, Inactive, Closed
- risk_profile - Conservative, Moderate, Aggressive, etc.
- investment_objective - Growth, Income, Preservation, Balanced
Portfolio & Financial:
- portfolio_value - Total portfolio value
- cash_balance - Cash available
- market_value - Current market value of holdings
- cost_basis - Original cost of holdings
- unrealized_gain_loss - Unrealized gains/losses
- unrealized_gain_loss_pct - Unrealized G/L percentage
- ytd_return_pct - Year-to-date return percentage
- annual_income - Annual income
- net_worth - Total net worth
- employment_status - Employed, Retired, Self-employed, etc.
Compliance & Management:
- advisor_assigned - Assigned financial advisor
- kyc_status - KYC compliance status
- tax_id_verified - Tax ID verification status
- accredited_investor - Accredited investor status
- last_activity_date - Last account activity
- created_at - Record creation timestamp
Elasticsearch Index: account-information with semantic search on investment_objective
Trade Identification:
- trade_id - Unique trade identifier (e.g., "TRD000001-b6a2")
- account_id - Associated account (links to accounts)
Trade Details:
- symbol - Stock/ETF/Bond ticker symbol
- action - Buy or Sell
- quantity - Number of shares/units
- price - Execution price per share
- trade_value - Quantity × Price
- commission - Trading commission
- total_amount - Trade value + commission
Execution Details:
- trade_date - Date trade was executed
- settlement_date - Date trade settles (T+2)
- time_executed - Time of execution (HH:MM:SS)
- status - Executed, Cancelled, Pending, Partially Filled
- trade_type - Market, Limit, Stop Loss, Market on Close
- exchange - NYSE, NASDAQ, BATS, IEX
Metadata:
- created_at - Record creation timestamp
Elasticsearch Index: trade-data with optimized mappings for trade analysis
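The derived trade fields above follow simple arithmetic, which the sketch below works through. It is an illustration of the stated formulas, not code from generate_trade_data.py: the function names are hypothetical, and treating T+2 as two business days that skip weekends only (market holidays ignored) is an assumption.

```python
from datetime import date, timedelta


def trade_amounts(quantity, price, commission):
    """Return (trade_value, total_amount) rounded to cents.

    trade_value = quantity x price; total_amount = trade_value + commission.
    """
    trade_value = round(quantity * price, 2)
    total_amount = round(trade_value + commission, 2)
    return trade_value, total_amount


def settlement_date(trade_date, business_days=2):
    """Return the date `business_days` weekdays after trade_date (T+2 by default).

    Assumption: only Saturdays and Sundays are skipped; holidays are ignored.
    """
    current = trade_date
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current
```

For example, a trade executed on Thursday 2024-01-04 would settle on Monday 2024-01-08 under this rule, because the weekend does not count toward the two days.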
Core Fields (always included):
- thread_id - Unique thread identifier
- account_id - Customer account ID (links to accounts)
- date - Email date (YYYY-MM-DD format)
- client_message - Client's email message
- advisor_response - Advisor's response message
Enrichment Fields (only with --include-enrichment flag):
- topic_type - Category: market_news, investment_strategy, specific_investment, portfolio_review
- assets_mentioned - Assets discussed (comma-separated)
- industries_mentioned - Industries referenced (comma-separated)
- market_event - Market event or news mentioned
Elasticsearch Index: wealth-management-emails with ELSER semantic search on messages
All upload scripts create optimized mappings with:
- Keyword fields for exact matches and aggregations (account_id, symbols, statuses)
- Text fields for full-text search (messages, names, descriptions)
- Semantic text fields for AI-powered semantic search using ELSER model
- Numeric fields for calculations and range queries (prices, values, quantities)
- Date fields for time-based queries and visualizations
- Lookup mode for efficient document retrieval in serverless deployments
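To make the field categories above concrete, here is a sketch of what a mapping for the email index might look like when expressed as a Python dictionary. It is not copied from upload_to_elasticsearch.py: the use of `copy_to` to feed the semantic field is an assumption, and `client_message_semantic` is the only semantic field name this README confirms (it appears in the semantic search query). Depending on the Elasticsearch version, a `semantic_text` field may also need an explicit `inference_id`.

```python
# Hypothetical mapping for the wealth-management-emails index.
EMAIL_INDEX_MAPPINGS = {
    "properties": {
        # Keyword fields: exact matches and aggregations.
        "thread_id": {"type": "keyword"},
        "account_id": {"type": "keyword"},
        # Date field: time-based queries and visualizations.
        "date": {"type": "date", "format": "yyyy-MM-dd"},
        # Text fields: full-text search. copy_to is an assumption here.
        "client_message": {"type": "text", "copy_to": "client_message_semantic"},
        "advisor_response": {"type": "text"},
        # Semantic text field: semantic search backed by an inference model.
        "client_message_semantic": {"type": "semantic_text"},
    }
}


def fields_of_type(mappings, field_type):
    """Return the sorted names of all fields mapped to field_type."""
    return sorted(
        name
        for name, spec in mappings["properties"].items()
        if spec["type"] == field_type
    )
```

A dictionary like this could be passed as the `mappings` argument of `es.indices.create(index="wealth-management-emails", mappings=EMAIL_INDEX_MAPPINGS)` in the 8.x Python client.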
Find high-value accounts:
GET account-information/_search
{
"query": {
"range": {
"portfolio_value": { "gte": 10000000 }
}
}
}
Aggregate by risk profile:
GET account-information/_search
{
"size": 0,
"aggs": {
"by_risk": {
"terms": { "field": "risk_profile" }
}
}
}
Find accounts with specific advisor:
GET account-information/_search
{
"query": {
"term": { "advisor_assigned": "Alice Johnson" }
}
}
Calculate total AUM:
GET account-information/_search
{
"size": 0,
"aggs": {
"total_aum": {
"sum": { "field": "portfolio_value" }
},
"avg_portfolio": {
"avg": { "field": "portfolio_value" }
}
}
}
Find all trades for a specific account:
GET trade-data/_search
{
"query": {
"term": { "account_id": "ACC00001-5074" }
},
"sort": [{ "trade_date": "desc" }]
}
Most traded symbols:
GET trade-data/_search
{
"size": 0,
"aggs": {
"top_symbols": {
"terms": { "field": "symbol", "size": 20 }
}
}
}
Total trade volume by action:
GET trade-data/_search
{
"size": 0,
"aggs": {
"by_action": {
"terms": { "field": "action" },
"aggs": {
"total_value": {
"sum": { "field": "trade_value" }
}
}
}
}
}
Find large trades (>$100k):
GET trade-data/_search
{
"query": {
"range": {
"trade_value": { "gte": 100000 }
}
}
}
Trades in a date range:
GET trade-data/_search
{
"query": {
"range": {
"trade_date": {
"gte": "2024-01-01",
"lte": "2024-12-31"
}
}
}
}
Search client messages:
GET wealth-management-emails/_search
{
"query": {
"match": {
"client_message": "should I invest"
}
}
}
Semantic search (ELSER-powered):
GET wealth-management-emails/_search
{
"query": {
"semantic": {
"field": "client_message_semantic",
"query": "worried about market volatility"
}
}
}
Find all emails for an account:
GET wealth-management-emails/_search
{
"query": {
"term": { "account_id": "ACC00001-5074" }
},
"sort": [{ "date": "desc" }]
}
Find account with trades and emails:
from elasticsearch import Elasticsearch
es = Elasticsearch(["https://your-endpoint:443"], api_key="your-api-key")
account_id = "ACC00001-5074"
# Get account details (this lookup assumes documents were indexed with account_id as their _id)
account = es.get(index="account-information", id=account_id)
# Get the 10 most recent trades for the account
trades = es.search(
    index="trade-data",
    body={
        "query": {"term": {"account_id": account_id}},
        "sort": [{"trade_date": "desc"}],
        "size": 10
    }
)
# Get the 10 most recent email threads for the account
emails = es.search(
    index="wealth-management-emails",
    body={
        "query": {"term": {"account_id": account_id}},
        "sort": [{"date": "desc"}],
        "size": 10
    }
)
# hits.total.value is the total number of matches, not just the 10 returned
print(f"Account: {account['_source']['account_holder_name']}")
print(f"Portfolio Value: ${account['_source']['portfolio_value']:,.2f}")
print(f"Recent Trades: {trades['hits']['total']['value']}")
print(f"Email Threads: {emails['hits']['total']['value']}")
Access Kibana at:
- Elastic Cloud: https://cloud.elastic.co/deployments (click "Open Kibana")
- Local: http://localhost:5601
- Go to Management → Stack Management → Data Views
- Create data views for each index:
  - account-information
  - trade-data
  - wealth-management-emails
In Discover:
- Explore each dataset with the corresponding data view
- Filter by date ranges, account IDs, symbols, etc.
- Create saved searches for common queries
Portfolio Overview Dashboard:
- Total AUM (metric visualization)
- Risk profile distribution (pie chart)
- Top advisors by AUM (bar chart)
- Account growth over time (line chart)
- Geographic distribution by state (map)
Trading Activity Dashboard:
- Daily trade volume (time series)
- Buy vs Sell ratio (gauge)
- Most traded symbols (bar chart)
- Trade status distribution (pie chart)
- Average trade size by symbol (table)
- Trading activity heatmap by day/hour
Client Communication Dashboard:
- Email volume over time (line chart)
- Most active accounts (table)
- Common topics word cloud (if using enrichment)
- Response time analysis
- Client sentiment tracking
Cross-Dataset Dashboard:
- Account profile with recent trades and emails
- Portfolio performance correlated with trading activity
- Client engagement vs portfolio value
- Risk profile vs trading behavior
- Make sure Elasticsearch is running:
curl http://localhost:9200
- Check the ES_HOST and ES_PORT settings
- Verify firewall settings
- Check ES_USERNAME and ES_PASSWORD
- For Elastic Cloud, use the credentials from your deployment
- Verify API key if using ES_API_KEY
- The script will prompt to delete and recreate
- Or manually delete:
curl -X DELETE "localhost:9200/wealth-management-emails"
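The delete-and-recreate prompt could look roughly like the sketch below. This is a hedged illustration rather than the scripts' actual logic: the function name `recreate_index` and the prompt wording are hypothetical, while `indices.exists`, `indices.delete`, and `indices.create` are standard calls in the Python client.

```python
def recreate_index(es, index_name, mappings, confirm=input):
    """Create index_name, asking before replacing an existing index.

    `es` is an elasticsearch.Elasticsearch client and `confirm` is any
    callable that takes a prompt and returns the user's answer.
    Returns True when the index was (re)created, False when the user declined.
    """
    if es.indices.exists(index=index_name):
        answer = confirm(f"Index '{index_name}' exists. Delete and recreate? [y/N] ")
        if answer.strip().lower() != "y":
            return False
        es.indices.delete(index=index_name)
    es.indices.create(index=index_name, mappings=mappings)
    return True
```

Passing the prompt in as `confirm` keeps the function usable in non-interactive runs, where a caller can supply `lambda _: "y"` instead of `input`.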
- Total Documents: ~279,000+ across 3 indices
- Total Accounts: ~7,000 customer accounts
- Date Range: Last 365 days of activity
- Total AUM: $26.7+ Billion
- Total Trade Volume: $70.5+ Billion
- Documents: ~7,000 customer accounts
- Average Portfolio Value: $5.0 Million
- Risk Profiles: Moderate (33%), Moderately Conservative (30%), Conservative (20%)
- Top Account Types: Roth IRA, Trust Account, Managed Account, Joint Brokerage
- Employment Status: Mix of Employed, Retired, Self-employed
- Age Range: 25-75 years
- Geographic Coverage: All 50 US states
- Documents: ~269,000 trade transactions
- Buy Transactions: ~205,000 (76%)
- Sell Transactions: ~64,000 (24%)
- Average Trade Size: $262,000
- Most Traded Symbols: QQQ, MSFT, NVDA, TSLA, AAPL, SPY
- Execution Rate: 93.9% (252,000+ executed)
- Trade Types: Market, Limit, Stop Loss, Market on Close
- Exchanges: NYSE, NASDAQ, BATS, IEX
- Date Range: Last 12 months
- Documents: 5,000 email threads
- Unique Accounts: ~3,400 accounts (some with multiple threads)
- Average Threads per Account: 1.46
- Topic Distribution (if enriched):
  - Market News (~25%)
  - Investment Strategy (~25%)
  - Specific Investments (~25%)
  - Portfolio Review (~25%)
- Date Range: Last 365 days
- All data linked via account_id
- ~40 trades per account on average
- ~1.5 email threads per account on average (among the ~3,400 accounts with at least one thread)
- High-value accounts tend to have more trades and communication
- Enables comprehensive customer 360° view
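The per-account averages above can be checked against the dataset totals quoted in this README. The short calculation below uses only those rounded figures, so its results differ slightly from the exact statistics (for instance, about 1.47 threads per emailing account versus the reported 1.46).

```python
# Rounded totals as stated in this README.
ACCOUNTS = 7_000
TRADES = 269_000
BUY_TRADES = 205_000
EMAIL_THREADS = 5_000
ACCOUNTS_WITH_EMAILS = 3_400

# Average number of trades per account.
trades_per_account = TRADES / ACCOUNTS

# Average threads among accounts that have at least one thread.
threads_per_emailing_account = EMAIL_THREADS / ACCOUNTS_WITH_EMAILS

# Share of trades that are buys.
buy_share = BUY_TRADES / TRADES

print(f"Trades per account: {trades_per_account:.1f}")
print(f"Threads per emailing account: {threads_per_emailing_account:.2f}")
print(f"Buy share: {buy_share:.1%}")
```

The roughly 38 trades per account this produces is what the list above rounds to ~40.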
This platform demonstrates several powerful Elasticsearch capabilities:
- Combine account details, trading history, and communication in one view
- Track customer lifecycle from account opening to ongoing activity
- Identify high-value relationships and engagement patterns
- Natural language search across email communications
- Find similar client questions and advisor responses
- Understand customer intent without keyword matching
- Example: "worried about market crash" finds related concerns
- Real-time AUM calculations and aggregations
- Portfolio performance tracking and risk analysis
- Identify trends in asset allocation and investment strategies
- Advisor performance metrics and client distribution
- Trade volume analysis by symbol, action, and time period
- Pattern detection in trading behavior
- Commission and transaction cost analysis
- Identify most active traders and popular securities
- Track communication frequency and response times
- Identify common client concerns and questions
- Correlate communication with trading activity
- Measure advisor engagement and workload
- KYC status tracking across all accounts
- Audit trail for trades and communications
- Risk profile adherence monitoring
- Regulatory reporting capabilities
- Predict churn based on communication and trading patterns
- Anomaly detection in trading behavior
- Client segmentation for personalized service
- Portfolio optimization recommendations
After uploading your data:
- Explore in Kibana Discover - Get familiar with each dataset
- Create Data Views - Set up data views for each index
- Build Dashboards - Create visualizations for key metrics
- Try Semantic Search - Test ELSER capabilities on email threads
- Cross-Dataset Queries - Join data across accounts, trades, and emails
- Set Up Alerts - Monitor for high-value trades, portfolio changes, etc.
- Machine Learning - Use Elasticsearch ML for anomaly detection
- QUICKSTART.md - Quick start guide for Elastic Cloud
- ELASTIC_CLOUD_SETUP.md - Detailed Cloud setup instructions
- DATA_OVERVIEW.md - Comprehensive data schema reference
- CONTRIBUTING.md - Contribution guidelines
- CODE_OF_CONDUCT.md - Community code of conduct
- SECURITY.md - Security policy and best practices
- Elasticsearch Documentation: https://www.elastic.co/guide/
- ELSER Documentation: https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html
We welcome contributions! Please see our Contributing Guidelines for details.
- 🐛 Report bugs
- 💡 Suggest features
- 📝 Improve documentation
- 🔧 Submit pull requests
- ⭐ Star the repository
This project is licensed under the MIT License - see the LICENSE file for details.
Note: All data generated by this project is synthetic and for testing and development purposes only.