🔍 Solana Forensic Analysis & Anomaly Detection: The Upbit Incident Case Study

Solana On-Chain Forensics and GNN-Based Anomaly Detection (FDS): Analysis of the 2025 Upbit Hack

Disclaimer: This project is for educational and research purposes only. The analysis is based on publicly available data and specific assumptions regarding the Upbit Security Notice. It does not represent absolute factual verification of specific entities or internal data.

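Legal Notice: This project was written for educational and research purposes. All analysis is based on public data, such as the Upbit notice, and on logical assumptions. It does not constitute absolute factual confirmation about any specific party.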

📂 Project Structure

├── 01_fetch_prices.py       # Historical price snapshots (CoinGecko API)
├── 02_identify_victims.py   # Heuristic victim identification (IoC Reverse Tracking)
├── 03_generate_report.py    # Forensic visualization & Label generation
├── 04_collect_data.py       # High-volume on-chain data collection (Helius RPC)
├── 05_preprocess_features.py # Graph feature extraction (Temporal & Rolling stats)
├── 06_train_autoencoder.py  # Baseline Model (Tabular)
├── 07_train_gnn.py          # Proposed Model (Graph Neural Network)
├── config.py                # Hyperparameters & Time-split configuration
├── hacker_loader.py         # Upbit IoC parser
├── plots/                   # Evaluation artifacts (t-SNE, ROC, etc.)
└── forensic_reports/        # Attack timeline analysis

🇺🇸 English Description

1. Project Overview

This project implements an End-to-End Forensic Analysis & Anomaly Detection Pipeline targeting the Solana-based security incident that occurred on November 27, 2025.

By reconstructing the transaction graph from the initial hacking phase, this project compares a baseline AutoEncoder with a proposed Graph Neural Network (GNN) model. The goal is to demonstrate how structural learning (Topology) significantly improves detection accuracy and reduces false positives in crypto-money laundering scenarios compared to traditional tabular methods.

2. Forensic Analysis & EDA

Before modeling, a forensic analysis was conducted to understand the attack vector.

2.1 Attack Timeline

Observation: The attack was highly concentrated within a window of roughly 15 minutes (19:42 UTC - 19:55 UTC).
Insight: High-frequency automated transfers suggest the need for Temporal Features (e.g., Time-delta between txs) rather than just volume-based features.

2.2 Victim & Asset Distribution

Top Victims: The largest single loss was identified as approximately $4 million (Dma9...).
Assets: The hacker drained not only SOL (43%) but also various SPL tokens, requiring a model that can handle categorical (Token Symbol) and numerical (USD Value) data simultaneously.
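
A minimal sketch of how a categorical token symbol and a numerical USD value might be combined into one feature vector. The vocabulary, the trailing "other" slot, and the function name `encode_transfer` are illustrative assumptions, not taken from the project's code.

```python
def encode_transfer(token_symbol, usd_value, vocabulary):
    """Return [usd_value, one-hot slots...] for a single transfer.

    The last one-hot slot is a hypothetical catch-all for symbols
    that are not in the vocabulary.
    """
    one_hot = [0.0] * (len(vocabulary) + 1)
    if token_symbol in vocabulary:
        index = vocabulary.index(token_symbol)
    else:
        index = len(vocabulary)  # catch-all slot
    one_hot[index] = 1.0
    return [float(usd_value)] + one_hot


# Hypothetical vocabulary for illustration only.
VOCAB = ["SOL", "USDC", "RAY"]
print(encode_transfer("USDC", 1250.0, VOCAB))
```

In practice the USD value would likely be scaled before training, since raw amounts span several orders of magnitude.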

3. Technical Methodology

3.1 Data Pipeline & Assumptions

Data Source: Helius RPC API (Historical Transactions).
Victim Identification Logic:
1. Extracted 165 "Abnormal Withdrawal Addresses" from the Official Notice.
2. Traced the first incoming transaction for each hacker address.
3. Assumption: Senders of these first transactions are defined as Compromised Wallets (Victims).
Scope: ±7 days around the incident (Nov 19 - Nov 26) to learn "Normal" vs "Attack" patterns.
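
The victim identification logic above can be sketched as follows. This is a simplified illustration over in-memory records: the field names (`from`, `to`, `timestamp`) and the addresses are hypothetical, and the real pipeline would read transactions from the RPC provider instead.

```python
def infer_victims(transfers, hacker_addresses):
    """Map each hacker address to the sender of its earliest incoming transfer."""
    first_incoming = {}
    for tx in transfers:
        destination = tx["to"]
        if destination not in hacker_addresses:
            continue
        earliest = first_incoming.get(destination)
        if earliest is None or tx["timestamp"] < earliest["timestamp"]:
            first_incoming[destination] = tx
    # Assumption from the text: the first sender is the compromised wallet.
    return {dst: tx["from"] for dst, tx in first_incoming.items()}


# Hypothetical records for illustration only.
sample = [
    {"from": "walletA", "to": "hacker1", "timestamp": 100},
    {"from": "walletB", "to": "hacker1", "timestamp": 90},
    {"from": "walletC", "to": "unrelated", "timestamp": 50},
    {"from": "walletD", "to": "hacker2", "timestamp": 120},
]
print(infer_victims(sample, {"hacker1", "hacker2"}))
```

Because the result rests on a heuristic, the inferred victims are labels for analysis, not confirmed attributions.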

3.2 Feature Engineering

To capture the context of transactions, I engineered dynamic graph features instead of using raw transaction data.

| Feature Type | Feature Name | Description |
| --- | --- | --- |
| Topology | `degree_in`, `degree_out` | Number of connections (Fan-in/Fan-out patterns). |
| Temporal | `time_delta` | Time elapsed since the last transaction (Detects macro/bot behavior). |
| Contextual | `rolling_amt_mean_5` | Moving average of transfer amounts (Last 5 txs). |
| Contextual | `rolling_td_std_5` | Standard deviation of time intervals (Detects regular intervals). |
| Categorical | `token_symbol` | One-hot encoded token types (SOL, USDC, RAY, etc.). |
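
A rough sketch of how the topology, temporal, and rolling features could be derived with the standard library. Grouping by sender, computing degrees over the whole dataset, and using the population standard deviation are assumptions made for illustration; the project's `05_preprocess_features.py` may differ.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


def build_features(transfers, window=5):
    """Compute per-transfer features, grouped by sender and ordered by time."""
    out_neighbors = defaultdict(set)
    in_neighbors = defaultdict(set)
    for tx in transfers:
        out_neighbors[tx["from"]].add(tx["to"])
        in_neighbors[tx["to"]].add(tx["from"])

    last_seen = {}
    recent_amounts = defaultdict(lambda: deque(maxlen=window))
    recent_deltas = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for tx in sorted(transfers, key=lambda t: t["timestamp"]):
        sender = tx["from"]
        # The first transfer of a sender gets a delta of 0.
        delta = tx["timestamp"] - last_seen.get(sender, tx["timestamp"])
        last_seen[sender] = tx["timestamp"]
        recent_amounts[sender].append(tx["usd_value"])
        recent_deltas[sender].append(delta)
        rows.append({
            "degree_in": len(in_neighbors[sender]),
            "degree_out": len(out_neighbors[sender]),
            "time_delta": delta,
            "rolling_amt_mean_5": mean(recent_amounts[sender]),
            "rolling_td_std_5": pstdev(recent_deltas[sender]),
        })
    return rows


# Hypothetical transfers for illustration only.
demo = [
    {"from": "A", "to": "B", "timestamp": 0, "usd_value": 100},
    {"from": "A", "to": "C", "timestamp": 10, "usd_value": 200},
    {"from": "A", "to": "B", "timestamp": 20, "usd_value": 300},
]
for row in build_features(demo):
    print(row)
```

A low `rolling_td_std_5` combined with a small `time_delta` is the kind of signal the table associates with scripted, regular-interval transfers.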

Noise Filtering: Applied Solana-specific domain knowledge to filter out Rent Exemption transfers (~0.002 SOL) and Dust Attacks (<$50) to prevent data poisoning.
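
The noise filter might look like the sketch below. The 0.002 SOL and $50 figures come from the text; the tolerance around the rent amount and the record field names are hypothetical choices for illustration.

```python
RENT_SOL = 0.002         # approximate rent-exemption transfer size (from the text)
RENT_TOLERANCE = 0.0005  # hypothetical tolerance around that amount
DUST_USD = 50.0          # dust threshold in USD (from the text)


def is_noise(tx):
    """Return True for rent-exemption-sized SOL transfers and dust transfers."""
    is_rent = (
        tx["token_symbol"] == "SOL"
        and abs(tx["amount"] - RENT_SOL) <= RENT_TOLERANCE
    )
    is_dust = tx["usd_value"] < DUST_USD
    return is_rent or is_dust


def filter_noise(transfers):
    """Keep only transfers that are not classified as noise."""
    return [tx for tx in transfers if not is_noise(tx)]


# Hypothetical transfers for illustration only.
noisy = [
    {"token_symbol": "SOL", "amount": 0.00203928, "usd_value": 0.3},
    {"token_symbol": "USDC", "amount": 12.0, "usd_value": 12.0},
    {"token_symbol": "SOL", "amount": 500.0, "usd_value": 70000.0},
]
print(filter_noise(noisy))
```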

3.3 Modeling Strategy

Baseline: AutoEncoder (AE)
- Treats each transaction as an independent instance.
- Detects point anomalies based on reconstruction error (MSE).
Proposed: GNN (Graph AutoEncoder)
- Uses GCNConv layers to aggregate information from neighbor nodes.
- Reconstructs Edge Attributes (features of the transaction itself) utilizing the graph structure.
- Hypothesis: Hackers exhibit a distinct Hub-and-Spoke topology that AE cannot capture.
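
To illustrate what neighbor aggregation adds, here is a weight-free sketch of a single GCN propagation step, D^-1/2 (A + I) D^-1/2 X, written in plain Python. It omits the learnable weights and nonlinearity of a real GCNConv layer, treats the graph as undirected, and uses a hypothetical four-node hub-and-spoke graph.

```python
import math


def gcn_propagate(edges, features):
    """One symmetric-normalized aggregation step with self-loops."""
    n = len(features)
    neighbors = [{i} for i in range(n)]  # start with self-loops
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    degree = [len(nb) for nb in neighbors]
    dim = len(features[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in neighbors[i]:
            weight = 1.0 / math.sqrt(degree[i] * degree[j])
            for k in range(dim):
                row[k] += weight * features[j][k]
        out.append(row)
    return out


# Hypothetical hub-and-spoke graph: node 0 is the hub, nodes 1-3 are spokes.
hub_edges = [(0, 1), (0, 2), (0, 3)]
hub_features = [[0.0], [1.0], [1.0], [1.0]]
print(gcn_propagate(hub_edges, hub_features))
```

After one step the hub's representation is dominated by its spokes, which is information a per-transaction AutoEncoder never sees.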

4. Experimental Results

4.1 Performance Comparison

The GNN model demonstrated superior performance in identifying the hacker's structural patterns.

| Metric | AutoEncoder (Baseline) | GNN (Proposed) | Improvement |
| --- | --- | --- | --- |
| ROC AUC | 0.8612 | 0.9512 | +0.0900 |
| Recall | 53.88% | 78.64% | +24.76%p |
| Precision | 62.71% | 83.94% | +21.23%p |
| False Alarms | 66 cases | 31 cases | 53% reduction |
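
As an arithmetic cross-check, the percentages in the table are consistent with an evaluation set of 206 hacker transactions. The true-positive and false-negative counts below are back-solved from the reported precision, recall, and false-alarm figures. They are an inference, not numbers stated by the project.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Inferred counts (assumption): 206 positives in total for both models.
ae_precision, ae_recall = precision_recall(111, 66, 95)
gnn_precision, gnn_recall = precision_recall(162, 31, 44)
false_alarm_reduction = (66 - 31) / 66

print(f"AE  precision={ae_precision:.2%} recall={ae_recall:.2%}")
print(f"GNN precision={gnn_precision:.2%} recall={gnn_recall:.2%}")
print(f"False alarm reduction={false_alarm_reduction:.0%}")
```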

4.2 Reconstruction Error Distribution (Boxplot)

This plot illustrates the separability between Normal (Blue box) and Hacker (Rightmost box) transactions.

Left (AE): The error distribution of Hacker transactions overlaps significantly with Normal transactions (especially the whiskers). This overlap is the primary cause of low Recall.
Right (GNN): The Hacker distribution is distinctly shifted upwards, with minimal overlap against the Normal distribution. This clear separation indicates that the GNN learned the distinct structural signature of the attack.
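
The separation in the boxplot matters because detection reduces to thresholding the reconstruction error. The source does not state how the threshold was chosen, so the sketch below assumes a percentile of the errors observed on normal transactions. The function names and the 95th-percentile default are hypothetical.

```python
def mse(original, reconstructed):
    """Mean squared error between a feature vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def percentile_threshold(normal_errors, percent=95):
    """Pick the error value at the given percentile of normal-class errors."""
    ordered = sorted(normal_errors)
    index = min(len(ordered) - 1, percent * len(ordered) // 100)
    return ordered[index]


def flag_anomalies(errors, threshold):
    """Flag every error strictly above the threshold as anomalous."""
    return [error > threshold for error in errors]


# Hypothetical error values for illustration only.
normal_errors = list(range(1, 101))
threshold = percentile_threshold(normal_errors)
print(threshold, flag_anomalies([50, 200], threshold))
```

The more the two boxes overlap, the more any single threshold must trade missed hacker transactions against false alarms.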

5. How to Run (Docker Compose)

Prerequisites

Docker & Docker Compose installed.
Helius API Key:
- Required for fetching on-chain data.
- Get a free API key from Helius. The free tier (1M credits/month) is sufficient.
- ⚠️ Cost Warning: A single full execution of this pipeline consumes approximately 85,000 Credits.

Execution Steps

Clone & Setup:

git clone https://github.com/brainVRG/upbit-solana-hack-analysis
cd upbit-solana-hack-analysis

Environment Variable: Create a .env file in the root directory.
```
HELIUS_API_KEY=your_api_key_here
```

Run Pipeline:

# 1. Build and start the container in detached mode
docker-compose up -d --build

# 2. Execute the main pipeline script inside the container
docker-compose exec analysis-lab python main_pipeline.py

🇰🇷 Korean Description

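1. Project Overview

This project is a research effort that builds a blockchain forensics and anomaly detection (FDS) pipeline, using the Solana-based security incident at Upbit on November 27, 2025 as a case study.

It reconstructs the transaction graph from the moment the attack began and compares an existing tabular model (AutoEncoder) against a Graph Neural Network (GNN) model. The comparison shows how decisively learning the structure (Topology) and context of transactions affects detection performance in money-laundering and large-scale theft scenarios.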

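2. Forensic Analysis

2.1 Attack Timeline Analysis

Analysis: The attack was concentrated in a short window of about 15 minutes, from 19:42 UTC to 19:55 UTC.
Insight: The high-frequency transfer pattern would be difficult for a human to perform, which suggests that a time-interval (Time-delta) feature matters for modeling, not just the transfer amount.

2.2 Loss Size and Asset Distribution

The largest loss from a single wallet was identified as approximately $4 million (Dma9...).
Because various SPL tokens such as LAYER and TRUMP were stolen in addition to SOL, an encoding step (One-hot Encoding) that can handle many token types was essential.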

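3. Technical Approach and Methodology

3.1 Data Pipeline and Assumptions

Data Collection: Transactions for the 7 days before and after the incident, along with the token prices at the time (Historical Price), were collected using the Helius API.
Victim Identification Logic (Victim Inference):
1. Set the 165 'abnormal withdrawal addresses' published in the Upbit notice as hacker addresses.
2. Traced back the wallet that first sent funds to each of those addresses and defined it as the victim (Compromised Wallet).
3. Assumption: An entity that sent funds to a hacker wallet created just before the hack is a compromised wallet (Victim).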

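3.2 Feature Engineering

Going beyond simple transfer records, derived variables were created to capture anomaly signals.

| Category | Feature Name | Description and Intent |
| --- | --- | --- |
| Topology | `degree_in/out` | Detects hub structures where funds concentrate in a specific wallet |
| Temporal | `time_delta` | Time difference from the previous transaction (macro/bot detection) |
| Statistical | `rolling_td_std_5` | Standard deviation of time intervals (detects mechanical periodicity) |
| Contextual | `amount_ratio` | Ratio of the current transfer amount to the usual moving average (detects sudden fund outflows) |

Business Logic Filter: Rent costs (~0.002 SOL) that arise from the nature of the blockchain and Dust transactions under $50 were treated as noise and excluded from the training data.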

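3.3 Modeling Strategy

Baseline: AutoEncoder (AE)
- Looks only at the attributes of individual transactions (Amount, Time, etc.) and detects outliers through reconstruction error.
Proposed: GNN (Graph AutoEncoder)
- Uses GCN (Graph Convolutional Network) layers to aggregate the relationships between senders and receivers and the state information of neighbor nodes.
- Hypothesis: "The hacker will have a distinctive Hub-and-Spoke network structure that collects funds from new wallets within a short time."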

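4. Experimental Results

4.1 Performance Comparison Table

By learning the structural context of transactions, the GNN model showed a marked performance improvement across all metrics.

| Metric | AutoEncoder (Baseline) | GNN (Proposed) | Improvement |
| --- | --- | --- | --- |
| ROC AUC | 0.8612 | 0.9512 | +0.0900 |
| Recall | 53.88% | 78.64% | +24.76%p |
| Precision | 62.71% | 83.94% | +21.23%p |
| False Alarms | 66 cases | 31 cases | 53% reduction |

Insight: The AE model often mistook legitimate whale users making large transfers for hackers (False Positive), whereas the GNN analyzed transaction patterns and relationships and reduced such cases by 53%. This is an important result from the standpoint of FDS operational efficiency.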

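4.2 Error Distribution Comparison (Boxplot Analysis)

This plot shows how clearly each model separates Normal from Hacker transactions.

Left (AE): The hacker data (right box) has a wide distribution and overlaps substantially with the normal data (middle box). This means the model did not fully isolate the hacker pattern.
Right (GNN): The hacker box is clearly shifted upward and barely overlaps with the normal data. This indicates that the GNN learned the structural traces left by the attacker and formed a clear detection boundary.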

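5. How to Run (Docker Compose)

Prerequisites

Docker & Docker Compose installed.
Helius API Key:
- Required for collecting on-chain data.
- Get a free API key from Helius. The free tier (1M credits/month) is sufficient.
- ⚠️ Cost Warning: One full run of this pipeline consumes approximately 85,000 credits.

Execution Steps

Clone the repository: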

git clone https://github.com/brainVRG/upbit-solana-hack-analysis
cd upbit-solana-hack-analysis

Set the environment variable: Create a .env file in the root directory and enter your key.
```
HELIUS_API_KEY=your_api_key_here
```

Run the pipeline:

# 1. Build the image and run the container in the background
docker-compose up -d --build

# 2. Run the full analysis pipeline script inside the running container
docker-compose exec analysis-lab python main_pipeline.py

Acknowledgments & Powered By

This project leverages the following APIs and services:

On-Chain Data & RPC: Powered by Helius.
Market Data: Data provided by CoinGecko API.
