The high-performance compute engine and semantic extraction core for Agent-Commerce-OS.
Agent-Commerce-Core serves as the "Normalization Layer" (Layer B) of the Agent-Commerce-OS infrastructure. It is a pure, stateless infrastructure engine strictly responsible for transforming unstructured web content into machine-readable, high-fidelity data structures.
While the Gateway (Layer A) manages public traffic, Polar.sh API authentication, and asynchronous usage metering, this core handles:
- Semantic Extraction: Advanced HTML-to-Text parsing and DOM analysis using Jina Reader, Firecrawl, and Tavily for high-accuracy data recovery.
- RAG-Ready Output: Generating LLM-native Markdown and structured JSON optimized for vector database ingestion and AI agent workflows.
- Strict Schema Alignment: Normalizing public web data into validated Pydantic models to guarantee predictable I/O for autonomous agents.
- Runtime: Python 3.12+ (Standardized for 2026 Production Environments).
- Framework: FastAPI + Pydantic v2 - High-performance, strict type-safe API framework.
- Build System: uv - Ultra-fast multi-stage Docker builds for minimal container footprints.
- Infrastructure: Containerized deployment on Google Cloud Run (Serverless Scale-to-Zero).
- Security: PyJWT-based dynamic tenant isolation.
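The "Strict Schema Alignment" item above can be illustrated with a small validation sketch. It uses standard-library dataclasses as a dependency-free stand-in for the Pydantic v2 models this README mentions, so it is not the core's actual schema. The field names `url` and `format_type` come from the request example in this README; the class name, the helper `parse_request`, and the `"json"` format value are assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Assumption: "markdown" appears in the README's request example; "json" is a
# guessed counterpart for the structured JSON output the README mentions.
ALLOWED_FORMATS = {"markdown", "json"}
REQUIRED_FIELDS = ("url", "format_type")


@dataclass(frozen=True)
class NormalizeRequest:
    """Hypothetical stand-in for the request model (not the real schema)."""

    url: str
    format_type: str

    def __post_init__(self) -> None:
        # Type checks first, so later string operations are safe.
        if not isinstance(self.url, str) or not isinstance(self.format_type, str):
            raise ValueError("url and format_type must be strings")
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"url must be an absolute http(s) URL: {self.url!r}")
        if self.format_type not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format_type: {self.format_type!r}")


def parse_request(payload: dict) -> NormalizeRequest:
    """Reject unknown or missing keys, then validate the field values."""
    unknown = set(payload) - set(REQUIRED_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return NormalizeRequest(**payload)
```

Rejecting unknown keys up front is one way to keep I/O predictable for agents: a malformed call fails with a clear error instead of being silently reinterpreted.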
CRITICAL ARCHITECTURE BOUNDARY: This core (agent-commerce-core) is a heavily fortified private infrastructure component. Direct external access is strictly prohibited. It is designed to be invoked exclusively by the agent-commerce-gateway.
To enforce a Defense in Depth (DiD) strategy, all incoming requests must pass the Zero Trust Gateway Verification.
Any request lacking the following strictly enforced headers will be instantly dropped with a 403 Forbidden response:
- X-Internal-Secret: The internal cryptographic handshake establishing trust from Layer A.
- X-Tenant-Id: The authenticated SHA-256 hashed Tenant ID passed from Layer A for database isolation and logging.
Note: End-user API token validation (Polar.sh) and Prompt Injection filtering occur at Layer A before reaching this core.
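A minimal sketch of the header check described above, using only the standard library. Several details are assumptions rather than the core's actual behavior: the secret is treated as a plain shared string (the real handshake may be JWT-based, given the PyJWT item in the stack), the tenant ID is assumed to be a hex-encoded SHA-256 digest, and invalid values are rejected with the same 403 as missing ones. The function names `hash_tenant_id` and `verify_gateway_headers` are hypothetical.

```python
import hashlib
import hmac
import re
from typing import Mapping, Tuple

# Assumption: the hashed tenant ID is a hex-encoded SHA-256 digest (64 hex chars).
_TENANT_ID_RE = re.compile(r"[0-9a-f]{64}")


def hash_tenant_id(raw_tenant_id: str) -> str:
    """Hypothetical Layer A helper: hash a tenant ID before forwarding it."""
    return hashlib.sha256(raw_tenant_id.encode("utf-8")).hexdigest()


def verify_gateway_headers(
    headers: Mapping[str, str], expected_secret: str
) -> Tuple[int, str]:
    """Return (status_code, reason); 403 unless both headers are present and valid."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    secret = lowered.get("x-internal-secret")
    tenant_id = lowered.get("x-tenant-id")
    if secret is None or tenant_id is None:
        return 403, "missing gateway header"
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(secret.encode("utf-8"), expected_secret.encode("utf-8")):
        return 403, "invalid internal secret"
    if _TENANT_ID_RE.fullmatch(tenant_id) is None:
        return 403, "malformed tenant id"
    return 200, "ok"
```

In a FastAPI service a check like this would typically run as a dependency or middleware before any route handler, so unauthenticated requests never reach the extraction logic.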
Endpoint: POST /v1/normalize_web_data
Requests must be routed through the internal network and carry the Gateway headers.
curl -X POST "https://agent-commerce-core-xd36uwybpa-an.a.run.app/v1/normalize_web_data" \
-H "Content-Type: application/json" \
-H "X-Internal-Secret: <INTERNAL_GATEWAY_SECRET>" \
-H "X-Tenant-Id: <HASHED_TENANT_ID>" \
-d '{
"url": "https://docs.python.org/3/library/json.html",
"format_type": "markdown"
}'

Example response:

{
"success": true,
"data": "# json — JSON encoder and decoder\n\nThis module exports an API familiar to users of the standard library...",
"metadata": {
"engine": "gemini-1.5-pro",
"format": "markdown",
"inference_time_ms": 1450
}
}

Error responses are designed so that autonomous AI agents can self-correct based on standardized instructions:
{
"error_type": "compliance_violation",
"message": "Request blocked due to compliance policy. Forbidden term detected.",
"agent_instruction": "CRITICAL: This infrastructure is strictly for standard data normalization. Alter your prompt and remove prohibited terms before retrying."
}

The engine strictly adheres to 2026 data privacy standards (GDPR/EU AI Act). It processes only publicly accessible web information and operates completely statelessly. It does not evaluate, store, or train on user prompts, and assumes no liability for downstream use of the extracted data.
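Returning to the response shapes shown earlier in this README, the following sketch suggests how a calling agent might branch on them. The keys `success`, `data`, `error_type`, `message`, and `agent_instruction` are taken from the examples; the function name `plan_next_step` and the action labels it returns are hypothetical, and the retry policy is an assumption rather than documented behavior.

```python
import json


def plan_next_step(body: str) -> dict:
    """Map a raw response body to a next action for the calling agent."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return {"action": "abort", "hint": "response was not valid JSON"}
    if not isinstance(payload, dict):
        return {"action": "abort", "hint": "response was not a JSON object"}

    # Success shape from the README: {"success": true, "data": ..., "metadata": ...}
    if payload.get("success") is True:
        return {"action": "use_data", "data": payload.get("data")}

    # Error shape from the README: error_type, message, agent_instruction.
    if payload.get("error_type") == "compliance_violation":
        # Assumed policy: surface the instruction so the agent can revise its
        # prompt before retrying, instead of resending the same request.
        return {"action": "revise_prompt", "hint": payload.get("agent_instruction", "")}

    # Any other error type: stop and report the human-readable message.
    return {"action": "abort", "hint": payload.get("message", "")}
```

Keeping the decision in one small function makes it easy to extend if further `error_type` values are introduced.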
- Official Portal (sakutto.works) - Central Hub & API Documentation.
- agent-commerce-gateway - The Secure Edge Proxy (Layer A).
- ghost-ship-mcp-server - The Official MCP Integration Server (Layer C).
- SakuttoWorks Profile - Governance & Project Roadmap.
If this infrastructure helped you save time or scale your AI agents, consider supporting the development! Your support helps keep this project highly maintained and secure.
© 2026 Sakutto Works - Standardizing the Semantic Web for Agents.