Version: 1.0
Author: Joaquim
Date: January 2025
Status: Draft
Manual exploratory testing is time-consuming, inconsistent, and fails to scale. QA teams spend significant effort navigating applications, identifying edge cases, and documenting findings—work that could be augmented by intelligent automation.
The proposed solution is an autonomous AI agent that explores web applications intelligently, discovers potential issues, and generates actionable reports. The agent combines LLM reasoning with browser automation to make human-like testing decisions while preserving human oversight through a CLI interface.
| Metric | Target |
|---|---|
| Bug Discovery | Identify ≥5 distinct issues in target application |
| Coverage | Explore ≥4 major functional areas |
| Report Quality | Actionable findings with evidence (screenshots) |
| Human Control | Clear intervention points with meaningful summaries |
- Autonomous Web Exploration: Navigate pages, interact with UI elements, fill forms
- Intelligent Decision Making: LLM-driven analysis of page state and next actions
- Human-in-the-Loop CLI: Periodic checkpoints for user guidance
- Custom Tooling: Broken image detector with comprehensive edge case handling
- Structured Reporting: Findings report with severity, evidence, and coverage summary
- Stretch Goals: Cloud deployment, persistence, test generation, MCP server
- Visual regression testing (pixel comparison)
- Performance/load testing
- Security penetration testing (beyond basic input validation)
- Mobile-specific testing
- Multi-browser support (Chromium only for MVP)
URL: https://with-bugs.practicesoftwaretesting.com
Functional Areas to Explore:
- Product browsing and search
- Product details and filtering
- Shopping cart operations
- Checkout flow
- User authentication (login/register)
- User profile management
- Contact/support forms
- Description: Agent navigates between pages using links, buttons, and URL manipulation
- Acceptance Criteria:
- Successfully follow internal links
- Handle navigation failures gracefully
- Track visited pages to avoid infinite loops
- Support back/forward navigation when beneficial
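The visit-tracking criterion above can be sketched as a small helper that normalizes URLs before counting visits, so trivially different links (hash fragments, trailing slashes) map to one page. `VisitTracker` and its method names are illustrative, not part of this PRD:

```typescript
// Illustrative sketch: track visited pages by normalized URL to avoid loops.
class VisitTracker {
  private visited = new Map<string, number>();

  // Normalize a URL so trivially different links map to one page:
  // drop hash fragments and trailing slashes, sort query parameters.
  normalize(raw: string): string {
    const url = new URL(raw);
    url.hash = "";
    url.searchParams.sort();
    const path = url.pathname.replace(/\/+$/, "") || "/";
    return `${url.origin}${path}${url.search}`;
  }

  // Record a visit and return how many times this page has been seen.
  recordVisit(raw: string): number {
    const key = this.normalize(raw);
    const count = (this.visited.get(key) ?? 0) + 1;
    this.visited.set(key, count);
    return count;
  }

  // A page is worth revisiting only below a configurable cap.
  shouldVisit(raw: string, maxVisits = 2): boolean {
    return (this.visited.get(this.normalize(raw)) ?? 0) < maxVisits;
  }

  get pagesSeen(): number {
    return this.visited.size;
  }
}
```

A revisit cap (rather than a strict visited set) lets the agent return to a hub page a bounded number of times without looping forever.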
- Description: Agent interacts with all common UI elements
- Acceptance Criteria:
- Click buttons, links, and interactive elements
- Fill text inputs, textareas, and rich text editors
- Select options from dropdowns and radio buttons
- Handle checkboxes and toggles
- Submit forms
- Handle modal dialogs and popups
- Description: Agent captures visual evidence of findings
- Acceptance Criteria:
- Full page screenshots on significant events
- Element-specific screenshots for issue evidence
- Organized storage with meaningful naming
- Linked to findings in final report
- Description: Agent extracts and analyzes page content for LLM processing
- Acceptance Criteria:
- Extract visible text content
- Identify interactive elements with their attributes
- Capture form field states and validation messages
- Detect error messages and alerts
- Extract structured data (tables, lists, product info)
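One way to feed the extracted content to the LLM is to compress it into a short text block. The `PageSnapshot` shape below is an assumption about what the extraction step produces, not a spec requirement:

```typescript
// Hypothetical sketch of the "extract for LLM" step: compress page data
// into a compact text summary. PageSnapshot is an assumed shape.
interface PageSnapshot {
  url: string;
  title: string;
  interactive: { tag: string; text: string; selector: string }[];
  errors: string[];
}

function summarizeForLLM(page: PageSnapshot, maxElements = 20): string {
  const lines = [
    `URL: ${page.url}`,
    `Title: ${page.title}`,
    `Interactive elements (${page.interactive.length} total):`,
    ...page.interactive
      .slice(0, maxElements)
      .map((el, i) => `  ${i + 1}. <${el.tag}> "${el.text}" -> ${el.selector}`),
  ];
  if (page.errors.length > 0) {
    lines.push(`Errors on page: ${page.errors.join("; ")}`);
  }
  return lines.join("\n");
}
```

Capping the element list keeps token usage bounded, which also supports the cost-mitigation goals later in this document.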
- Description: LLM analyzes current page to understand context
- Acceptance Criteria:
- Identify page type/purpose
- Recognize available actions
- Detect potential issues or anomalies
- Understand application state (logged in, cart contents, etc.)
- Description: LLM decides next action based on analysis
- Acceptance Criteria:
- Generate hypothesis about what to test
- Prioritize high-value interactions
- Provide clear reasoning for decisions
- Balance exploration vs exploitation
- Consider testing edge cases (empty inputs, special characters, boundaries)
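The exploration-vs-exploitation balance above could be approximated with a simple scoring function. The weights and field names are illustrative defaults, not values mandated by this PRD:

```typescript
// Hedged sketch: score candidate actions so novel areas and suspected
// bugs are preferred. Weights are illustrative, not requirements.
interface CandidateAction {
  description: string;
  timesTried: number;      // how often this exact action ran before
  areaVisits: number;      // visits to the functional area it belongs to
  suspectedIssue: boolean; // LLM flagged something worth confirming
}

function scoreAction(a: CandidateAction): number {
  const novelty = 1 / (1 + a.timesTried);     // decay repeated actions
  const areaNovelty = 1 / (1 + a.areaVisits); // prefer unexplored areas
  const exploit = a.suspectedIssue ? 1 : 0;   // chase likely bugs
  return 0.4 * novelty + 0.3 * areaNovelty + 0.3 * exploit;
}

function pickBest(actions: CandidateAction[]): CandidateAction {
  return actions.reduce((best, a) => (scoreAction(a) > scoreAction(best) ? a : best));
}
```

In practice the LLM would propose the candidates and the score would act as a tie-breaker or sanity check on its choice.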
- Description: Agent identifies potential bugs and UX issues
- Acceptance Criteria:
- Detect JavaScript errors in console
- Identify HTTP errors (4xx, 5xx)
- Recognize broken functionality
- Note UX issues (confusing flows, missing feedback)
- Identify accessibility concerns
- Detect data inconsistencies
- Description: Agent presents exploration progress to user
- Acceptance Criteria:
- Pages visited count and list
- Actions performed summary
- Issues found so far
- Current location in application
- Time elapsed
- Description: Agent pauses for human input at defined points
- Acceptance Criteria:
- After N actions (configurable, default: 10)
- When entering new major section
- When confidence in next action is low
- After finding significant issue
- When stuck or detecting potential infinite loop
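The checkpoint conditions above can be collapsed into one predicate. The default interval of 10 actions mirrors the PRD; the state shape and the 0.3 confidence threshold are illustrative assumptions:

```typescript
// Sketch of the checkpoint triggers as a single predicate.
interface AgentState {
  actionsSinceCheckpoint: number;
  enteredNewSection: boolean;
  decisionConfidence: number; // 0..1, from the LLM's self-report
  foundSignificantIssue: boolean;
  repeatedPageVisits: number; // consecutive visits to the same page
}

function shouldCheckpoint(s: AgentState, actionInterval = 10): string | null {
  if (s.actionsSinceCheckpoint >= actionInterval) return "action-limit";
  if (s.enteredNewSection) return "new-section";
  if (s.decisionConfidence < 0.3) return "low-confidence";
  if (s.foundSignificantIssue) return "significant-issue";
  if (s.repeatedPageVisits >= 3) return "possible-loop";
  return null; // keep exploring autonomously
}
```

Returning the trigger reason (rather than a bare boolean) lets the CLI explain to the user *why* the agent paused.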
- Description: User can guide agent behavior
- Acceptance Criteria:
- Continue: Proceed with agent's plan
- Stop: End exploration and generate report
- Guide: Provide specific direction (e.g., "focus on checkout")
- Skip: Avoid certain areas
- Prioritize: Focus on specific functionality
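The five commands above map naturally to a small parser at each checkpoint prompt. The concrete syntax (e.g. `guide focus on checkout`) is an assumption, not specified by this PRD:

```typescript
// Illustrative parser for the five checkpoint commands.
type Command =
  | { kind: "continue" }
  | { kind: "stop" }
  | { kind: "guide"; direction: string }
  | { kind: "skip"; area: string }
  | { kind: "prioritize"; area: string };

function parseCommand(input: string): Command | null {
  const trimmed = input.trim();
  const [head, ...rest] = trimmed.split(/\s+/);
  const arg = rest.join(" ");
  switch (head.toLowerCase()) {
    case "continue":   return { kind: "continue" };
    case "stop":       return { kind: "stop" };
    case "guide":      return arg ? { kind: "guide", direction: arg } : null;
    case "skip":       return arg ? { kind: "skip", area: arg } : null;
    case "prioritize": return arg ? { kind: "prioritize", area: arg } : null;
    default:           return null; // unrecognized input -> re-prompt the user
  }
}
```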
- Description: Scan page for all image elements
- Acceptance Criteria:
- Find all `<img>` elements
- Find CSS background images
- Find images in `<picture>` elements
- Find SVG images (inline and external)
- Find favicon and other meta images
- Description: Determine if images are broken
- Acceptance Criteria:
- Detect HTTP errors (404, 403, 500, etc.)
- Detect network failures
- Detect empty/invalid src attributes
- Detect zero-dimension images (naturalWidth/naturalHeight = 0)
- Detect images that timeout
- Handle lazy-loaded images appropriately
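The broken-image checks above can be expressed as one classification function. The `ImageProbe` input shape is an assumption about what a Playwright page evaluation would return for each image:

```typescript
// Hedged sketch: map what the page reports about an image to one of the
// failure categories listed above. ImageProbe is an assumed shape.
interface ImageProbe {
  src: string | null;
  httpStatus: number | null; // null when the request never resolved
  naturalWidth: number;
  naturalHeight: number;
  timedOut: boolean;
  lazyPending: boolean;      // loading="lazy" and not yet in viewport
}

type ImageFailure =
  | "missing-src" | "http-error" | "network-failure"
  | "zero-dimensions" | "timeout" | null;

function classifyImage(p: ImageProbe): ImageFailure {
  if (p.lazyPending) return null;                 // not broken, just not loaded yet
  if (!p.src || p.src.trim() === "") return "missing-src";
  if (p.timedOut) return "timeout";
  if (p.httpStatus !== null && p.httpStatus >= 400) return "http-error";
  if (p.httpStatus === null) return "network-failure";
  if (p.naturalWidth === 0 || p.naturalHeight === 0) return "zero-dimensions";
  return null; // image loaded fine
}
```

Checking `lazyPending` first keeps lazy-loaded images from being flagged as false positives, which matters for the <20% false-positive target.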
- Description: Return detailed broken image information
- Acceptance Criteria:
- Image source URL
- Alt text (if present)
- Location on page (selector/xpath)
- Failure reason (categorized)
- Parent element context
- Severity assessment
- Description: Generate comprehensive bug report
- Acceptance Criteria:
- List all discovered issues
- Severity classification (Critical, High, Medium, Low)
- Steps to reproduce
- Expected vs actual behavior
- Screenshot evidence
- Timestamp and page URL
- Description: Document exploration coverage
- Acceptance Criteria:
- Pages visited with timestamps
- Functional areas covered
- Actions performed per area
- Unexplored areas identified
- Session duration and statistics
- Description: Output format and structure
- Acceptance Criteria:
- Markdown format for readability
- JSON format for programmatic access
- HTML format for visual presentation (optional)
- Screenshots embedded or linked
- Exportable and shareable
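Because the findings are plain data, the JSON format falls out of `JSON.stringify` and the Markdown format is a small renderer over the same records. The `Finding` shape below mirrors the criteria above but is itself an assumption:

```typescript
// Minimal sketch of the Markdown output path.
interface Finding {
  title: string;
  severity: "Critical" | "High" | "Medium" | "Low";
  url: string;
  steps: string[];
  expected: string;
  actual: string;
  screenshot?: string; // relative path to the saved image
}

function renderMarkdown(findings: Finding[]): string {
  const order = { Critical: 0, High: 1, Medium: 2, Low: 3 };
  const sorted = [...findings].sort((a, b) => order[a.severity] - order[b.severity]);
  return sorted
    .map((f, i) =>
      [
        `## ${i + 1}. [${f.severity}] ${f.title}`,
        `URL: ${f.url}`,
        `Steps to reproduce:`,
        ...f.steps.map((s, j) => `${j + 1}. ${s}`),
        `Expected: ${f.expected}`,
        `Actual: ${f.actual}`,
        f.screenshot ? `![evidence](${f.screenshot})` : "",
      ].filter(Boolean).join("\n"),
    )
    .join("\n\n");
}
```

Sorting by severity puts Critical findings first, which supports the "prioritize and track" goal in the user stories.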
| Requirement | Target |
|---|---|
| Action execution time | < 5 seconds per action |
| LLM response time | < 10 seconds per decision |
| Total exploration session | 15-30 minutes typical |
| Memory usage | < 2GB RAM |
- Error Recovery: Agent recovers from transient failures (network, element not found)
- State Consistency: Agent maintains consistent state across actions
- Graceful Degradation: Agent completes partial report if terminated early
- Setup Time: < 10 minutes from clone to running
- Configuration: Sensible defaults with optional overrides
- Documentation: Clear README with examples
- Output Clarity: Reports understandable by non-technical stakeholders
- Code Quality: TypeScript strict mode, ESLint, Prettier
- Modularity: Clear separation of concerns
- Extensibility: Easy to add new tools, detectors, or LLM providers
- Testing: Unit tests for critical components
Priority: High
Description: Deploy agent to cloud for remote execution
Requirements:
- Containerized deployment (Docker)
- Trigger via HTTP endpoint or CLI
- Support for scheduled runs
- Results stored and retrievable
- Cost-effective infrastructure
Suggested Platforms: Railway, Render, AWS Lambda + Fargate
Priority: Medium
Description: Save and resume exploration state
Requirements:
- Serialize exploration state (visited pages, findings, context)
- Resume from checkpoint
- Merge findings from multiple sessions
- Handle application state changes between sessions
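Since the exploration state is plain data, a versioned JSON round-trip is enough for checkpointing. The `ExplorationState` shape and the de-duplication key are assumptions:

```typescript
// Sketch of checkpoint persistence and cross-session merge.
interface ExplorationState {
  visitedUrls: string[];
  findings: { title: string; url: string }[];
  pendingAreas: string[];
  actionCount: number;
}

function saveCheckpoint(state: ExplorationState): string {
  return JSON.stringify({ version: 1, savedAt: new Date().toISOString(), state });
}

function resumeCheckpoint(serialized: string): ExplorationState {
  const parsed = JSON.parse(serialized);
  if (parsed.version !== 1) {
    throw new Error(`Unsupported checkpoint version: ${parsed.version}`);
  }
  return parsed.state as ExplorationState;
}

// Merge findings from two sessions, de-duplicating by title + URL.
function mergeFindings(a: ExplorationState, b: ExplorationState): ExplorationState["findings"] {
  const seen = new Set(a.findings.map((f) => `${f.title}@${f.url}`));
  return [...a.findings, ...b.findings.filter((f) => !seen.has(`${f.title}@${f.url}`))];
}
```

The version field leaves room for the schema to evolve between sessions without silently misreading old checkpoints.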
Priority: Medium
Description: Generate Playwright test scripts from findings
Requirements:
- Convert issue reproduction steps to Playwright code
- Generate regression tests for discovered bugs
- Include assertions based on expected behavior
- Organize tests by feature area
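Converting reproduction steps to Playwright code is largely string templating over a step vocabulary. The `ReproStep` shape and the emitted assertions are assumptions; generated tests would still need human review:

```typescript
// Illustrative code generator: recorded steps -> a Playwright test file.
type ReproStep =
  | { op: "goto"; url: string }
  | { op: "click"; selector: string }
  | { op: "fill"; selector: string; value: string }
  | { op: "expectVisible"; selector: string };

function generatePlaywrightTest(name: string, steps: ReproStep[]): string {
  const body = steps
    .map((s) => {
      switch (s.op) {
        case "goto":
          return `  await page.goto(${JSON.stringify(s.url)});`;
        case "click":
          return `  await page.click(${JSON.stringify(s.selector)});`;
        case "fill":
          return `  await page.fill(${JSON.stringify(s.selector)}, ${JSON.stringify(s.value)});`;
        case "expectVisible":
          return `  await expect(page.locator(${JSON.stringify(s.selector)})).toBeVisible();`;
      }
    })
    .join("\n");
  return [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `test(${JSON.stringify(name)}, async ({ page }) => {`,
    body,
    `});`,
  ].join("\n");
}
```

`JSON.stringify` doubles as a quoting/escaping mechanism, so selectors and values with quotes in them produce valid code.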
Priority: High (aligns with current expertise)
Description: Expose agent as MCP server
Requirements:
- Implement MCP protocol
- Expose tools: `start_exploration`, `get_status`, `get_findings`, `stop_exploration`
- Allow external LLMs to invoke agent
- Support streaming progress updates
| Component | Technology | Rationale |
|---|---|---|
| Language | TypeScript | Required by challenge |
| Browser Automation | Playwright | Required by challenge |
| Runtime | Node.js 20+ | LTS with modern features |
| Component | Options | Decision Criteria |
|---|---|---|
| Agent Framework | LangGraph vs Custom | See ADR-001 |
| LLM Provider | Anthropic vs OpenAI | See ADR-002 |
| State Management | In-memory vs Persistent | See ADR-003 |
US-001: As a QA engineer, I want to start the agent with a single command so that I can quickly begin exploration without complex setup.
US-002: As a QA engineer, I want to see what the agent is doing in real-time so that I can understand its reasoning and catch issues early.
US-003: As a QA engineer, I want to guide the agent toward specific areas so that I can focus testing on high-risk functionality.
US-004: As a QA engineer, I want a comprehensive report at the end so that I can prioritize and track discovered issues.
US-005: As a QA engineer, I want evidence (screenshots) for each finding so that I can reproduce and verify issues.
US-006: As a developer, I want clear reproduction steps so that I can debug and fix issues efficiently.
US-007: As a developer, I want generated test scripts so that I can prevent regressions.
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| LLM rate limiting | High | Medium | Implement retry logic, caching, batch decisions |
| Target site changes | Medium | Low | Configurable selectors, adaptive discovery |
| Infinite exploration loops | High | Medium | Visit tracking, action limits, diversity scoring |
| LLM hallucination in actions | High | Medium | Validate actions against actual page state |
| Cost overrun (LLM tokens) | Medium | Medium | Token budgets, summarization strategies |
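The rate-limit mitigation in the table above typically takes the form of retry with exponential backoff around each LLM call. The attempt count and delays are illustrative defaults, not requirements from this PRD:

```typescript
// Sketch of the rate-limit mitigation: retry with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: 500ms, 1000ms, 2000ms, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

A production version would also inspect the error (e.g. retry on HTTP 429 but not on 401) and honor any `Retry-After` hint from the provider.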
- Bug Discovery Rate: ≥5 genuine issues found
- False Positive Rate: <20% of reported issues
- Coverage Breadth: ≥4 distinct functional areas explored
- Session Efficiency: Complete meaningful exploration in <30 minutes
- Report Actionability: Issues have clear reproduction steps
- Code Quality: Passes code review standards
- Architecture Clarity: Easy to understand and extend
- Documentation Quality: README enables quick start
| Phase | Duration | Deliverables |
|---|---|---|
| Planning & Design | Day 1 | PRD, ADRs, Architecture diagram |
| Core Implementation | Days 2-3 | Agent, tools, CLI interface |
| Integration & Testing | Day 4 | End-to-end testing, bug fixes |
| Documentation & Polish | Day 5 | README, video, final report |
- Agent: Autonomous software that perceives environment and takes actions
- LLM: Large Language Model (e.g., Claude, GPT-4)
- Human-in-the-Loop: Pattern where humans can intervene in automated processes
- MCP: Model Context Protocol - standard for LLM tool integration
- Exploration: Process of navigating and interacting with application to discover behavior
- Target Application: https://with-bugs.practicesoftwaretesting.com
- LangGraph Documentation: https://langchain-ai.github.io/langgraph/
- Playwright Documentation: https://playwright.dev/
- MCP Specification: https://modelcontextprotocol.io/