
Product Requirements Document (PRD)

AI-Powered Exploratory Testing Agent

Version: 1.0
Author: Joaquim
Date: January 2025
Status: Draft


1. Executive Summary

1.1 Problem Statement

Manual exploratory testing is time-consuming, inconsistent, and fails to scale. QA teams spend significant effort navigating applications, identifying edge cases, and documenting findings—work that could be augmented by intelligent automation.

1.2 Solution Overview

An autonomous AI agent that explores web applications intelligently, discovers potential issues, and generates actionable reports. The agent combines LLM reasoning with browser automation to make human-like testing decisions while maintaining human oversight through a CLI interface.

1.3 Success Criteria

| Metric | Target |
| --- | --- |
| Bug Discovery | Identify ≥5 distinct issues in target application |
| Coverage | Explore ≥4 major functional areas |
| Report Quality | Actionable findings with evidence (screenshots) |
| Human Control | Clear intervention points with meaningful summaries |

2. Product Scope

2.1 In Scope

  • Autonomous Web Exploration: Navigate pages, interact with UI elements, fill forms
  • Intelligent Decision Making: LLM-driven analysis of page state and next actions
  • Human-in-the-Loop CLI: Periodic checkpoints for user guidance
  • Custom Tooling: Broken image detector with comprehensive edge case handling
  • Structured Reporting: Findings report with severity, evidence, and coverage summary
  • Stretch Goals: Cloud deployment, persistence, test generation, MCP server

2.2 Out of Scope

  • Visual regression testing (pixel comparison)
  • Performance/load testing
  • Security penetration testing (beyond basic input validation)
  • Mobile-specific testing
  • Multi-browser support (Chromium only for MVP)

2.3 Target Application

URL: https://with-bugs.practicesoftwaretesting.com

Functional Areas to Explore:

  1. Product browsing and search
  2. Product details and filtering
  3. Shopping cart operations
  4. Checkout flow
  5. User authentication (login/register)
  6. User profile management
  7. Contact/support forms

3. Functional Requirements

3.1 Core Agent Capabilities

FR-001: Page Navigation

  • Description: Agent navigates between pages using links, buttons, and URL manipulation
  • Acceptance Criteria:
    • Successfully follow internal links
    • Handle navigation failures gracefully
    • Track visited pages to avoid infinite loops
    • Support back/forward navigation when beneficial
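
The visit-tracking criterion above can be sketched as a small helper. `normalizeUrl` and `VisitTracker` are illustrative names and not part of any required API; the normalization rules (drop fragments, sort query params, trim trailing slashes) are one reasonable choice, not a spec:

```typescript
// Track visited pages so the agent can avoid infinite loops (FR-001).
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.hash = "";                                  // fragments don't change the page
  u.searchParams.sort();                        // stable query ordering
  u.pathname = u.pathname.replace(/\/+$/, "") || "/"; // "/cart/" === "/cart"
  return u.toString();
}

class VisitTracker {
  private visits = new Map<string, number>();

  /** Record a visit; returns true if the page was already seen. */
  record(url: string): boolean {
    const key = normalizeUrl(url);
    const count = (this.visits.get(key) ?? 0) + 1;
    this.visits.set(key, count);
    return count > 1;
  }

  /** A page visited `limit` or more times suggests a loop. */
  looping(url: string, limit = 3): boolean {
    return (this.visits.get(normalizeUrl(url)) ?? 0) >= limit;
  }
}
```

The `looping` threshold would feed the loop-detection checkpoint trigger in FR-009.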

FR-002: UI Interaction

  • Description: Agent interacts with all common UI elements
  • Acceptance Criteria:
    • Click buttons, links, and interactive elements
    • Fill text inputs, textareas, and rich text editors
    • Select options from dropdowns and radio buttons
    • Handle checkboxes and toggles
    • Submit forms
    • Handle modal dialogs and popups

FR-003: Screenshot Capture

  • Description: Agent captures visual evidence of findings
  • Acceptance Criteria:
    • Full page screenshots on significant events
    • Element-specific screenshots for issue evidence
    • Organized storage with meaningful naming
    • Linked to findings in final report

FR-004: Page Content Extraction

  • Description: Agent extracts and analyzes page content for LLM processing
  • Acceptance Criteria:
    • Extract visible text content
    • Identify interactive elements with their attributes
    • Capture form field states and validation messages
    • Detect error messages and alerts
    • Extract structured data (tables, lists, product info)

3.2 Intelligent Decision Making

FR-005: Page State Analysis

  • Description: LLM analyzes current page to understand context
  • Acceptance Criteria:
    • Identify page type/purpose
    • Recognize available actions
    • Detect potential issues or anomalies
    • Understand application state (logged in, cart contents, etc.)

FR-006: Action Planning

  • Description: LLM decides next action based on analysis
  • Acceptance Criteria:
    • Generate hypothesis about what to test
    • Prioritize high-value interactions
    • Provide clear reasoning for decisions
    • Balance exploration vs exploitation
    • Consider testing edge cases (empty inputs, special characters, boundaries)

FR-007: Issue Detection

  • Description: Agent identifies potential bugs and UX issues
  • Acceptance Criteria:
    • Detect JavaScript errors in console
    • Identify HTTP errors (4xx, 5xx)
    • Recognize broken functionality
    • Note UX issues (confusing flows, missing feedback)
    • Identify accessibility concerns
    • Detect data inconsistencies

3.3 Human-in-the-Loop Interface

FR-008: Progress Summary

  • Description: Agent presents exploration progress to user
  • Acceptance Criteria:
    • Pages visited count and list
    • Actions performed summary
    • Issues found so far
    • Current location in application
    • Time elapsed

FR-009: Checkpoint Triggers

  • Description: Agent pauses for human input at defined points
  • Acceptance Criteria:
    • After N actions (configurable, default: 10)
    • When entering new major section
    • When confidence in next action is low
    • After finding significant issue
    • When stuck or detecting potential infinite loop
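
The trigger conditions above reduce to a pure predicate the agent loop can call after each action. Field names and the 0.4 confidence floor are assumptions for illustration; only the default interval of 10 comes from the requirement:

```typescript
// FR-009: decide whether to pause for human input.
interface CheckpointConfig {
  actionInterval: number;   // default 10 per FR-009
  confidenceFloor: number;  // pause when planner confidence drops below this
}

interface AgentTick {
  actionsSinceCheckpoint: number;
  enteredNewSection: boolean;
  lastConfidence: number;   // 0..1, reported by the planning step
  significantIssueFound: boolean;
  loopSuspected: boolean;
}

function shouldCheckpoint(
  t: AgentTick,
  cfg: CheckpointConfig = { actionInterval: 10, confidenceFloor: 0.4 },
): boolean {
  return (
    t.actionsSinceCheckpoint >= cfg.actionInterval ||
    t.enteredNewSection ||
    t.lastConfidence < cfg.confidenceFloor ||
    t.significantIssueFound ||
    t.loopSuspected
  );
}
```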

FR-010: User Commands

  • Description: User can guide agent behavior
  • Acceptance Criteria:
    • Continue: Proceed with agent's plan
    • Stop: End exploration and generate report
    • Guide: Provide specific direction (e.g., "focus on checkout")
    • Skip: Avoid certain areas
    • Prioritize: Focus on specific functionality
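
One way to model these commands is a discriminated union plus a small parser; the exact CLI grammar is an assumption, not specified by this PRD:

```typescript
// FR-010: parse a checkpoint command typed by the user.
type UserCommand =
  | { kind: "continue" }
  | { kind: "stop" }
  | { kind: "guide"; target: string }
  | { kind: "skip"; target: string }
  | { kind: "prioritize"; target: string };

function parseCommand(input: string): UserCommand | null {
  const [verb, ...rest] = input.trim().toLowerCase().split(/\s+/);
  const target = rest.join(" ");
  switch (verb) {
    case "continue":   return { kind: "continue" };
    case "stop":       return { kind: "stop" };
    case "guide":      return target ? { kind: "guide", target } : null;
    case "skip":       return target ? { kind: "skip", target } : null;
    case "prioritize": return target ? { kind: "prioritize", target } : null;
    default:           return null; // unknown input: re-prompt the user
  }
}
```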

3.4 Custom Tool: Broken Image Detector

FR-011: Image Detection

  • Description: Scan page for all image elements
  • Acceptance Criteria:
    • Find all <img> elements
    • Find CSS background images
    • Find images in <picture> elements
    • Find SVG images (inline and external)
    • Find favicon and other meta images

FR-012: Broken Image Analysis

  • Description: Determine if images are broken
  • Acceptance Criteria:
    • Detect HTTP errors (404, 403, 500, etc.)
    • Detect network failures
    • Detect empty/invalid src attributes
    • Detect zero-dimension images (naturalWidth/naturalHeight = 0)
    • Detect images that timeout
    • Handle lazy-loaded images appropriately
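
The in-browser half of this detector would gather per-image data via Playwright's `page.evaluate()` and watch network responses for HTTP errors; the classification itself can stay a pure function, sketched below. Names are illustrative. One relevant browser detail: for a failed load, `HTMLImageElement.complete` is `true` while `naturalWidth` is 0, so an incomplete image is merely pending (lazy-load or slow network), not broken:

```typescript
// FR-012: classify an image from properties readable in the page.
interface ImageProbe {
  src: string;
  complete: boolean;    // HTMLImageElement.complete
  naturalWidth: number;
  naturalHeight: number;
  loading?: string;     // "lazy" images may legitimately still be pending
}

type ImageVerdict =
  | { broken: false }
  | { broken: true; reason: "empty-src" | "zero-dimensions" };

function classifyImage(p: ImageProbe): ImageVerdict {
  if (!p.src || p.src.trim() === "") {
    return { broken: true, reason: "empty-src" };
  }
  // Still loading: not broken yet. A real run would pair this with a
  // timeout before re-probing, per the timeout criterion above.
  if (!p.complete) return { broken: false };
  if (p.naturalWidth === 0 || p.naturalHeight === 0) {
    return { broken: true, reason: "zero-dimensions" };
  }
  return { broken: false };
}
```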

FR-013: Structured Report Output

  • Description: Return detailed broken image information
  • Acceptance Criteria:
    • Image source URL
    • Alt text (if present)
    • Location on page (selector/xpath)
    • Failure reason (categorized)
    • Parent element context
    • Severity assessment

3.5 Findings Report

FR-014: Bug Report Generation

  • Description: Generate comprehensive bug report
  • Acceptance Criteria:
    • List all discovered issues
    • Severity classification (Critical, High, Medium, Low)
    • Steps to reproduce
    • Expected vs actual behavior
    • Screenshot evidence
    • Timestamp and page URL
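
A possible shape for a finding and a minimal Markdown renderer covering the criteria above; the field names are assumptions, not mandated by this PRD:

```typescript
// FR-014: one discovered issue, rendered to Markdown for the report.
type Severity = "Critical" | "High" | "Medium" | "Low";

interface Finding {
  title: string;
  severity: Severity;
  url: string;
  timestamp: string;    // ISO 8601
  steps: string[];
  expected: string;
  actual: string;
  screenshot?: string;  // relative path to evidence image
}

function renderFinding(f: Finding): string {
  const steps = f.steps.map((s, i) => `${i + 1}. ${s}`).join("\n");
  return [
    `### [${f.severity}] ${f.title}`,
    `URL: ${f.url} (${f.timestamp})`,
    `**Steps to reproduce:**\n${steps}`,
    `**Expected:** ${f.expected}`,
    `**Actual:** ${f.actual}`,
    f.screenshot ? `![evidence](${f.screenshot})` : "",
  ].filter(Boolean).join("\n\n");
}
```

Keeping `Finding` as plain data also satisfies FR-016: the same objects serialize directly to the JSON output via `JSON.stringify`.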

FR-015: Coverage Summary

  • Description: Document exploration coverage
  • Acceptance Criteria:
    • Pages visited with timestamps
    • Functional areas covered
    • Actions performed per area
    • Unexplored areas identified
    • Session duration and statistics

FR-016: Report Format

  • Description: Output format and structure
  • Acceptance Criteria:
    • Markdown format for readability
    • JSON format for programmatic access
    • HTML format for visual presentation (optional)
    • Screenshots embedded or linked
    • Exportable and shareable

4. Non-Functional Requirements

4.1 Performance

| Requirement | Target |
| --- | --- |
| Action execution time | < 5 seconds per action |
| LLM response time | < 10 seconds per decision |
| Total exploration session | 15-30 minutes typical |
| Memory usage | < 2GB RAM |

4.2 Reliability

  • Error Recovery: Agent recovers from transient failures (network, element not found)
  • State Consistency: Agent maintains consistent state across actions
  • Graceful Degradation: Agent completes partial report if terminated early
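
One way to meet the error-recovery requirement is to wrap flaky operations (navigation, element lookup) in a retry helper with exponential backoff; the attempt count and delays below are illustrative defaults, not requirements:

```typescript
// Retry transient failures with exponential backoff (500ms, 1s, 2s, ...).
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError; // all attempts exhausted: surface the final error
}
```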

4.3 Usability

  • Setup Time: < 10 minutes from clone to running
  • Configuration: Sensible defaults with optional overrides
  • Documentation: Clear README with examples
  • Output Clarity: Reports understandable by non-technical stakeholders

4.4 Maintainability

  • Code Quality: TypeScript strict mode, ESLint, Prettier
  • Modularity: Clear separation of concerns
  • Extensibility: Easy to add new tools, detectors, or LLM providers
  • Testing: Unit tests for critical components

5. Stretch Goals

5.1 Cloud Deployment (SG-001)

Priority: High
Description: Deploy agent to cloud for remote execution

Requirements:

  • Containerized deployment (Docker)
  • Trigger via HTTP endpoint or CLI
  • Support for scheduled runs
  • Results stored and retrievable
  • Cost-effective infrastructure

Suggested Platforms: Railway, Render, AWS Lambda + Fargate

5.2 Persistence (SG-002)

Priority: Medium
Description: Save and resume exploration state

Requirements:

  • Serialize exploration state (visited pages, findings, context)
  • Resume from checkpoint
  • Merge findings from multiple sessions
  • Handle application state changes between sessions
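
If the exploration state is kept as plain data, save/resume reduces to a JSON round trip; the state shape below is an assumption for illustration:

```typescript
// SG-002: serialize and restore a checkpoint of the exploration state.
interface ExplorationState {
  visited: string[];   // normalized URLs (arrays, since Sets don't JSON-serialize)
  findings: { title: string; severity: string; url: string }[];
  actionsPerformed: number;
  startedAt: string;   // ISO 8601
}

function saveState(s: ExplorationState): string {
  return JSON.stringify(s, null, 2);
}

function resumeState(raw: string): ExplorationState {
  const s = JSON.parse(raw) as ExplorationState;
  // Minimal validation so a corrupt checkpoint fails loudly, not silently.
  if (!Array.isArray(s.visited) || !Array.isArray(s.findings)) {
    throw new Error("corrupt checkpoint: missing visited/findings");
  }
  return s;
}
```

Merging multiple sessions would then be a union over `visited` and a concatenation (with de-duplication) over `findings`.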

5.3 Test Generation (SG-003)

Priority: Medium
Description: Generate Playwright test scripts from findings

Requirements:

  • Convert issue reproduction steps to Playwright code
  • Generate regression tests for discovered bugs
  • Include assertions based on expected behavior
  • Organize tests by feature area

5.4 MCP Server (SG-004)

Priority: High (aligns with current expertise)
Description: Expose agent as MCP server

Requirements:

  • Implement MCP protocol
  • Expose tools: start_exploration, get_status, get_findings, stop_exploration
  • Allow external LLMs to invoke agent
  • Support streaming progress updates

6. Technical Constraints

6.1 Required Technologies

| Component | Technology | Rationale |
| --- | --- | --- |
| Language | TypeScript | Required by challenge |
| Browser Automation | Playwright | Required by challenge |
| Runtime | Node.js 20+ | LTS with modern features |

6.2 Technology Choices (To Be Decided)

| Component | Options | Decision Criteria |
| --- | --- | --- |
| Agent Framework | LangGraph vs Custom | See ADR-001 |
| LLM Provider | Anthropic vs OpenAI | See ADR-002 |
| State Management | In-memory vs Persistent | See ADR-003 |

7. User Stories

Primary User: QA Engineer

US-001: As a QA engineer, I want to start the agent with a single command so that I can quickly begin exploration without complex setup.

US-002: As a QA engineer, I want to see what the agent is doing in real-time so that I can understand its reasoning and catch issues early.

US-003: As a QA engineer, I want to guide the agent toward specific areas so that I can focus testing on high-risk functionality.

US-004: As a QA engineer, I want a comprehensive report at the end so that I can prioritize and track discovered issues.

US-005: As a QA engineer, I want evidence (screenshots) for each finding so that I can reproduce and verify issues.

Secondary User: Development Team

US-006: As a developer, I want clear reproduction steps so that I can debug and fix issues efficiently.

US-007: As a developer, I want generated test scripts so that I can prevent regressions.


8. Risks and Mitigations

| Risk | Impact | Likelihood | Mitigation |
| --- | --- | --- | --- |
| LLM rate limiting | High | Medium | Implement retry logic, caching, batch decisions |
| Target site changes | Medium | Low | Configurable selectors, adaptive discovery |
| Infinite exploration loops | High | Medium | Visit tracking, action limits, diversity scoring |
| LLM hallucination in actions | High | Medium | Validate actions against actual page state |
| Cost overrun (LLM tokens) | Medium | Medium | Token budgets, summarization strategies |

9. Success Metrics

9.1 Quantitative

  • Bug Discovery Rate: ≥5 genuine issues found
  • False Positive Rate: <20% of reported issues
  • Coverage Breadth: ≥4 distinct functional areas explored
  • Session Efficiency: Complete meaningful exploration in <30 minutes

9.2 Qualitative

  • Report Actionability: Issues have clear reproduction steps
  • Code Quality: Passes code review standards
  • Architecture Clarity: Easy to understand and extend
  • Documentation Quality: README enables quick start

10. Timeline

| Phase | Duration | Deliverables |
| --- | --- | --- |
| Planning & Design | Day 1 | PRD, ADRs, Architecture diagram |
| Core Implementation | Days 2-3 | Agent, tools, CLI interface |
| Integration & Testing | Day 4 | End-to-end testing, bug fixes |
| Documentation & Polish | Day 5 | README, video, final report |

Appendix A: Glossary

  • Agent: Autonomous software that perceives environment and takes actions
  • LLM: Large Language Model (e.g., Claude, GPT-4)
  • Human-in-the-Loop: Pattern where humans can intervene in automated processes
  • MCP: Model Context Protocol - standard for LLM tool integration
  • Exploration: Process of navigating and interacting with application to discover behavior

Appendix B: References