123 changes: 123 additions & 0 deletions docs/MILESTONE_2.md
# Iteration 2 Milestone Definition

## Milestone Name
**Enhanced User Interface & Data Infrastructure with Improved Classification Models**

---

## Description
In Iteration 2, the team will deliver a redesigned, user-friendly frontend interface, establish a robust database layer for persistent data storage, and continue advancing the binary classification models through data augmentation and training refinement. This iteration bridges the gap between backend model capability and production-ready user experience while strengthening the core ML pipeline.

---

## Success Criteria

1. **Frontend Redesign Completion**
- New frontend UI is deployed and accessible to all team members
   - User interface is intuitive and meets accessibility standards (e.g., WCAG 2.1 AA)
- All critical user workflows are functional (classification input, result visualization, history viewing)
- Performance baseline established (page load time < 2 seconds)

2. **Database Integration**
- Database schema designed and implemented
- Core entities (users, sequences, predictions, models) are persistently stored
- Data retrieval and update operations are functional and tested
- Database is integrated with the backend API

3. **Model Training Progress**
- Training pipeline executed with expanded or improved dataset
- Model performance metrics documented and compared against Iteration 1 baseline
- Training logs and artifacts are properly tracked
- At least one iteration of hyperparameter tuning or model improvement is complete

4. **System Integration**
- Frontend, backend API, and database successfully communicate end-to-end
- At least one complete user workflow (input sequence → predict → display result → store in database) functions successfully
- Integration tests validate the full pipeline

---

## Key Deliverables

### Frontend (Design & Implementation)
- [ ] Redesigned UI mockups/prototypes reviewed and approved
- [ ] React/Vite frontend application refactored with improved component architecture
- [ ] Input form for sequence classification with validation
- [ ] Results display panel with clear visualization of predictions
- [ ] Navigation and layout improvements for enhanced UX
- [ ] Responsive design for multiple screen sizes

### Database Layer
- [ ] Database schema documentation (ER diagram and table definitions)
- [ ] Database implementation (PostgreSQL/SQLite/MongoDB as appropriate)
- [ ] API endpoints for CRUD operations on key entities
- [ ] Data migration scripts (if needed)
- [ ] Basic data validation and constraints
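
To make the schema deliverable concrete, here is an illustrative SQLite sketch of the core entities listed above; the table and column names are assumptions, not the approved design:

```python
import sqlite3

# Illustrative DDL for the core entities (users, sequences, predictions);
# names and constraints are assumptions, not the final schema.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE sequences (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    sequence TEXT NOT NULL
);
CREATE TABLE predictions (
    id INTEGER PRIMARY KEY,
    sequence_id INTEGER NOT NULL REFERENCES sequences(id),
    label TEXT NOT NULL,
    score REAL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users (username) VALUES ('alice')")
conn.execute("INSERT INTO sequences (user_id, sequence) VALUES (1, 'ATGC')")
conn.execute(
    "INSERT INTO predictions (sequence_id, label, score) VALUES (1, 'positive', 0.93)"
)
row = conn.execute(
    "SELECT label, score FROM predictions WHERE sequence_id = 1"
).fetchone()
```

The same DDL doubles as a starting point for the ER diagram and for the basic validation/constraint checklist item (`NOT NULL`, `UNIQUE`, foreign keys).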

### Model Training & Refinement
- [ ] Updated training dataset with quality improvements or augmentation
- [ ] Trained model(s) with documented performance metrics
- [ ] Model evaluation report (precision, recall, F1-score, ROC-AUC)
- [ ] Comparison analysis: Iteration 1 vs. Iteration 2 model performance
- [ ] Trained model weights saved and documented
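
The metrics named in the evaluation report reduce to simple count arithmetic; a pure-Python sketch for reference (real evaluation would typically use scikit-learn's `precision_recall_fscore_support` and `roc_auc_score`):

```python
# Precision / recall / F1 for a binary classifier from raw counts.
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Worked example: tp=2, fp=1, fn=1 -> precision = recall = f1 = 2/3
m = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Computing both iterations' metrics through one shared function like this keeps the Iteration 1 vs. Iteration 2 comparison apples-to-apples.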

### Documentation & Testing
- [ ] API documentation updated with database and frontend endpoints
- [ ] Deployment instructions for new infrastructure
- [ ] Integration test suite covering frontend-to-database workflows
- [ ] Team documentation on database schema and model training process

---

## Technical Scope

### Frontend Improvements
- Component refactoring and reusability
- State management optimization
- UI/UX enhancements based on usability feedback
- Error handling and user feedback mechanisms

### Database Implementation
- Schema design for sequence data, predictions, and metadata
- Connection pooling and query optimization
- Basic authentication/authorization framework
- Backup and recovery considerations
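
For the connection-pooling item, a toy stdlib-only sketch of the idea; in practice the backend would rely on SQLAlchemy's built-in pooling (`pool_size` / `max_overflow` on `create_engine`) rather than hand-rolling this:

```python
import queue
import sqlite3

# Toy fixed-size connection pool. Note: ":memory:" gives each connection its
# own private database, so this is illustration only; a file path would share
# state across connections.


class ConnectionPool:
    def __init__(self, path: str, size: int = 5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(path, check_same_thread=False))

    def acquire(self) -> sqlite3.Connection:
        return self._pool.get()  # blocks when all connections are checked out

    def release(self, conn: sqlite3.Connection) -> None:
        self._pool.put(conn)


pool = ConnectionPool(":memory:", size=2)
conn = pool.acquire()
value = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
```

The blocking `acquire` is also the behavior to watch in the load testing mentioned under risk mitigation: a too-small pool shows up as request latency, not errors.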

### Model Development
- Data preprocessing and quality assurance
- Model training with improved hyperparameters
- Cross-validation and test set evaluation
- Model versioning and tracking

---

## Dependencies & Risk Mitigation

**Potential Risks:**
- Frontend redesign scope creep → Prioritize MVP features; defer nice-to-have UI elements
- Database performance issues → Conduct load testing; optimize queries early
- Model training time constraints → Parallelize training; use pre-trained embeddings where applicable
- Integration complexity → Establish clear API contracts; use mock data for parallel development

---

## Timeline & Team Assignments
*To be filled by team lead with specific dates and owner assignments*

---

## Validation & Sign-off
- [ ] Frontend review completed by UX owner
- [ ] Database schema reviewed by backend lead
- [ ] Model metrics reviewed by ML team lead
- [ ] Integration testing passed by QA
- [ ] Milestone sign-off by project manager

---

## Success Metrics
- Zero critical bugs in core workflows
- Database latency < 200ms for standard queries
- Model F1-score improvement ≥ 5% over Iteration 1 baseline (or justified regression)
- Frontend accessibility score > 90
- Code coverage for new features ≥ 80%
53 changes: 53 additions & 0 deletions docs/issues/add-database-support.md
# [Feature/Enhancement]: Add database-backed persistence for BAIO

## Summary
BAIO currently processes classification and chat requests in memory and returns responses directly to the client; it does not persist analysis runs, per-sequence results, or chat history. A database layer would make the platform more reliable for reproducibility and auditing, and would enable future product features such as saved analyses and session history.

## Problem
- `POST /classify` generates a timestamped response but does not store the run or its results.
- `POST /chat` returns the latest reply without any durable conversation history.
- There is no database configuration, schema, or migration workflow in the backend.
- The current architecture cannot support saved reports, analysis history, comparisons across runs, or operational analytics.

## Proposed Scope
- Add a backend database integration with a local-development-friendly default and a production-ready path.
- Prefer a relational database design:
- SQLite for local development and CI
- PostgreSQL-ready configuration for deployed environments
- Introduce an ORM and migration workflow for schema management.
- Persist classification runs, including:
- request source
- model configuration used
- processing timestamp
- aggregate counts
- per-sequence classification results
- Persist AI assistant chat sessions/messages if chat history is intended to survive refreshes or restarts.
- Add environment-based database configuration and a health check that validates database connectivity.
- Update developer documentation for setup, migrations, and local usage.
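
One way the environment-based configuration and connectivity check could look (a sketch only; `DEFAULT_URL` and `sqlite_health_check` are illustrative names, and with SQLAlchemy this becomes `create_engine(get_database_url())` plus an engine-level probe):

```python
import os
import sqlite3

# Assumption: in-memory SQLite as the local-dev/CI default; deployed
# environments override via DATABASE_URL (e.g. postgresql://...).
DEFAULT_URL = "sqlite:///:memory:"


def get_database_url() -> str:
    return os.environ.get("DATABASE_URL", DEFAULT_URL)


def sqlite_health_check(url: str) -> bool:
    # Minimal connectivity probe for the SQLite case; a real backend would
    # route the check through the ORM engine regardless of dialect.
    path = url[len("sqlite:///"):]
    try:
        conn = sqlite3.connect(path)
        conn.execute("SELECT 1")
        conn.close()
        return True
    except sqlite3.Error:
        return False


ok = sqlite_health_check(DEFAULT_URL)
```

Wiring `sqlite_health_check` (or its ORM equivalent) into the existing health endpoint satisfies the "health check that validates database connectivity" item above.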

## Suggested Technical Approach
- Backend: SQLAlchemy + Alembic
- Configuration: `DATABASE_URL` via environment variables
- Initial schema candidates:
- `analysis_runs`
- `sequence_results`
- `chat_sessions`
- `chat_messages`
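
A sketch of the first two candidate tables in the SQLAlchemy declarative style this issue proposes (column names are assumptions; `chat_sessions` / `chat_messages` would follow the same pattern, and Alembic would own the real migrations):

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class AnalysisRun(Base):
    __tablename__ = "analysis_runs"
    id = Column(Integer, primary_key=True)
    source = Column(String, nullable=False)   # request source
    model_config_used = Column(String)        # model configuration used
    created_at = Column(String)               # processing timestamp
    results = relationship("SequenceResult", back_populates="run")


class SequenceResult(Base):
    __tablename__ = "sequence_results"
    id = Column(Integer, primary_key=True)
    run_id = Column(Integer, ForeignKey("analysis_runs.id"), nullable=False)
    sequence = Column(String, nullable=False)
    label = Column(String, nullable=False)
    score = Column(Float)
    run = relationship("AnalysisRun", back_populates="results")


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    run = AnalysisRun(source="api", results=[
        SequenceResult(sequence="ATGC", label="positive", score=0.93),
    ])
    session.add(run)
    session.commit()
    run_id = run.id  # the persisted run identifier the API can return
    n_results = session.query(SequenceResult).filter_by(run_id=run_id).count()
```

The `run.id` read after `commit()` is exactly the "persisted run identifier" named in the acceptance criteria below.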

## Acceptance Criteria
- A developer can run the backend with a configured `DATABASE_URL`.
- Initial database migrations are committed and reproducible.
- Classification runs can be saved and linked to their per-sequence results.
- The API can return a persisted run identifier for saved analyses.
- Database connectivity is covered by backend health checks or targeted tests.
- README or backend docs explain how to configure and run the database locally.

## Out of Scope
- Full authentication and user account management
- Multi-tenant authorization rules
- Historical backfill of prior ad hoc runs

## Why This Matters
- Improves reproducibility for bioinformatics workflows
- Enables saved analysis history and future reporting features
- Creates a stable base for collaboration, auditability, and production deployment