Welcome to SeekingData! This guide will help you get started with generating AI training data and Harbor evaluation tasks.
Download the latest version for your platform:
- macOS:
SeekingData-x.x.x.dmg - Windows:
SeekingData-Setup-x.x.x.exe - Linux:
SeekingData-x.x.x.AppImage
- Open the
.dmgfile - Drag SeekingData to Applications folder
- Open SeekingData from Applications
- If you see "Unidentified developer" warning:
- Go to System Preferences → Privacy & Security
- Click "Open Anyway" next to the security warning
- Run the
.exeinstaller - Follow the installation wizard
- Launch SeekingData from Start Menu
- Make the
.AppImagefile executable:chmod +x SeekingData-x.x.x.AppImage
- Run the application:
./SeekingData-x.x.x.AppImage
On first launch, you'll need to configure your API keys:
- Click Settings in the sidebar
- Enter your OpenAI API key
- (Optional) Enter your GitHub token for GitHub features
- Click Save
Generate training data from a single input:
- Click Single Processing in the sidebar
- Choose input method:
- File Upload: Upload PDF, DOCX, or TXT files
- URL: Enter a URL to extract content
- Text: Paste text directly
- Click Generate to create training data
- Review and edit the generated data
- Download in your preferred format
Process multiple inputs at once:
- Click Batch Processing in the sidebar
- Enter multiple URLs (one per line)
- Click Start Processing
- Monitor progress in real-time
- Review and download results
Convert between Alpaca and OpenAI formats:
- Click Format Conversion in the sidebar
- Upload your dataset file
- Select conversion direction:
- Alpaca → OpenAI
- OpenAI → Alpaca
- Preview the converted data
- Download the result
Generate Chain of Thought reasoning data:
- Click CoT Generator in the sidebar
- Enter a question that requires reasoning
- Click Generate CoT
- Review the step-by-step reasoning
- Edit if needed and download
Generate image description datasets:
- Click Image Dataset in the sidebar
- Upload images or provide image URLs
- Click Generate Descriptions
- Review and edit descriptions
- Download the dataset
Generate video understanding datasets:
- Click Video Dataset in the sidebar
- Upload videos or provide video URLs
- Click Generate Q&A
- Review and edit the Q&A pairs
- Download the dataset
Share datasets to HuggingFace:
- Click Dataset Share in the sidebar
- Select your prepared dataset
- Enter HuggingFace repository details
- Click Upload to HuggingFace
- Share the repository URL
Automatically generate Harbor tasks from GitHub:
- Click GitHub Generator in the sidebar
- Enter a GitHub repository URL
- Select an Issue or PR to generate a task from
- Click Generate Task
- Preview the generated task files
- Edit if needed
- Export the task to your local filesystem
Build Harbor tasks visually:
- Click Visual Builder in the sidebar
- Drag and drop task components:
- Docker environment
- Test cases
- Instructions
- Edit each component using Monaco Editor
- Preview the generated files in real-time
- Validate the task
- Export when ready
Manage your Harbor tasks:
- Click Task Manager in the sidebar
- View all tasks in list or card view
- Search and filter tasks
- Click a task to view details
- Edit, duplicate, or delete tasks
- Run validation tests
- Export tasks to share
- Quality over quantity: Focus on high-quality examples
- Diverse inputs: Use varied sources for better generalization
- Review outputs: Always review generated data before use
- Format consistency: Stick to one format for your dataset
- Clear instructions: Write detailed instruction.md files
- Comprehensive tests: Include multiple test cases
- Minimal environment: Keep Dockerfile minimal
- Validation: Always run validation before sharing
- Batch size: Limit batch processing to 10-20 items at a time
- API limits: Respect rate limits for external APIs
- Offline mode: The app works offline after initial setup
"Backend not responding"
- Restart the application
- Check if port 5001 is available
- Check firewall settings
"API key invalid"
- Verify your API key is correct
- Check if the key has necessary permissions
- Ensure the base URL is correct
"File upload failed"
- Check file size (max 10MB)
- Verify file format is supported
- Try a different file
"GitHub rate limit exceeded"
- Add a GitHub token in settings
- Wait for rate limit to reset (usually 1 hour)
View application logs:
- macOS:
~/Library/Logs/SeekingData/ - Windows:
%APPDATA%/SeekingData/logs/ - Linux:
~/.config/SeekingData/logs/
- Documentation: Check the Architecture Guide
- Issues: Report bugs on GitHub Issues
- Community: Join our Discord community
| Action | macOS | Windows/Linux |
|---|---|---|
| New Task | Cmd + N |
Ctrl + N |
| Save | Cmd + S |
Ctrl + S |
| Export | Cmd + E |
Ctrl + E |
| Settings | Cmd + , |
Ctrl + , |
| Toggle Sidebar | Cmd + B |
Ctrl + B |
SeekingData checks for updates automatically on launch.
- Go to Settings
- Click Check for Updates
- Download and install if available
- All data is stored locally on your device
- API keys are encrypted in local storage
- No data is sent to external servers (except API calls)
- OpenAI API key: Required for AI features
- GitHub token: Optional, for GitHub features
The app makes network requests to:
- OpenAI API (if configured)
- GitHub API (if GitHub features used)
- Custom base URLs (if configured)
- Quit SeekingData
- Drag from Applications to Trash
- Remove data (optional):
rm -rf ~/Library/Application\ Support/SeekingData rm -rf ~/Library/Logs/SeekingData
- Uninstall from Control Panel
- Remove data (optional):
%APPDATA%/SeekingData
- Remove the
.AppImagefile - Remove data (optional):
rm -rf ~/.config/SeekingData