SQL Server Table Archiver - Best in Class

A production-ready, high-performance SQL Server table archiver designed for enterprise environments. This tool provides comprehensive features for safely archiving historical data while maintaining data integrity and minimizing impact on production systems.

🌟 Features

Core Capabilities

Batch Processing: Process large datasets in configurable batches to minimize table locking
Multiple Strategies: Date-based, count-based, custom query, and partition-based archiving
Transaction Safety: Full rollback support with configurable retry logic
Verification: Built-in archive verification to ensure data integrity
Performance Tracking: Comprehensive metrics and performance monitoring
Dry Run Mode: Test archiving operations without making changes

Advanced Features

Flexible Modes: Copy, move, or delete-only operations
Custom WHERE Clauses: Fine-grained control over what gets archived
Automatic Index Creation: Copy indexes from source to archive tables
Parallel Processing: Multi-threaded batch processing (experimental)
Comprehensive Logging: Detailed operation logs with configurable levels
Configuration Files: YAML/JSON support for easy configuration management

Production Ready

Error Handling: Robust error handling with automatic retries
Connection Management: Proper connection pooling and timeout handling
Performance Monitoring: Track throughput, batch times, and resource usage
Audit Trail: Complete audit log of all archiving operations
Safety Features: Dry-run mode, verification checks, and rollback support

📋 Requirements

Python 3.8+
SQL Server 2016+ (or Azure SQL Database)
Required Python packages (see requirements.txt):
- pyodbc - SQL Server connectivity
- pyyaml - YAML configuration support

🚀 Quick Start

1. Installation

# Clone the repository
git clone <repository-url>
cd DBA

# Install dependencies
pip install -r requirements.txt

2. Configure Connection

Create a connection string for your SQL Server:

# Windows Authentication
connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=localhost;Database=YourDB;Trusted_Connection=yes;"

# SQL Server Authentication
connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=localhost;Database=YourDB;UID=username;PWD=password;"

# Azure SQL Database
connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=yourserver.database.windows.net;Database=YourDB;UID=username;PWD=password;Encrypt=yes;TrustServerCertificate=no;"

3. Create Configuration File

Copy and customize the example configuration:

cp config.example.yaml config.yaml

Edit config.yaml to define your tables and archiving rules:

tables:
  - source_table: Transactions
    archive_table: Transactions_Archive
    date_column: TransactionDate
    retention_days: 365
    batch_size: 5000
    strategy: date_based
    mode: move
    verify_archive: true

4. Run the Archiver

Using Python Script

# Dry run (no changes made)
python sql_table_archiver.py \
  --config config.yaml \
  --connection-string "your-connection-string" \
  --dry-run

# Actual archiving
python sql_table_archiver.py \
  --config config.yaml \
  --connection-string "your-connection-string"

Using Stored Procedures

-- Install stored procedures
USE YourDatabase
GO
-- Run the SQL script
:r sql_stored_procedures.sql
GO

-- Archive data using stored procedure
DECLARE @LogId BIGINT

EXEC dbo.sp_ArchiveTableByDate
    @SourceTable = 'Transactions',
    @ArchiveTable = 'Transactions_Archive',
    @DateColumn = 'TransactionDate',
    @RetentionDays = 365,
    @BatchSize = 5000,
    @DeleteAfterArchive = 1,
    @DryRun = 0,
    @LogId = @LogId OUTPUT

-- View statistics
EXEC dbo.sp_GetArchiveStatistics
    @TableName = 'dbo.Transactions'

📖 Detailed Usage

Python API

Basic Usage

from sql_table_archiver import SQLServerTableArchiver, ArchiveConfig, ArchiveStrategy, ArchiveMode

# Create archiver instance
archiver = SQLServerTableArchiver(
    connection_string="your-connection-string",
    schema="dbo",
    dry_run=False
)

# Configure archiving
config = ArchiveConfig(
    source_table="Transactions",
    archive_table="Transactions_Archive",
    date_column="TransactionDate",
    retention_days=365,
    batch_size=5000,
    strategy=ArchiveStrategy.DATE_BASED,
    mode=ArchiveMode.MOVE,
    verify_archive=True
)

# Execute archiving
metrics = archiver.archive_table(config)

# View results
print(f"Rows archived: {metrics.rows_archived:,}")
print(f"Duration: {metrics.duration_seconds:.2f} seconds")
print(f"Throughput: {metrics.rows_per_second:.0f} rows/second")

Archive Multiple Tables

# Load from configuration file
configs = SQLServerTableArchiver.load_config_from_file('config.yaml')

# Archive all tables
results = archiver.archive_multiple_tables(configs)

# Print summary
for table, metrics in results.items():
    if metrics:
        print(f"{table}: {metrics.rows_archived:,} rows archived")
    else:
        print(f"{table}: FAILED")

Advanced Configuration

config = ArchiveConfig(
    source_table="Orders",
    archive_table="Orders_Archive",
    date_column="OrderDate",
    retention_days=730,
    batch_size=1000,
    max_rows=100000,  # Limit per run
    strategy=ArchiveStrategy.DATE_BASED,
    mode=ArchiveMode.MOVE,
    where_clause="Status = 'Completed'",  # Additional filter
    verify_archive=True,
    create_archive_table=True,
    copy_indexes=True,
    max_retries=3,
    timeout_seconds=300
)

Archiving Strategies

1. Date-Based Archiving

Archive data older than a specified retention period:

config = ArchiveConfig(
    source_table="Logs",
    date_column="LogDate",
    retention_days=90,  # Keep last 90 days
    strategy=ArchiveStrategy.DATE_BASED
)

2. Custom Query Archiving

Use custom WHERE clause for complex conditions:

config = ArchiveConfig(
    source_table="Sessions",
    strategy=ArchiveStrategy.CUSTOM_QUERY,
    where_clause="""
        SessionEndDate IS NOT NULL
        AND SessionEndDate < DATEADD(DAY, -30, GETDATE())
        AND SessionStatus = 'Completed'
    """
)

3. Count-Based Archiving (SQL Stored Procedure)

Archive oldest N rows:

EXEC dbo.sp_ArchiveTableByCount
    @SourceTable = 'AuditLog',
    @ArchiveTable = 'AuditLog_Archive',
    @OrderByColumn = 'CreatedDate',
    @MaxRowsToArchive = 1000000,
    @BatchSize = 10000,
    @DeleteAfterArchive = 1

Archive Modes

Move (Default)

Copy data to archive table and delete from source:

mode=ArchiveMode.MOVE

Copy Only

Copy data to archive table, keep in source:

mode=ArchiveMode.COPY

Delete Only

Delete data without archiving:

mode=ArchiveMode.DELETE_ONLY

🔧 Configuration Reference

Table Configuration Options

Option	Type	Required	Description
`source_table`	string	Yes	Name of source table to archive
`archive_table`	string	No	Name of archive table (default: source_table_Archive)
`date_column`	string	Conditional	Date column for date-based archiving
`retention_days`	integer	Conditional	Days of data to retain in source table
`batch_size`	integer	No	Rows per batch (default: 1000)
`max_rows`	integer	No	Maximum rows to archive per run
`strategy`	enum	No	date_based, count_based, custom_query (default: date_based)
`mode`	enum	No	move, copy, delete_only (default: move)
`where_clause`	string	No	Additional WHERE conditions
`verify_archive`	boolean	No	Verify archived data (default: true)
`create_archive_table`	boolean	No	Auto-create archive table (default: true)
`copy_indexes`	boolean	No	Copy indexes to archive table (default: true)
`max_retries`	integer	No	Retry attempts on error (default: 3)
`timeout_seconds`	integer	No	Operation timeout (default: 300)

📊 Monitoring and Metrics

View Archive Statistics

-- Get summary statistics
EXEC dbo.sp_GetArchiveStatistics

-- Statistics for specific table
EXEC dbo.sp_GetArchiveStatistics
    @TableName = 'dbo.Transactions'

-- Statistics for date range
EXEC dbo.sp_GetArchiveStatistics
    @StartDate = '2024-01-01',
    @EndDate = '2024-12-31'

Python Metrics

metrics = archiver.archive_table(config)

print(f"""
Archive Metrics:
  Table: {metrics.table_name}
  Rows Archived: {metrics.rows_archived:,}
  Rows Deleted: {metrics.rows_deleted:,}
  Batches: {metrics.batches_processed}
  Duration: {metrics.duration_seconds:.2f}s
  Avg Batch Time: {metrics.avg_batch_time:.4f}s
  Throughput: {metrics.rows_per_second:.0f} rows/sec
  Errors: {metrics.errors}
""")

Archive Log Tables

The system maintains comprehensive logs:

ArchiveLog: High-level operation logs
ArchiveMetrics: Detailed batch-level metrics

-- View recent archive operations
SELECT TOP 10
    TableName,
    ArchiveDate,
    RowsArchived,
    DurationSeconds,
    RowsArchived / NULLIF(DurationSeconds, 0) AS RowsPerSecond,
    Status
FROM dbo.ArchiveLog
ORDER BY ArchiveDate DESC

-- View batch performance
SELECT
    l.TableName,
    m.BatchNumber,
    m.RowsProcessed,
    m.DurationMs,
    m.RowsPerSecond
FROM dbo.ArchiveMetrics m
JOIN dbo.ArchiveLog l ON m.LogId = l.LogId
WHERE l.TableName = 'dbo.Transactions'
ORDER BY m.BatchStartTime DESC

🔒 Best Practices

Performance Optimization

Choose Appropriate Batch Sizes
- Start with 1,000-5,000 rows per batch
- Monitor lock duration and adjust accordingly
- Larger batches = faster but more locking
- Smaller batches = slower but less impact
Schedule During Off-Peak Hours
- Run archiving during maintenance windows
- Avoid peak transaction times
- Use SQL Server Agent for scheduling
Index Strategy
- Ensure date columns are indexed
- Copy essential indexes to archive tables
- Consider removing unused indexes after archiving
Monitor Performance
- Track rows per second throughput
- Watch for blocking and deadlocks
- Review batch timing metrics

Safety and Reliability

Always Test First

# Use dry-run mode
archiver = SQLServerTableArchiver(
    connection_string=conn_str,
    dry_run=True  # No changes made
)

Start Small
- Use max_rows to limit initial runs
- Gradually increase batch sizes
- Monitor system impact

Enable Verification

config.verify_archive = True  # Always verify

Backup Before Archiving
- Take database backups before large archives
- Test restore procedures
- Document archive operations
Monitor Disk Space
- Archive tables consume additional space
- Plan for archive table growth
- Implement archive table purging if needed

Maintenance

Clean Up Old Logs

-- Remove logs older than 90 days
EXEC dbo.sp_CleanupOldArchiveLogs
    @RetentionDays = 90

Archive Table Maintenance
- Rebuild indexes periodically
- Update statistics
- Consider table partitioning for very large archives
Regular Review
- Review archiving effectiveness
- Adjust retention periods as needed
- Monitor archive table sizes

🐛 Troubleshooting

Common Issues

Connection Timeouts

# Increase timeout
config.timeout_seconds = 600  # 10 minutes

Out of Transaction Log Space

Reduce batch size
Ensure transaction log is sized appropriately
Run archiving in smaller chunks

Slow Performance

Check for missing indexes on date columns
Increase batch size (if not causing locking)
Ensure statistics are up to date
Consider READPAST hint (already implemented)

Locking Issues

Reduce batch size
Schedule during off-peak hours
Use READPAST to skip locked rows

Error Recovery

The archiver automatically retries failed batches:

config.max_retries = 3  # Retry up to 3 times
config.retry_delay = 5  # Wait 5 seconds between retries

View errors in logs:

SELECT *
FROM dbo.ArchiveLog
WHERE Status = 'Failed'
ORDER BY ArchiveDate DESC

📅 Scheduling

Windows Task Scheduler

Create a batch file (run_archiver.bat):

@echo off
cd /d "C:\Path\To\DBA"
python sql_table_archiver.py --config config.yaml --connection-string "your-conn-string" > logs\archive_%date:~-4,4%%date:~-10,2%%date:~-7,2%.log 2>&1

SQL Server Agent

Create a job step using stored procedure:

DECLARE @LogId BIGINT

EXEC dbo.sp_ArchiveTableByDate
    @SourceTable = 'Transactions',
    @ArchiveTable = 'Transactions_Archive',
    @DateColumn = 'TransactionDate',
    @RetentionDays = 365,
    @BatchSize = 5000,
    @LogId = @LogId OUTPUT

-- Check for errors
IF @LogId IS NULL
    THROW 50000, 'Archive operation failed', 1

Linux Cron

# Archive daily at 2 AM
0 2 * * * /usr/bin/python3 /path/to/sql_table_archiver.py --config /path/to/config.yaml --connection-string "your-conn-string" >> /var/log/archiver.log 2>&1

🔐 Security Considerations

Connection Strings: Store securely using environment variables or secret management
Permissions: Grant minimum required permissions (SELECT, INSERT, DELETE on tables)
Audit Logging: All operations are logged with timestamps and user context
Encryption: Use encrypted connections (Encrypt=yes in connection string)

📝 License

This project is provided as-is for use in SQL Server database management.

🤝 Contributing

Contributions are welcome! Please follow these guidelines:

Add tests for new features
Update documentation
Follow existing code style
Include examples for new functionality

📞 Support

For issues, questions, or contributions:

Check the troubleshooting section
Review existing documentation
Consult SQL Server best practices

🎯 Roadmap

Future enhancements:

Partition-based archiving strategy
Compression support for archive tables
Email notifications on completion/failure
Web-based monitoring dashboard
PostgreSQL and MySQL support
Advanced parallel processing
Automatic archive table partitioning
Data quality checks before archiving

Version: 1.0.0 Last Updated: December 2024 Author: SQL Server DBA Team

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
examples		examples
.gitignore		.gitignore
README.md		README.md
config.example.json		config.example.json
config.example.yaml		config.example.yaml
requirements.txt		requirements.txt
setup.py		setup.py
sql_stored_procedures.sql		sql_stored_procedures.sql
sql_table_archiver.py		sql_table_archiver.py

elmasoki/DBA

Folders and files

Latest commit

History

Repository files navigation