# SQL Server Table Archiver

A production-ready, high-performance SQL Server table archiver designed for enterprise environments. It safely archives historical data while maintaining data integrity and minimizing impact on production systems.
## Features

- Batch Processing: Process large datasets in configurable batches to minimize table locking
- Multiple Strategies: Date-based, count-based, custom query, and partition-based archiving
- Transaction Safety: Full rollback support with configurable retry logic
- Verification: Built-in archive verification to ensure data integrity
- Performance Tracking: Comprehensive metrics and performance monitoring
- Dry Run Mode: Test archiving operations without making changes
- Flexible Modes: Copy, move, or delete-only operations
- Custom WHERE Clauses: Fine-grained control over what gets archived
- Automatic Index Creation: Copy indexes from source to archive tables
- Parallel Processing: Multi-threaded batch processing (experimental)
- Comprehensive Logging: Detailed operation logs with configurable levels
- Configuration Files: YAML/JSON support for easy configuration management
- Error Handling: Robust error handling with automatic retries
- Connection Management: Proper connection pooling and timeout handling
- Performance Monitoring: Track throughput, batch times, and resource usage
- Audit Trail: Complete audit log of all archiving operations
- Safety Features: Dry-run mode, verification checks, and rollback support
## Requirements

- Python 3.8+
- SQL Server 2016+ (or Azure SQL Database)
- Required Python packages (see requirements.txt):
  - `pyodbc` - SQL Server connectivity
  - `pyyaml` - YAML configuration support
## Installation

```bash
# Clone the repository
git clone <repository-url>
cd DBA

# Install dependencies
pip install -r requirements.txt
```

## Configuration

### Connection String

Create a connection string for your SQL Server:

```python
# Windows Authentication
connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=localhost;Database=YourDB;Trusted_Connection=yes;"
# SQL Server Authentication
connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=localhost;Database=YourDB;UID=username;PWD=password;"
# Azure SQL Database
connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=yourserver.database.windows.net;Database=YourDB;UID=username;PWD=password;Encrypt=yes;TrustServerCertificate=no;"Copy and customize the example configuration:
cp config.example.yaml config.yaml
```

Edit `config.yaml` to define your tables and archiving rules:

```yaml
tables:
  - source_table: Transactions
    archive_table: Transactions_Archive
    date_column: TransactionDate
    retention_days: 365
    batch_size: 5000
    strategy: date_based
    mode: move
    verify_archive: true
```

## Usage

### Command Line

```bash
# Dry run (no changes made)
python sql_table_archiver.py \
--config config.yaml \
--connection-string "your-connection-string" \
--dry-run
# Actual archiving
python sql_table_archiver.py \
--config config.yaml \
--connection-string "your-connection-string"-- Install stored procedures
USE YourDatabase
GO
-- Run the SQL script
:r sql_stored_procedures.sql
GO
-- Archive data using stored procedure
DECLARE @LogId BIGINT
EXEC dbo.sp_ArchiveTableByDate
@SourceTable = 'Transactions',
@ArchiveTable = 'Transactions_Archive',
@DateColumn = 'TransactionDate',
@RetentionDays = 365,
@BatchSize = 5000,
@DeleteAfterArchive = 1,
@DryRun = 0,
@LogId = @LogId OUTPUT
-- View statistics
EXEC dbo.sp_GetArchiveStatistics
    @TableName = 'dbo.Transactions'
```

### Python API

```python
from sql_table_archiver import SQLServerTableArchiver, ArchiveConfig, ArchiveStrategy, ArchiveMode
# Create archiver instance
archiver = SQLServerTableArchiver(
    connection_string="your-connection-string",
    schema="dbo",
    dry_run=False
)
# Configure archiving
config = ArchiveConfig(
    source_table="Transactions",
    archive_table="Transactions_Archive",
    date_column="TransactionDate",
    retention_days=365,
    batch_size=5000,
    strategy=ArchiveStrategy.DATE_BASED,
    mode=ArchiveMode.MOVE,
    verify_archive=True
)
# Execute archiving
metrics = archiver.archive_table(config)
# View results
print(f"Rows archived: {metrics.rows_archived:,}")
print(f"Duration: {metrics.duration_seconds:.2f} seconds")
print(f"Throughput: {metrics.rows_per_second:.0f} rows/second")# Load from configuration file
configs = SQLServerTableArchiver.load_config_from_file('config.yaml')
# Archive all tables
results = archiver.archive_multiple_tables(configs)
# Print summary
for table, metrics in results.items():
    if metrics:
        print(f"{table}: {metrics.rows_archived:,} rows archived")
    else:
        print(f"{table}: FAILED")
```

### Advanced Configuration

```python
config = ArchiveConfig(
source_table="Orders",
archive_table="Orders_Archive",
date_column="OrderDate",
retention_days=730,
batch_size=1000,
max_rows=100000, # Limit per run
strategy=ArchiveStrategy.DATE_BASED,
mode=ArchiveMode.MOVE,
where_clause="Status = 'Completed'", # Additional filter
verify_archive=True,
create_archive_table=True,
copy_indexes=True,
max_retries=3,
timeout_seconds=300
)Archive data older than a specified retention period:
config = ArchiveConfig(
    source_table="Logs",
    date_column="LogDate",
    retention_days=90,  # Keep last 90 days
    strategy=ArchiveStrategy.DATE_BASED
)
```

### Custom Query

Use a custom WHERE clause for complex conditions:

```python
config = ArchiveConfig(
    source_table="Sessions",
    strategy=ArchiveStrategy.CUSTOM_QUERY,
    where_clause="""
        SessionEndDate IS NOT NULL
        AND SessionEndDate < DATEADD(DAY, -30, GETDATE())
        AND SessionStatus = 'Completed'
    """
)
```

### Count-Based

Archive the oldest N rows via the stored procedure:

```sql
EXEC dbo.sp_ArchiveTableByCount
@SourceTable = 'AuditLog',
@ArchiveTable = 'AuditLog_Archive',
@OrderByColumn = 'CreatedDate',
@MaxRowsToArchive = 1000000,
@BatchSize = 10000,
    @DeleteAfterArchive = 1
```

## Archive Modes

### Move

Copy data to the archive table and delete it from the source:

```python
mode=ArchiveMode.MOVE
```

### Copy

Copy data to the archive table, keeping it in the source:

```python
mode=ArchiveMode.COPY
```

### Delete Only

Delete data without archiving:

```python
mode=ArchiveMode.DELETE_ONLY
```

## Configuration Options

| Option | Type | Required | Description |
|---|---|---|---|
| `source_table` | string | Yes | Name of the source table to archive |
| `archive_table` | string | No | Name of the archive table (default: `{source_table}_Archive`) |
| `date_column` | string | Conditional | Date column for date-based archiving |
| `retention_days` | integer | Conditional | Days of data to retain in the source table |
| `batch_size` | integer | No | Rows per batch (default: 1000) |
| `max_rows` | integer | No | Maximum rows to archive per run |
| `strategy` | enum | No | `date_based`, `count_based`, or `custom_query` (default: `date_based`) |
| `mode` | enum | No | `move`, `copy`, or `delete_only` (default: `move`) |
| `where_clause` | string | No | Additional WHERE conditions |
| `verify_archive` | boolean | No | Verify archived data (default: `true`) |
| `create_archive_table` | boolean | No | Auto-create the archive table (default: `true`) |
| `copy_indexes` | boolean | No | Copy indexes to the archive table (default: `true`) |
| `max_retries` | integer | No | Retry attempts on error (default: 3) |
| `timeout_seconds` | integer | No | Operation timeout in seconds (default: 300) |
## Monitoring

### Archive Statistics

```sql
-- Get summary statistics
EXEC dbo.sp_GetArchiveStatistics
-- Statistics for specific table
EXEC dbo.sp_GetArchiveStatistics
@TableName = 'dbo.Transactions'
-- Statistics for date range
EXEC dbo.sp_GetArchiveStatistics
@StartDate = '2024-01-01',
    @EndDate = '2024-12-31'
```

### Python Metrics

```python
metrics = archiver.archive_table(config)
print(f"""
Archive Metrics:
Table: {metrics.table_name}
Rows Archived: {metrics.rows_archived:,}
Rows Deleted: {metrics.rows_deleted:,}
Batches: {metrics.batches_processed}
Duration: {metrics.duration_seconds:.2f}s
Avg Batch Time: {metrics.avg_batch_time:.4f}s
Throughput: {metrics.rows_per_second:.0f} rows/sec
Errors: {metrics.errors}
""")The system maintains comprehensive logs:
- `ArchiveLog`: High-level operation logs
- `ArchiveMetrics`: Detailed batch-level metrics
```sql
-- View recent archive operations
SELECT TOP 10
TableName,
ArchiveDate,
RowsArchived,
DurationSeconds,
RowsArchived / NULLIF(DurationSeconds, 0) AS RowsPerSecond,
Status
FROM dbo.ArchiveLog
ORDER BY ArchiveDate DESC
-- View batch performance
SELECT
l.TableName,
m.BatchNumber,
m.RowsProcessed,
m.DurationMs,
m.RowsPerSecond
FROM dbo.ArchiveMetrics m
JOIN dbo.ArchiveLog l ON m.LogId = l.LogId
WHERE l.TableName = 'dbo.Transactions'
ORDER BY m.BatchStartTime DESC
```

## Best Practices

1. **Choose Appropriate Batch Sizes**
   - Start with 1,000-5,000 rows per batch
   - Monitor lock duration and adjust accordingly
   - Larger batches = faster, but more locking
   - Smaller batches = slower, but less impact

2. **Schedule During Off-Peak Hours**
   - Run archiving during maintenance windows
   - Avoid peak transaction times
   - Use SQL Server Agent for scheduling

3. **Index Strategy**
   - Ensure date columns are indexed
   - Copy essential indexes to archive tables
   - Consider removing unused indexes after archiving

4. **Monitor Performance**
   - Track rows-per-second throughput
   - Watch for blocking and deadlocks
   - Review batch timing metrics

5. **Always Test First**

   ```python
   # Use dry-run mode
   archiver = SQLServerTableArchiver(
       connection_string=conn_str,
       dry_run=True  # No changes made
   )
   ```

6. **Start Small**
   - Use `max_rows` to limit initial runs (see the sketch after this list)
   - Gradually increase batch sizes
   - Monitor system impact

7. **Enable Verification**

   ```python
   config.verify_archive = True  # Always verify
   ```

8. **Backup Before Archiving**
   - Take database backups before large archives
   - Test restore procedures
   - Document archive operations

9. **Monitor Disk Space**
   - Archive tables consume additional space
   - Plan for archive table growth
   - Implement archive table purging if needed

10. **Clean Up Old Logs**

    ```sql
    -- Remove logs older than 90 days
    EXEC dbo.sp_CleanupOldArchiveLogs @RetentionDays = 90
    ```

11. **Archive Table Maintenance**
    - Rebuild indexes periodically
    - Update statistics
    - Consider table partitioning for very large archives

12. **Regular Review**
    - Review archiving effectiveness
    - Adjust retention periods as needed
    - Monitor archive table sizes
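To put "Start Small" and "Choose Appropriate Batch Sizes" into practice, one option is to cap trial runs with `max_rows` and compare throughput across candidate batch sizes before settling on a schedule. A minimal sketch using the Python API shown earlier; the table names and candidate sizes are placeholders:

```python
from sql_table_archiver import (
    SQLServerTableArchiver, ArchiveConfig, ArchiveStrategy, ArchiveMode
)

archiver = SQLServerTableArchiver(
    connection_string="your-connection-string",
    schema="dbo",
    dry_run=False
)

# Cap each trial at 20,000 rows so no single experiment runs long,
# then compare throughput before picking a production batch_size.
for batch_size in (1000, 2000, 5000):
    config = ArchiveConfig(
        source_table="Transactions",
        archive_table="Transactions_Archive",
        date_column="TransactionDate",
        retention_days=365,
        batch_size=batch_size,
        max_rows=20000,  # limit this trial run
        strategy=ArchiveStrategy.DATE_BASED,
        mode=ArchiveMode.MOVE,
        verify_archive=True,
    )
    metrics = archiver.archive_table(config)
    print(f"batch_size={batch_size}: {metrics.rows_per_second:.0f} rows/sec "
          f"({metrics.rows_archived:,} rows)")
```

Prefer the smallest batch size whose throughput fits your maintenance window; on a busy system the extra locking from larger batches usually costs more than the speed it gains.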
## Troubleshooting

### Timeouts

```python
# Increase timeout
config.timeout_seconds = 600  # 10 minutes
```

- Reduce batch size
- Ensure the transaction log is sized appropriately
- Run archiving in smaller chunks

### Slow Performance

- Check for missing indexes on date columns
- Increase batch size (if not causing locking)
- Ensure statistics are up to date
- Consider the READPAST hint (already implemented)

### Blocking

- Reduce batch size
- Schedule during off-peak hours
- Use READPAST to skip locked rows
### Error Handling

The archiver automatically retries failed batches:

```python
config.max_retries = 3  # Retry up to 3 times
config.retry_delay = 5  # Wait 5 seconds between retries
```
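Until built-in notifications arrive (see the roadmap below), failures can be surfaced by wrapping the call yourself. A sketch, assuming `archive_table` raises once retries are exhausted, with a hypothetical `send_alert` callable you provide (email, Slack, pager, etc.):

```python
def archive_with_alert(archiver, config, send_alert):
    """Run one archive pass and report problems via send_alert.

    send_alert is a hypothetical callable you supply; it is not part
    of the archiver's API.
    """
    try:
        metrics = archiver.archive_table(config)
    except Exception as exc:
        # Assumes the archiver re-raises after exhausting max_retries.
        send_alert(f"Archiving {config.source_table} failed: {exc}")
        raise
    if metrics.errors:
        send_alert(f"Archiving {config.source_table} completed with "
                   f"{metrics.errors} batch error(s)")
    return metrics
```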
View errors in the logs:

```sql
SELECT *
FROM dbo.ArchiveLog
WHERE Status = 'Failed'
ORDER BY ArchiveDate DESC
```

## Scheduling

### Windows Task Scheduler

Create a batch file (`run_archiver.bat`):

```batch
@echo off
cd /d "C:\Path\To\DBA"
python sql_table_archiver.py --config config.yaml --connection-string "your-conn-string" > logs\archive_%date:~-4,4%%date:~-10,2%%date:~-7,2%.log 2>&1
```

### SQL Server Agent

Create a job step using the stored procedure:

```sql
DECLARE @LogId BIGINT
EXEC dbo.sp_ArchiveTableByDate
@SourceTable = 'Transactions',
@ArchiveTable = 'Transactions_Archive',
@DateColumn = 'TransactionDate',
@RetentionDays = 365,
@BatchSize = 5000,
@LogId = @LogId OUTPUT
-- Check for errors
IF @LogId IS NULL
    THROW 50000, 'Archive operation failed', 1
```

### Linux Cron

```cron
# Archive daily at 2 AM
0 2 * * * /usr/bin/python3 /path/to/sql_table_archiver.py --config /path/to/config.yaml --connection-string "your-conn-string" >> /var/log/archiver.log 2>&1
```

## Security Considerations

- Connection Strings: Store securely using environment variables or secret management (see the sketch below)
- Permissions: Grant minimum required permissions (SELECT, INSERT, DELETE on tables)
- Audit Logging: All operations are logged with timestamps and user context
- Encryption: Use encrypted connections (Encrypt=yes in connection string)
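For example, the connection string can be assembled from environment variables rather than hard-coded or passed in plain text on the command line. A minimal sketch; the variable names are illustrative, not required by the tool:

```python
import os

# Illustrative variable names -- adapt to your secret-management setup.
server = os.environ["ARCHIVER_SQL_SERVER"]
database = os.environ["ARCHIVER_SQL_DATABASE"]
user = os.environ["ARCHIVER_SQL_USER"]
password = os.environ["ARCHIVER_SQL_PASSWORD"]

connection_string = (
    "Driver={ODBC Driver 17 for SQL Server};"
    f"Server={server};Database={database};"
    f"UID={user};PWD={password};"
    "Encrypt=yes;TrustServerCertificate=no;"
)
```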
## License

This project is provided as-is for use in SQL Server database management.
## Contributing

Contributions are welcome! Please follow these guidelines:
- Add tests for new features
- Update documentation
- Follow existing code style
- Include examples for new functionality
## Support

For issues, questions, or contributions:
- Check the troubleshooting section
- Review existing documentation
- Consult SQL Server best practices
## Roadmap

Future enhancements:
- Partition-based archiving strategy
- Compression support for archive tables
- Email notifications on completion/failure
- Web-based monitoring dashboard
- PostgreSQL and MySQL support
- Advanced parallel processing
- Automatic archive table partitioning
- Data quality checks before archiving
---

**Version:** 1.0.0 | **Last Updated:** December 2024 | **Author:** SQL Server DBA Team