elmasoki/DBA

SQL Server Table Archiver - Best in Class

A production-ready, high-performance SQL Server table archiver designed for enterprise environments. This tool provides comprehensive features for safely archiving historical data while maintaining data integrity and minimizing impact on production systems.

🌟 Features

Core Capabilities

  • Batch Processing: Process large datasets in configurable batches to minimize table locking
  • Multiple Strategies: Date-based, count-based, and custom-query archiving (partition-based archiving is on the roadmap)
  • Transaction Safety: Full rollback support with configurable retry logic
  • Verification: Built-in archive verification to ensure data integrity
  • Performance Tracking: Comprehensive metrics and performance monitoring
  • Dry Run Mode: Test archiving operations without making changes

Advanced Features

  • Flexible Modes: Copy, move, or delete-only operations
  • Custom WHERE Clauses: Fine-grained control over what gets archived
  • Automatic Index Creation: Copy indexes from source to archive tables
  • Parallel Processing: Multi-threaded batch processing (experimental)
  • Comprehensive Logging: Detailed operation logs with configurable levels
  • Configuration Files: YAML/JSON support for easy configuration management

Production Ready

  • Error Handling: Robust error handling with automatic retries
  • Connection Management: Proper connection pooling and timeout handling
  • Performance Monitoring: Track throughput, batch times, and resource usage
  • Audit Trail: Complete audit log of all archiving operations
  • Safety Features: Dry-run mode, verification checks, and rollback support

📋 Requirements

  • Python 3.8+
  • SQL Server 2016+ (or Azure SQL Database)
  • Required Python packages (see requirements.txt):
    • pyodbc - SQL Server connectivity
    • pyyaml - YAML configuration support
  • Microsoft ODBC Driver 17 for SQL Server (the driver named in the connection strings below)

🚀 Quick Start

1. Installation

# Clone the repository
git clone <repository-url>
cd DBA

# Install dependencies
pip install -r requirements.txt

2. Configure Connection

Create a connection string for your SQL Server:

# Windows Authentication
connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=localhost;Database=YourDB;Trusted_Connection=yes;"

# SQL Server Authentication
connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=localhost;Database=YourDB;UID=username;PWD=password;"

# Azure SQL Database
connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=yourserver.database.windows.net;Database=YourDB;UID=username;PWD=password;Encrypt=yes;TrustServerCertificate=no;"
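Rather than hand-editing these strings, you can assemble them in code. The helper below is a hypothetical build_connection_string, not part of this repository. It reproduces the three formats above and brace-quotes any value containing ;, {, or }, doubling a literal } as ODBC connection-string syntax requires, so a password such as p;w}d does not break the string.

```python
def _odbc_value(value: str) -> str:
    """Brace-quote a value that contains characters special to ODBC connection strings."""
    if any(ch in value for ch in ";{}=") or value != value.strip():
        # Inside braces, a literal '}' is escaped by doubling it.
        return "{" + value.replace("}", "}}") + "}"
    return value


def build_connection_string(server, database, user=None, password=None,
                            driver="ODBC Driver 17 for SQL Server", encrypt=False):
    """Hypothetical helper: Windows auth when user is None, SQL auth otherwise."""
    parts = [
        f"Driver={{{driver}}}",  # renders as Driver={ODBC Driver 17 for SQL Server}
        f"Server={_odbc_value(server)}",
        f"Database={_odbc_value(database)}",
    ]
    if user is None:
        parts.append("Trusted_Connection=yes")
    else:
        parts.append(f"UID={_odbc_value(user)}")
        parts.append(f"PWD={_odbc_value(password or '')}")
    if encrypt:
        # Matches the Azure SQL Database example above.
        parts.append("Encrypt=yes")
        parts.append("TrustServerCertificate=no")
    return ";".join(parts) + ";"
```

For example, build_connection_string("yourserver.database.windows.net", "YourDB", "username", "password", encrypt=True) yields the Azure string shown above.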

3. Create Configuration File

Copy and customize the example configuration:

cp config.example.yaml config.yaml

Edit config.yaml to define your tables and archiving rules:

tables:
  - source_table: Transactions
    archive_table: Transactions_Archive
    date_column: TransactionDate
    retention_days: 365
    batch_size: 5000
    strategy: date_based
    mode: move
    verify_archive: true

4. Run the Archiver

Using Python Script

# Dry run (no changes made)
python sql_table_archiver.py \
  --config config.yaml \
  --connection-string "your-connection-string" \
  --dry-run

# Actual archiving
python sql_table_archiver.py \
  --config config.yaml \
  --connection-string "your-connection-string"

Using Stored Procedures

-- Install stored procedures
USE YourDatabase
GO
-- Run the SQL script (the :r command requires sqlcmd or SQLCMD mode in SSMS)
:r sql_stored_procedures.sql
GO

-- Archive data using stored procedure
DECLARE @LogId BIGINT

EXEC dbo.sp_ArchiveTableByDate
    @SourceTable = 'Transactions',
    @ArchiveTable = 'Transactions_Archive',
    @DateColumn = 'TransactionDate',
    @RetentionDays = 365,
    @BatchSize = 5000,
    @DeleteAfterArchive = 1,
    @DryRun = 0,
    @LogId = @LogId OUTPUT

-- View statistics
EXEC dbo.sp_GetArchiveStatistics
    @TableName = 'dbo.Transactions'

📖 Detailed Usage

Python API

Basic Usage

from sql_table_archiver import SQLServerTableArchiver, ArchiveConfig, ArchiveStrategy, ArchiveMode

# Create archiver instance
archiver = SQLServerTableArchiver(
    connection_string="your-connection-string",
    schema="dbo",
    dry_run=False
)

# Configure archiving
config = ArchiveConfig(
    source_table="Transactions",
    archive_table="Transactions_Archive",
    date_column="TransactionDate",
    retention_days=365,
    batch_size=5000,
    strategy=ArchiveStrategy.DATE_BASED,
    mode=ArchiveMode.MOVE,
    verify_archive=True
)

# Execute archiving
metrics = archiver.archive_table(config)

# View results
print(f"Rows archived: {metrics.rows_archived:,}")
print(f"Duration: {metrics.duration_seconds:.2f} seconds")
print(f"Throughput: {metrics.rows_per_second:.0f} rows/second")

Archive Multiple Tables

# Load from configuration file
configs = SQLServerTableArchiver.load_config_from_file('config.yaml')

# Archive all tables
results = archiver.archive_multiple_tables(configs)

# Print summary
for table, metrics in results.items():
    if metrics:
        print(f"{table}: {metrics.rows_archived:,} rows archived")
    else:
        print(f"{table}: FAILED")

Advanced Configuration

config = ArchiveConfig(
    source_table="Orders",
    archive_table="Orders_Archive",
    date_column="OrderDate",
    retention_days=730,
    batch_size=1000,
    max_rows=100000,  # Limit per run
    strategy=ArchiveStrategy.DATE_BASED,
    mode=ArchiveMode.MOVE,
    where_clause="Status = 'Completed'",  # Additional filter
    verify_archive=True,
    create_archive_table=True,
    copy_indexes=True,
    max_retries=3,
    timeout_seconds=300
)
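The retention cutoff and where_clause have to end up in a single WHERE predicate. One plausible way to combine them is sketched below; build_archive_predicate is a hypothetical helper, not taken from sql_table_archiver.py. It wraps the extra filter in parentheses so that an OR inside where_clause cannot widen the date condition. Because where_clause is spliced in as raw SQL, it should only ever come from a trusted configuration file.

```python
def build_archive_predicate(date_column, retention_days, where_clause=None):
    """Hypothetical: combine the retention cutoff with an optional extra T-SQL filter."""
    # Bracket-quote the column name, doubling any ']' as T-SQL identifier quoting requires.
    column = "[" + date_column.replace("]", "]]") + "]"
    predicate = f"{column} < DATEADD(DAY, -{int(retention_days)}, GETDATE())"
    if where_clause and where_clause.strip():
        # Parentheses keep "A OR B" in the filter from escaping the date condition.
        predicate += f" AND ({where_clause.strip()})"
    return predicate
```

With the Orders configuration above, build_archive_predicate("OrderDate", 730, "Status = 'Completed'") returns [OrderDate] < DATEADD(DAY, -730, GETDATE()) AND (Status = 'Completed').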

Archiving Strategies

1. Date-Based Archiving

Archive data older than a specified retention period:

config = ArchiveConfig(
    source_table="Logs",
    date_column="LogDate",
    retention_days=90,  # Keep last 90 days
    strategy=ArchiveStrategy.DATE_BASED
)

2. Custom Query Archiving

Use a custom WHERE clause for complex conditions:

config = ArchiveConfig(
    source_table="Sessions",
    strategy=ArchiveStrategy.CUSTOM_QUERY,
    where_clause="""
        SessionEndDate IS NOT NULL
        AND SessionEndDate < DATEADD(DAY, -30, GETDATE())
        AND SessionStatus = 'Completed'
    """
)

3. Count-Based Archiving (SQL Stored Procedure)

Archive the oldest N rows:

EXEC dbo.sp_ArchiveTableByCount
    @SourceTable = 'AuditLog',
    @ArchiveTable = 'AuditLog_Archive',
    @OrderByColumn = 'CreatedDate',
    @MaxRowsToArchive = 1000000,
    @BatchSize = 10000,
    @DeleteAfterArchive = 1
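Both @MaxRowsToArchive here and max_rows in the Python configuration stop a run after a fixed number of rows, split into batches; with the values above that is at most 100 batches of 10,000 rows. The loop below is a hypothetical sketch of that control flow, not the procedure's actual code: execute_batch stands in for whatever archives up to n rows and reports how many it touched, and a short batch is treated as the end of the qualifying rows. With READPAST, a short batch can also mean locked rows were skipped; those remain for a later run.

```python
def run_batches(execute_batch, batch_size, max_rows=None):
    """Call execute_batch(n) until the source is exhausted or max_rows is reached.

    execute_batch is a hypothetical callback that archives up to n rows and
    returns the number of rows it actually affected.
    Returns (total_rows, batches_run).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = 0
    batches = 0
    while max_rows is None or total < max_rows:
        # Shrink the final batch so the run never exceeds max_rows.
        n = batch_size if max_rows is None else min(batch_size, max_rows - total)
        affected = execute_batch(n)
        total += affected
        batches += 1
        if affected < n:
            # Fewer rows than requested: nothing more qualifies (or the rest was locked).
            break
    return total, batches
```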

Archive Modes

Move (Default)

Copy data to archive table and delete from source:

mode=ArchiveMode.MOVE

Copy Only

Copy data to archive table, keep in source:

mode=ArchiveMode.COPY

Delete Only

Delete data without archiving:

mode=ArchiveMode.DELETE_ONLY
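For MOVE and DELETE_ONLY, one batch can be written as a single T-SQL statement. The sketch below shows one common approach, DELETE TOP (n) with an OUTPUT ... INTO clause, so the copy and the delete succeed or fail together; whether sql_table_archiver.py works this way is an assumption, and Mode and batch_statement are hypothetical names. OUTPUT INTO requires the archive table to have matching columns, no IDENTITY column receiving the values, no enabled triggers, and no foreign key or CHECK constraints. COPY is left out: rows stay in the source, so repeating TOP (n) would copy the same rows again, and batching it needs a key to page through.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical local stand-in for ArchiveMode; values mirror the config spellings.
    MOVE = "move"
    COPY = "copy"
    DELETE_ONLY = "delete_only"


def batch_statement(mode, source, archive, predicate, batch_size):
    """Return one T-SQL statement that processes a single batch in the given mode."""
    if mode is Mode.MOVE:
        # Delete a batch and write the deleted rows to the archive atomically.
        return (f"DELETE TOP ({batch_size}) FROM {source} "
                f"OUTPUT DELETED.* INTO {archive} WHERE {predicate};")
    if mode is Mode.DELETE_ONLY:
        return f"DELETE TOP ({batch_size}) FROM {source} WHERE {predicate};"
    # COPY leaves rows in place, so TOP (n) alone cannot advance between batches.
    raise NotImplementedError(f"batched {mode.value} needs key-based paging")
```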

🔧 Configuration Reference

Table Configuration Options

| Option | Type | Required | Description |
|---|---|---|---|
| source_table | string | Yes | Name of source table to archive |
| archive_table | string | No | Name of archive table (default: <source_table>_Archive) |
| date_column | string | Conditional | Date column for date-based archiving (required for date_based) |
| retention_days | integer | Conditional | Days of data to retain in source table (required for date_based) |
| batch_size | integer | No | Rows per batch (default: 1000) |
| max_rows | integer | No | Maximum rows to archive per run |
| strategy | enum | No | date_based, count_based, custom_query (default: date_based) |
| mode | enum | No | move, copy, delete_only (default: move) |
| where_clause | string | No | Additional WHERE conditions |
| verify_archive | boolean | No | Verify archived data (default: true) |
| create_archive_table | boolean | No | Auto-create archive table (default: true) |
| copy_indexes | boolean | No | Copy indexes to archive table (default: true) |
| max_retries | integer | No | Retry attempts on error (default: 3) |
| timeout_seconds | integer | No | Operation timeout in seconds (default: 300) |
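The defaults in this table can be applied mechanically to each entry under tables: in config.yaml. The sketch below is a hypothetical normalize_table_config helper, not the archiver's own loader; the conditional checks it enforces (date_column and retention_days for date_based, where_clause for custom_query) are inferred from the strategy descriptions above.

```python
# Defaults copied from the Configuration Reference table.
DEFAULTS = {
    "batch_size": 1000,
    "strategy": "date_based",
    "mode": "move",
    "verify_archive": True,
    "create_archive_table": True,
    "copy_indexes": True,
    "max_retries": 3,
    "timeout_seconds": 300,
}


def normalize_table_config(entry):
    """Hypothetical: apply documented defaults to one tables: entry and validate it."""
    if not entry.get("source_table"):
        raise ValueError("source_table is required")
    cfg = {**DEFAULTS, **entry}  # explicit settings override defaults
    cfg.setdefault("archive_table", f"{cfg['source_table']}_Archive")
    if cfg["strategy"] not in ("date_based", "count_based", "custom_query"):
        raise ValueError(f"unknown strategy: {cfg['strategy']}")
    if cfg["mode"] not in ("move", "copy", "delete_only"):
        raise ValueError(f"unknown mode: {cfg['mode']}")
    if cfg["strategy"] == "date_based":
        missing = [k for k in ("date_column", "retention_days") if cfg.get(k) is None]
        if missing:
            raise ValueError(f"date_based strategy requires: {', '.join(missing)}")
    if cfg["strategy"] == "custom_query" and not cfg.get("where_clause"):
        raise ValueError("custom_query strategy requires where_clause")
    if cfg["batch_size"] <= 0:
        raise ValueError("batch_size must be positive")
    return cfg
```

Entries loaded with yaml.safe_load (from pyyaml) or json.load are plain dicts and can be passed in directly.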

📊 Monitoring and Metrics

View Archive Statistics

-- Get summary statistics
EXEC dbo.sp_GetArchiveStatistics

-- Statistics for specific table
EXEC dbo.sp_GetArchiveStatistics
    @TableName = 'dbo.Transactions'

-- Statistics for date range
EXEC dbo.sp_GetArchiveStatistics
    @StartDate = '2024-01-01',
    @EndDate = '2024-12-31'

Python Metrics

metrics = archiver.archive_table(config)

print(f"""
Archive Metrics:
  Table: {metrics.table_name}
  Rows Archived: {metrics.rows_archived:,}
  Rows Deleted: {metrics.rows_deleted:,}
  Batches: {metrics.batches_processed}
  Duration: {metrics.duration_seconds:.2f}s
  Avg Batch Time: {metrics.avg_batch_time:.4f}s
  Throughput: {metrics.rows_per_second:.0f} rows/sec
  Errors: {metrics.errors}
""")

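The derived values in this printout follow from a few raw measurements. MetricsSketch below is a hypothetical stand-in for the returned metrics object, assuming avg_batch_time is total batch time divided by batch count and rows_per_second is rows archived divided by duration. Here duration is the sum of batch times; the real tool may instead measure wall-clock time, including setup and verification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetricsSketch:
    """Hypothetical stand-in for the archiver's metrics object."""
    table_name: str
    rows_archived: int = 0
    errors: int = 0
    batch_times: List[float] = field(default_factory=list)  # seconds per batch

    def record_batch(self, rows: int, seconds: float) -> None:
        """Add one completed batch."""
        self.rows_archived += rows
        self.batch_times.append(seconds)

    @property
    def batches_processed(self) -> int:
        return len(self.batch_times)

    @property
    def duration_seconds(self) -> float:
        # Assumption: duration is the sum of batch times, not wall-clock time.
        return sum(self.batch_times)

    @property
    def avg_batch_time(self) -> float:
        return self.duration_seconds / self.batches_processed if self.batch_times else 0.0

    @property
    def rows_per_second(self) -> float:
        # Return 0 instead of dividing by zero, like NULLIF in the ArchiveLog query below.
        return self.rows_archived / self.duration_seconds if self.duration_seconds else 0.0
```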
Archive Log Tables

The system maintains two log tables:

  • ArchiveLog: High-level operation logs
  • ArchiveMetrics: Detailed batch-level metrics

-- View recent archive operations
SELECT TOP 10
    TableName,
    ArchiveDate,
    RowsArchived,
    DurationSeconds,
    RowsArchived / NULLIF(DurationSeconds, 0) AS RowsPerSecond,
    Status
FROM dbo.ArchiveLog
ORDER BY ArchiveDate DESC

-- View batch performance
SELECT
    l.TableName,
    m.BatchNumber,
    m.RowsProcessed,
    m.DurationMs,
    m.RowsPerSecond
FROM dbo.ArchiveMetrics m
JOIN dbo.ArchiveLog l ON m.LogId = l.LogId
WHERE l.TableName = 'dbo.Transactions'
ORDER BY m.BatchStartTime DESC

🔒 Best Practices

Performance Optimization

  1. Choose Appropriate Batch Sizes

    • Start with 1,000-5,000 rows per batch
    • Monitor lock duration and adjust accordingly
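    • Larger batches = faster but more locking; once one statement holds at least 5,000 locks on a single table or index, SQL Server attempts to escalate to a table lock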
    • Smaller batches = slower but less impact
  2. Schedule During Off-Peak Hours

    • Run archiving during maintenance windows
    • Avoid peak transaction times
    • Use SQL Server Agent for scheduling
  3. Index Strategy

    • Ensure date columns are indexed
    • Copy essential indexes to archive tables
    • Consider removing unused indexes after archiving
  4. Monitor Performance

    • Track rows per second throughput
    • Watch for blocking and deadlocks
    • Review batch timing metrics
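The batch-size advice above can be turned into a simple feedback rule: grow the batch when batches finish quickly and shrink it when they run long. next_batch_size below is a hypothetical helper, not an archiver option; the 2-second target is an arbitrary example, and the 1,000-5,000 bounds mirror the suggested starting range.

```python
def next_batch_size(current, batch_seconds, target_seconds=2.0,
                    min_size=1000, max_size=5000):
    """Hypothetical: scale the batch size toward a target per-batch duration."""
    if batch_seconds <= 0:
        # Timer resolution too coarse to measure; grow cautiously.
        return min(current * 2, max_size)
    scaled = int(current * target_seconds / batch_seconds)
    # Change by at most 2x per step so one outlier batch cannot swing the size wildly.
    scaled = max(current // 2, min(scaled, current * 2))
    # Keep within the configured bounds.
    return max(min_size, min(scaled, max_size))
```

A batch of 4,000 rows that took 8 seconds would drop to 2,000 rows next time (halved, not quartered, because of the 2x step limit).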

Safety and Reliability

  1. Always Test First

    # Use dry-run mode
    archiver = SQLServerTableArchiver(
        connection_string=conn_str,
        dry_run=True  # No changes made
    )
  2. Start Small

    • Use max_rows to limit initial runs
    • Gradually increase batch sizes
    • Monitor system impact
  3. Enable Verification

    config.verify_archive = True  # Always verify
  4. Backup Before Archiving

    • Take database backups before large archives
    • Test restore procedures
    • Document archive operations
  5. Monitor Disk Space

    • Archive tables consume additional space
    • Plan for archive table growth
    • Implement archive table purging if needed
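A cheap form of the verification step compares how much the archive table grew with how many rows were deleted from the source. Counting the archive side, rather than comparing source counts before and after, avoids false alarms when the application inserts new rows into the source during the run. verify_move is a hypothetical helper; the counts would come from SELECT COUNT_BIG(*) on the archive table before and after the run, and the check assumes nothing else writes to the archive table in the meantime.

```python
def verify_move(archive_count_before, archive_count_after, rows_deleted):
    """Hypothetical: check the archive grew by exactly the number of rows deleted."""
    added = archive_count_after - archive_count_before
    if added != rows_deleted:
        raise RuntimeError(
            f"verification failed: archive grew by {added} rows "
            f"but {rows_deleted} were deleted from the source"
        )
    return added
```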

Maintenance

  1. Clean Up Old Logs

    -- Remove logs older than 90 days
    EXEC dbo.sp_CleanupOldArchiveLogs
        @RetentionDays = 90
  2. Archive Table Maintenance

    • Rebuild indexes periodically
    • Update statistics
    • Consider table partitioning for very large archives
  3. Regular Review

    • Review archiving effectiveness
    • Adjust retention periods as needed
    • Monitor archive table sizes

πŸ› Troubleshooting

Common Issues

Connection Timeouts

# Increase timeout
config.timeout_seconds = 600  # 10 minutes

Out of Transaction Log Space

  • Reduce batch size
  • Ensure transaction log is sized appropriately
  • Run archiving in smaller chunks

Slow Performance

  • Check for missing indexes on date columns
  • Increase batch size (if not causing locking)
  • Ensure statistics are up to date
  • The READPAST hint is already applied, so locked rows are skipped rather than waited on

Locking Issues

  • Reduce batch size
  • Schedule during off-peak hours
  • Use READPAST to skip locked rows

Error Recovery

The archiver automatically retries failed batches:

config.max_retries = 3  # Retry up to 3 times
config.retry_delay = 5  # Wait 5 seconds between retries
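Read as "up to max_retries additional attempts, retry_delay seconds apart", this configuration corresponds to a loop like the hypothetical with_retries below. It is a sketch, not the archiver's retry code. The sleep function is injectable so the behavior can be tested without waiting, and a real implementation would likely retry only transient errors (for example pyodbc.OperationalError) rather than every exception.

```python
import time


def with_retries(operation, max_retries=3, retry_delay=5.0, sleep=time.sleep):
    """Hypothetical: run operation(), retrying up to max_retries times on failure."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= max_retries:
                raise  # out of retries: surface the last error
            attempt += 1
            sleep(retry_delay)  # back off before the next attempt
```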

View errors in logs:

SELECT *
FROM dbo.ArchiveLog
WHERE Status = 'Failed'
ORDER BY ArchiveDate DESC

📅 Scheduling

Windows Task Scheduler

Create a batch file (run_archiver.bat):

@echo off
cd /d "C:\Path\To\DBA"
python sql_table_archiver.py --config config.yaml --connection-string "your-conn-string" > logs\archive_%date:~-4,4%%date:~-10,2%%date:~-7,2%.log 2>&1
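The %date:~-4,4%%date:~-10,2%%date:~-7,2% substrings assume the US short-date format (for example Tue 12/03/2024); under other regional settings the parts come out in a different order, and the redirect fails if the logs folder does not exist. Since Python is already installed for the archiver, a small wrapper can build the name with strftime instead. The functions below are a hypothetical sketch, not part of the repository; run_archiver invokes the same command line as the batch file and appends to the day's log.

```python
import subprocess
import sys
from datetime import datetime
from pathlib import Path


def dated_log_path(log_dir, now=None):
    """Return log_dir/archive_YYYYMMDD.log, independent of the Windows date locale."""
    now = now or datetime.now()
    return Path(log_dir) / f"archive_{now:%Y%m%d}.log"


def run_archiver(config_path, connection_string, log_dir="logs"):
    """Hypothetical wrapper: run the archiver and append its output to today's log."""
    log_path = dated_log_path(log_dir)
    log_path.parent.mkdir(parents=True, exist_ok=True)  # the .bat assumes logs\ exists
    with open(log_path, "a", encoding="utf-8") as log:
        return subprocess.run(
            [sys.executable, "sql_table_archiver.py",
             "--config", str(config_path),
             "--connection-string", connection_string],
            stdout=log, stderr=subprocess.STDOUT,
        ).returncode
```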

SQL Server Agent

Create a job step that calls the stored procedure:

DECLARE @LogId BIGINT

EXEC dbo.sp_ArchiveTableByDate
    @SourceTable = 'Transactions',
    @ArchiveTable = 'Transactions_Archive',
    @DateColumn = 'TransactionDate',
    @RetentionDays = 365,
    @BatchSize = 5000,
    @LogId = @LogId OUTPUT

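-- Fail the job step if no log entry was returned or the logged run failed
IF @LogId IS NULL
   OR EXISTS (SELECT 1 FROM dbo.ArchiveLog WHERE LogId = @LogId AND Status = 'Failed')
    THROW 50000, 'Archive operation failed', 1;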

Linux Cron

# Archive daily at 2 AM
0 2 * * * /usr/bin/python3 /path/to/sql_table_archiver.py --config /path/to/config.yaml --connection-string "your-conn-string" >> /var/log/archiver.log 2>&1

πŸ” Security Considerations

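  1. Connection Strings: Store them in environment variables or a secrets manager rather than in scripts; values passed with --connection-string can show up in process listings and shell history
  2. Permissions: Grant minimum required permissions (SELECT, INSERT, DELETE on the tables involved, plus CREATE TABLE and ALTER on the schema if create_archive_table is enabled)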
  3. Audit Logging: All operations are logged with timestamps and user context
  4. Encryption: Use encrypted connections (Encrypt=yes in connection string)
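When using the Python API, the connection string can be read from the environment instead of being typed on a command line, and masked before it is written to any log. The sketch below is hypothetical: ARCHIVER_CONN_STR is an example variable name, and mask_password only handles the PWD= keyword used in this README's connection strings.

```python
import os
import re


def connection_string_from_env(var_name="ARCHIVER_CONN_STR"):
    """Hypothetical: read the connection string from an environment variable."""
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return value


def mask_password(conn_str):
    """Replace the PWD value so the connection string can be logged safely."""
    # Handles plain values (PWD=secret;) and brace-quoted ones (PWD={se;cret};).
    return re.sub(r"(?i)(PWD=)(\{(?:[^}]|\}\})*\}|[^;]*)", r"\1***", conn_str)
```

The result of connection_string_from_env() can then be passed as connection_string to SQLServerTableArchiver, as in the Python API examples.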

πŸ“ License

This project is provided as-is for use in SQL Server database management.

🤝 Contributing

Contributions are welcome! Please follow these guidelines:

  • Add tests for new features
  • Update documentation
  • Follow existing code style
  • Include examples for new functionality

📞 Support

For issues, questions, or contributions:

  • Check the troubleshooting section
  • Review existing documentation
  • Consult SQL Server best practices

🎯 Roadmap

Future enhancements:

  • Partition-based archiving strategy
  • Compression support for archive tables
  • Email notifications on completion/failure
  • Web-based monitoring dashboard
  • PostgreSQL and MySQL support
  • Advanced parallel processing
  • Automatic archive table partitioning
  • Data quality checks before archiving

Version: 1.0.0
Last Updated: December 2024
Author: SQL Server DBA Team
