
Databricks Python Environment Setup


A robust, automated makefile-based tool for creating and managing local Python environments that match specific Databricks Serverless Environment Versions.

πŸš€ Quick Start

# Install the prerequisite: uv (https://github.com/astral-sh/uv)

# Create environment for Databricks version 4
make env ENV_VER=4

# Activate the environment
source .venv-db4/bin/activate

That's it! You now have a local Python environment matching Databricks Environment Version 4.

✨ Features

  • Automatic Python Version Detection: Dynamically fetches the correct Python version from Databricks documentation
  • Smart Package Management:
    • Removes Ubuntu-specific/system packages that won't work on macOS
    • Handles binary-only packages gracefully (installs if available, skips if not)
    • Cleans Databricks-specific version suffixes from packages
  • Version Validation: Only allows creation of environments for valid Databricks versions
  • Modular Pipeline: Separate targets for each step (requirements, Python install, venv setup, dependencies)
  • Lock File Generation: Creates requirements-env-X.lock for reproducible environments
  • Clean Management: Easy cleanup for specific versions or all environments
  • Comprehensive Testing: Built-in test suite to validate functionality

πŸ“‹ Prerequisites

  • uv - Fast Python package installer and environment manager
    # Install uv (macOS/Linux)
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Add to PATH
    export PATH="$HOME/.local/bin:$PATH"
  • make - Usually pre-installed on macOS/Linux
  • curl - For downloading requirements files
  • Internet connection - For fetching Databricks documentation and packages

πŸ“– Usage

Main Commands

# Show all available commands
make help

# List available Databricks environment versions
make list-versions

# Create complete environment (default: version 4)
make env ENV_VER=4

# Clean up specific version
make clean ENV_VER=4

# Clean up all environments
make clean-all

Incremental Commands

For more control over the process:

# 1. Download and process requirements
make requirements ENV_VER=4

# 2. Detect and install Python version
make python-version ENV_VER=4
make install-python ENV_VER=4

# 3. Create virtual environment
make setup-venv ENV_VER=4

# 4. Install dependencies
make install-deps ENV_VER=4

# 5. Generate lock file
make create-lockfile ENV_VER=4
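
These targets chain together into a single pipeline. A minimal sketch of how the wiring could look (target names are taken from this README; the recipe body is illustrative, not the actual makefile):

```make
# Illustrative pipeline wiring; the real makefile may order or name
# prerequisites differently.
env: requirements install-python setup-venv install-deps create-lockfile

create-lockfile:
	uv pip freeze --python .venv-db$(ENV_VER)/bin/python > requirements-env-$(ENV_VER).lock
```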

Using the Environment

Once created, activate and use your environment:

# Activate
source .venv-db4/bin/activate

# Verify Python version
python --version

# Test imports
python -c "import pandas, numpy, pyspark; print('All imports successful!')"

# Deactivate
deactivate

πŸ—‚οΈ Generated Files

When you run make env ENV_VER=4, the following files are created:

.venv-db4/                      # Virtual environment directory
requirements-env-4.txt          # Processed requirements file
requirements-env-4.txt.binary   # Binary-only packages (internal use)
requirements-env-4.lock         # Lock file with installed packages

πŸ”§ Configuration

Default Environment Version

Change the default version by modifying ENV_VER in the makefile:

ENV_VER ?= 4  # Change to your preferred default

Excluded Packages

The makefile automatically excludes packages that won't work on macOS. To modify the list, edit:

EXCLUDED_PACKAGES = unattended-upgrades|ssh-import-id|...
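
As an illustration of how such an exclusion list can be applied, a `grep` filter over a requirements file might look like this (the pattern shown is a shortened, assumed subset of the makefile's real `EXCLUDED_PACKAGES`):

```shell
# Drop any requirement line whose package name matches the exclusion pattern.
# EXCLUDED here is a shortened stand-in for the makefile's EXCLUDED_PACKAGES.
EXCLUDED='unattended-upgrades|ssh-import-id|dbus-python|psycopg2'
printf 'pandas==2.0.3\npsycopg2==2.9.5\nnumpy==1.24.0\n' \
  | grep -vE "^($EXCLUDED)([=<>~!]|$)"
```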

Binary-Only Packages

Packages that require binary wheels (will be skipped if unavailable):

BINARY_ONLY_PACKAGES = pyodbc
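
The "install if a wheel exists, otherwise skip" behavior can be expressed with a shell fallback like the following (a sketch; the `--only-binary` flag mirrors pip's and the exact invocation is an assumption about the makefile):

```shell
# Try a binary-only install; fall back to a skip message instead of failing.
pkg=pyodbc
uv pip install --only-binary :all: "$pkg" \
  || echo "Skipped $pkg (no compatible binary wheel)"
```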

🎯 Available Databricks Versions

Currently supported versions (automatically fetched from Microsoft documentation):

  • Version 1
  • Version 2
  • Version 3
  • Version 4

Run make list-versions to see the latest available versions.

πŸ“¦ Package Handling

Automatically Excluded Packages

The following packages are removed because they're Ubuntu/system-specific or lack ARM64 macOS wheels:

  • unattended-upgrades - Ubuntu system package
  • ssh-import-id - Ubuntu utility
  • dbus-python - Requires D-Bus system library
  • psycopg2 - PostgreSQL library requiring system dependencies
  • psutil - No compatible wheels for some versions
  • PyGObject, pycairo - GTK bindings
  • wadllib, lazr.uri, lazr.restfulclient - Launchpad utilities
  • google-api-core - Compatibility issues

Binary-Only Packages

Packages attempted with --only-binary (skipped if no wheel available):

  • pyodbc - ODBC database connector

Version Cleaning

Databricks-specific version suffixes are automatically removed:

pyspark==4.0.0+databricks.connect.17.0.1  β†’  pyspark==4.0.0
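
A one-line sed substitution is enough to express this cleanup (illustrative; the makefile's actual rule may differ):

```shell
# Strip everything from the '+' local-version marker onward.
echo 'pyspark==4.0.0+databricks.connect.17.0.1' | sed -E 's/\+.*$//'
# β†’ pyspark==4.0.0
```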

πŸ§ͺ Testing

Quick Start

Run the full test suite:

make test

Or run the test script directly:

./test_makefile.sh

For quick validation (30 seconds):

./quick_test.sh

What Gets Tested

1. Prerequisites Checks

  • βœ“ uv is installed and accessible
  • βœ“ make is installed
  • βœ“ makefile exists

2. Information Targets

  • βœ“ make help displays correctly
  • βœ“ make list-versions returns available versions

3. Validation

  • βœ“ make validate-version accepts valid versions (e.g., 4)
  • βœ“ make validate-version rejects invalid versions (e.g., 999)
  • βœ“ make check-uv verifies uv installation

4. Requirements Processing

  • βœ“ make requirements downloads requirements file
  • βœ“ Requirements file is created and not empty
  • βœ“ Excluded packages are removed (psycopg2, psutil, dbus-python, etc.)
  • βœ“ Databricks version suffixes are cleaned from pyspark
  • βœ“ Binary-only packages are separated

5. Python Version Detection

  • βœ“ make python-version extracts correct Python version from docs

6. Full Environment Creation

  • βœ“ make env creates complete environment
  • βœ“ Virtual environment directory is created
  • βœ“ Python executable exists and is functional
  • βœ“ Lock file is created and contains packages
  • βœ“ Key packages are installed (pandas, numpy, pyspark)

7. Cleanup

  • βœ“ make clean removes files for specific version
  • βœ“ make clean-all removes all generated files

Test Output

The test suite provides colored output:

  • 🟑 YELLOW: Test being run
  • 🟒 GREEN: Test passed
  • πŸ”΄ RED: Test failed

Example output:

TEST: Testing 'make requirements' target (ENV_VER=4)
βœ“ PASS: requirements target executes successfully
βœ“ PASS: requirements-env-4.txt file created
βœ“ PASS: requirements-env-4.txt is not empty

========================================================================
TEST RESULTS SUMMARY
========================================================================
Total tests run:    25
Tests passed:       25
Tests failed:       0
========================================================================
All tests passed! βœ“

Test Options

Quick Tests (30 seconds) ⚑

./quick_test.sh

Validates basic functionality without creating a full environment.

Full Test Suite (5-10 minutes) πŸ”

make test

Comprehensive tests including full environment creation and validation.

Manual Testing

Test individual targets:

make help
make list-versions
make validate-version ENV_VER=4
make requirements ENV_VER=4

Running Individual Tests

You can modify test_makefile.sh to run specific tests by commenting out sections you don't want to run.

Continuous Integration

GitHub Actions Workflows

This repository includes two automated workflows:

1. Test Makefile (.github/workflows/test.yml)

Runs on:

  • Push to main branch β†’ Quick tests only (~30 sec)
  • Pull requests to main β†’ Full test suite (~5-10 min)

What it does:

  • βœ… Installs uv
  • βœ… Runs quick tests (always)
  • βœ… Runs full test suite (PRs only)
  • βœ… Uploads logs on failure
  • βœ… Reports results in GitHub summary

2. Validate Pull Request (.github/workflows/validate-pr.yml)

Runs on:

  • Pull request opened/updated

What it does:

  • βœ… Quick validation (syntax, version detection, requirements)
  • βœ… Posts comment on PR with results
  • βœ… Fast feedback (~1 minute)

Manual CI/CD Integration

For other CI/CD systems:

# Example GitHub Actions
- name: Test Makefile
  run: |
    curl -LsSf https://astral.sh/uv/install.sh | sh
    export PATH="$HOME/.local/bin:$PATH"
    make test

# Example GitLab CI
test:
  script:
    - curl -LsSf https://astral.sh/uv/install.sh | sh
    - export PATH="$HOME/.local/bin:$PATH"
    - make test

Test Troubleshooting

Test Hangs

If the full environment creation test hangs, it will timeout after 10 minutes. Check /tmp/make_env_output.log for details.

PATH Issues

If tests fail with "uv is not installed", ensure uv is in your PATH:

export PATH="$HOME/.local/bin:$PATH"
make test

Cleanup Between Tests

The test script automatically runs make clean-all between major tests to ensure a clean state.

Adding New Tests

To add new tests to test_makefile.sh:

  1. Use print_test "Test description" to start a new test
  2. Run your test command
  3. Use pass "message" for successful assertions
  4. Use fail "message" for failed assertions

Example:

print_test "Testing custom target"
if make custom-target >/dev/null 2>&1; then
    pass "custom-target executed successfully"
else
    fail "custom-target failed"
fi

Test Performance

Full test suite typically takes:

  • Fast tests (validation, help, etc.): ~10 seconds
  • Full environment creation: 3-5 minutes
  • Total runtime: ~5-10 minutes

To skip the slow full environment test, comment out that section in test_makefile.sh.


πŸ› Troubleshooting

"uv is not installed"

Install uv and add it to your PATH:

curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

"Could not extract Python version from documentation"

This usually means:

  1. Network connectivity issues
  2. Databricks documentation format changed
  3. Invalid environment version number

Try running make list-versions to see available versions.

"No such file or directory: .venv-dbX/bin/uv"

This error shouldn't occur with the current version. If you see it, ensure you're using the latest makefile.

Package Installation Failures

If specific packages fail to install:

  1. Check if it's a binary-only package that lacks ARM64 wheels
  2. Add it to EXCLUDED_PACKAGES or BINARY_ONLY_PACKAGES in the makefile
  3. The environment will still be created with other packages

Virtual Environment Activation Issues

Make sure you're using the correct command for your shell:

# bash/zsh
source .venv-db4/bin/activate

# fish
source .venv-db4/bin/activate.fish

# csh/tcsh
source .venv-db4/bin/activate.csh

πŸ—οΈ Project Structure

.
β”œβ”€β”€ makefile                 # Main automation script
β”œβ”€β”€ README.md               # This file
β”œβ”€β”€ test_makefile.sh        # Full test suite
β”œβ”€β”€ quick_test.sh           # Quick validation tests
β”œβ”€β”€ .venv-db{X}/            # Virtual environments (generated)
β”œβ”€β”€ requirements-env-{X}.txt        # Processed requirements (generated)
β”œβ”€β”€ requirements-env-{X}.txt.binary # Binary-only packages (generated)
└── requirements-env-{X}.lock       # Lock files (generated)

πŸ”„ Workflow

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     make env ENV_VER=4                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚   Check Prerequisites  β”‚
            β”‚   - Validate uv        β”‚
            β”‚   - Validate version   β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                        β–Ό
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚  Download Requirements β”‚
            β”‚  - Fetch from MS docs  β”‚
            β”‚  - Process packages    β”‚
            β”‚  - Exclude system pkgs β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                        β–Ό
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚   Detect Python Ver    β”‚
            β”‚  - Parse from docs     β”‚
            β”‚  - Install with uv     β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                        β–Ό
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚   Create Virtual Env   β”‚
            β”‚  - uv venv with ver    β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                        β–Ό
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚  Install Dependencies  β”‚
            β”‚  - Main packages       β”‚
            β”‚  - Binary-only (skip)  β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                        β–Ό
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚   Generate Lock File   β”‚
            β”‚  - uv pip freeze       β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                        β–Ό
                   βœ… Done!

πŸ’‘ Tips & Best Practices

  1. Always specify the version: make env ENV_VER=4 is clearer than relying on defaults
  2. Check available versions first: Run make list-versions before creating an environment
  3. Test incrementally: Use individual targets (make requirements, make setup-venv) when debugging
  4. Keep environments separate: Use different ENV_VER values for different projects
  5. Use lock files: Commit requirements-env-X.lock to ensure reproducible environments
  6. Clean regularly: Run make clean-all to remove old environments and free up disk space

🀝 Contributing

To contribute improvements:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-improvement
  3. Make your changes to the makefile
  4. Run the test suite: make test (locally)
  5. Ensure all tests pass
  6. Commit your changes: git commit -m "Add my improvement"
  7. Push to your fork: git push origin feature/my-improvement
  8. Create a Pull Request
    • The Validate PR workflow will run automatically
    • Review the validation results posted as a comment
  9. Once approved and merged, the Test Makefile workflow runs on main

Setting Up Your Repository

After cloning/creating this repository:

  1. Update badge URLs in README.md:

    Replace YOUR_USERNAME/YOUR_REPO with your actual GitHub username and repository name
  2. Enable GitHub Actions:

    • Go to repository Settings β†’ Actions β†’ General
    • Ensure "Allow all actions and reusable workflows" is selected
  3. First Push:

    git add .
    git commit -m "Initial commit"
    git push origin main

    The workflows will run automatically!

πŸ“ License

This tool is provided as-is for use with Databricks environments.

❓ FAQ

Q: Why use this instead of pip/conda?
A: This tool automatically matches Databricks environments exactly, handling version-specific quirks and platform differences.

Q: Can I use this on Linux/Windows?
A: It's designed for macOS but should work on Linux. Windows support via WSL2 is untested.

Q: What if a package I need was excluded?
A: You can manually install it in the venv, or modify EXCLUDED_PACKAGES in the makefile if you know how to handle dependencies.

Q: How do I update an existing environment?
A: Run make clean ENV_VER=X followed by make env ENV_VER=X to rebuild from scratch.

Q: Can I use this for multiple Databricks workspace versions?
A: Yes! Create separate environments: make env ENV_VER=1, make env ENV_VER=4, etc.


Made with ❀️ for Databricks developers
