Contributing Guide

Welcome! There are many ways to contribute, including submitting bug reports, improving documentation, submitting feature requests, reviewing new submissions, or contributing code that can be incorporated into the project.

Review process

For any significant change, please create a new GitHub issue describing the enhancement you wish to make. Describe the feature you would like to see, why you need it, and how it will work. Discuss your ideas transparently and get community feedback before proceeding.

Small changes can be submitted directly to the GitHub repository as a pull request. This requires creating a fork of the repository; follow the GitHub instructions for forking.

Important notes

Please take into account that:

  • Some companies still use old Spark versions, like 3.2.0, so compatibility should be kept where possible, e.g. by adding branches for different Spark versions.
  • Different users use onETL in different ways: some use only DB connectors, some only file connectors. Connector-specific dependencies should be optional.
  • Instead of creating classes with a lot of different options, prefer splitting them into smaller classes, e.g. options class, context manager, etc, and using composition.

Initial setup for local development

Install Git

Please follow the official Git installation instructions.

Clone the repo

Open a terminal and run these commands to clone your fork of the repo:

git clone git@github.com:myuser/onetl.git -b develop

cd onetl

Enable pre-commit hooks

Create virtualenv and install dependencies:

make venv-install

Install pre-commit hooks:

prek install --install-hooks

Check that the pre-commit hooks run:

prek run

How to

Run tests locally

Note

You can skip this if only documentation is changed.

Setup environment

Create virtualenv and install dependencies:

make venv-install

Using docker-compose

Build image for running tests:

docker-compose build

Start all containers with dependencies:

docker-compose --profile all up -d

You can also start only a limited set of dependencies:

docker-compose --profile mongodb up -d

Run tests:

docker-compose run --rm onetl pytest

Any additional arguments are passed on to pytest:

docker-compose run --rm onetl pytest -m mongodb -lsx -vvvv --log-cli-level=INFO

You can also start an interactive bash session in the container and run pytest from there:

docker-compose run --rm onetl bash

pytest -m mongodb -lsx -vvvv --log-cli-level=INFO

See logs of test container:

docker-compose logs -f onetl

Stop all containers and remove created volumes:

docker-compose --profile all down -v

Without docker-compose

Warning

To run HDFS tests locally you should add the following line to your /etc/hosts (file path depends on OS):

# HDFS server returns container hostname as connection address, causing error in DNS resolution
127.0.0.1 hdfs
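
Before running the HDFS tests, it may help to confirm that the entry is actually present. The following is only a minimal sketch: the helper name `has_hdfs_entry` is hypothetical and not part of onETL, and the default path `/etc/hosts` assumes a Unix-like OS.

```shell
# Hypothetical helper (not part of onETL): succeeds if the given hosts file
# maps 127.0.0.1 to the "hdfs" hostname on a line that does not start with "#".
has_hdfs_entry() {
    hosts_file="${1:-/etc/hosts}"
    grep -Eq '^[[:space:]]*127\.0\.0\.1[[:space:]]+(.*[[:space:]])?hdfs([[:space:]]|$)' "$hosts_file" 2>/dev/null
}

if has_hdfs_entry; then
    echo "hdfs entry found"
else
    echo "hdfs entry missing: add '127.0.0.1 hdfs' to your hosts file"
fi
```

The check only reads the file; adding the line itself usually requires administrator rights.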

Note

To run Oracle tests you need to install Oracle instantclient and pass its path in the ONETL_ORA_CLIENT_PATH environment variable, e.g. ONETL_ORA_CLIENT_PATH=/path/to/client64/lib.

You may also need to add the same path to the LD_LIBRARY_PATH environment variable.
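
As a sketch, both variables could be set in the current shell session like this. The path is the placeholder from the note above, not a real location; replace it with the directory where instantclient is installed on your machine.

```shell
# Placeholder path taken from the note above; replace with your real instantclient location.
export ONETL_ORA_CLIENT_PATH=/path/to/client64/lib

# Prepend the same path to LD_LIBRARY_PATH, keeping any value that is already set.
export LD_LIBRARY_PATH="${ONETL_ORA_CLIENT_PATH}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```

The `${LD_LIBRARY_PATH:+:...}` expansion adds the colon separator only when the variable was already non-empty, which avoids a trailing empty path entry.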

Note

To run Greenplum tests, you should:

Start all containers with dependencies:

docker-compose --profile all up -d

You can also start only a limited set of dependencies:

docker-compose --profile mongodb up -d

Run core tests:

make test-core

Run specific connection tests:

make test-spark PYTEST_ARGS="-m mongodb"
make test-no-spark PYTEST_ARGS="-m ftp"

Any additional arguments in PYTEST_ARGS are passed on to pytest:

make test-spark PYTEST_ARGS="-m mongodb -lsx -vvvv --log-cli-level=INFO"

Stop all containers and remove created volumes:

docker-compose --profile all down -v

Build documentation

Note

You can skip this if your changes do not affect the documentation.

Create virtualenv and install dependencies:

make venv-install

Build documentation using Sphinx:

cd docs
make html

Then open docs/_build/index.html in a browser.

Create pull request

Commit your changes:

git commit -m "Commit message"
git push

Then open the GitHub interface and create a pull request. Please follow the guide in the PR body template.

After the pull request is created, it gets a corresponding number, e.g. 123 (pr_number).

Write release notes

onETL uses towncrier for changelog management.

To submit a change note about your PR, add a text file into the docs/changelog/next_release folder. It should contain an explanation of what applying this PR will change in the way end-users interact with the project. One sentence is usually enough but feel free to add as many details as you feel necessary for the users to understand what it means.

Use the past tense for the text in your fragment because, combined with others, it will be a part of the "news digest" telling the readers what changed in a specific version of the library since the previous version.

You should also use reStructuredText syntax for highlighting code (inline or block) and for linking parts of the docs or external sites. If you wish to sign your change, feel free to add -- by :github:user:`github-username` at the end (replace github-username with your own!).

Finally, name your file following the convention that Towncrier understands: it should start with the number of an issue or a PR followed by a dot, then add a patch type, like feature, doc, misc etc., and add .rst as a suffix. If you need to add more than one fragment, you may add an optional sequence number (delimited with another period) between the type and the suffix.

In general the name will follow <pr_number>.<category>.rst pattern, where the categories are:

  • feature: Any new feature
  • bugfix: A bug fix
  • improvement: An improvement
  • doc: A change to the documentation
  • dependency: Dependency-related changes
  • misc: Changes internal to the repo like CI, test and build changes

A pull request may have more than one of these components, for example a code change may introduce a new feature that deprecates an old feature, in which case two fragments should be added. It is not necessary to make a separate documentation fragment for documentation changes accompanying the relevant code changes.

Examples for adding changelog entries to your Pull Requests

Added a ``:github:user:`` role to Sphinx config -- by :github:user:`someuser`
Fixed behavior of ``WebDAV`` connector -- by :github:user:`someuser`
Added support of ``timeout`` in ``S3`` connector
-- by :github:user:`someuser`, :github:user:`anotheruser` and :github:user:`otheruser`
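
As a sketch of the naming convention, the commands below create a fragment for a pull request in the feature category. The PR number 123 and the fragment text are placeholders for illustration only; use the number GitHub assigned to your pull request.

```shell
# Hypothetical values: replace with your real PR number and category.
PR_NUMBER=123
CATEGORY=feature
FRAGMENT="docs/changelog/next_release/${PR_NUMBER}.${CATEGORY}.rst"

# The directory already exists in the repo; mkdir -p is only a harmless safeguard.
mkdir -p "$(dirname "$FRAGMENT")"

# One past-tense sentence in reStructuredText, as described above.
cat > "$FRAGMENT" <<'EOF'
Added support of ``timeout`` in ``S3`` connector
EOF

echo "Created ${FRAGMENT}"
```

A second fragment of the same type for the same pull request would carry a sequence number between the type and the suffix, e.g. 123.feature.1.rst.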

Tip

See pyproject.toml for all available categories (tool.towncrier.type).

How to skip change notes check?

Just add ci:skip-changelog label to pull request.

Release Process

Note

This is for repo maintainers only.

Before making a release from the develop branch, follow these steps:

  1. Check out the develop branch and update it to the latest state
git checkout develop
git pull -p
  2. Back up NEXT_RELEASE.rst
cp "docs/changelog/NEXT_RELEASE.rst" "docs/changelog/temp_NEXT_RELEASE.rst"
  3. Build the release notes with Towncrier
VERSION=$(cat onetl/VERSION)
towncrier build "--version=${VERSION}" --yes
  4. Rename the changelog file to the release version number
mv docs/changelog/NEXT_RELEASE.rst "docs/changelog/${VERSION}.rst"
  5. Remove content above the version number heading in the ${VERSION}.rst file
awk '!/^.*towncrier release notes start/' "docs/changelog/${VERSION}.rst" > temp && mv temp "docs/changelog/${VERSION}.rst"
  6. Update the changelog index
awk -v version="${VERSION}" '/DRAFT/{print;print "    " version;next}1' docs/changelog/index.rst > temp && mv temp docs/changelog/index.rst
  7. Restore the NEXT_RELEASE.rst file from the backup
mv "docs/changelog/temp_NEXT_RELEASE.rst" "docs/changelog/NEXT_RELEASE.rst"
  8. Commit and push the changes to the develop branch
git add .
git commit -m "Prepare for release ${VERSION}"
git push
  9. Merge the develop branch into master, WITHOUT squashing
git checkout master
git pull
git merge develop
git push
  10. Add a git tag to the latest commit in the master branch
git tag "$VERSION"
git push origin "$VERSION"
  11. Update the version in the develop branch after the release:
git checkout develop

NEXT_VERSION=$(echo "$VERSION" | awk -F. '/[0-9]+\./{$NF++;print}' OFS=.)
echo "$NEXT_VERSION" > onetl/VERSION

git add .
git commit -m "Bump version"
git push