Welcome! There are many ways to contribute, including submitting bug reports, improving documentation, submitting feature requests, reviewing new submissions, or contributing code that can be incorporated into the project.
Please create a new GitHub issue for any significant changes and enhancements that you wish to make. Describe the feature you would like to see, why you need it, and how it will work. Discuss your ideas transparently and get community feedback before proceeding.
Small changes can be crafted directly and submitted to the GitHub repository as a pull request. This requires creating a fork of the repository, following the GitHub instructions.
Please take into account that:
- Some companies still use old Spark versions, like 3.2.0, so compatibility with them should be kept where possible, e.g. by adding separate code branches for different Spark versions.
- Different users use onETL in different ways: some use only DB connectors, some only file connectors. Connector-specific dependencies should be optional.
- Instead of creating classes with a lot of different options, prefer splitting them into smaller classes, e.g. an options class, a context manager, etc., and combining them using composition.
Please follow the GitHub instructions.
Open a terminal and run these commands to clone your fork (replace myuser with your GitHub username):
git clone git@github.com:myuser/onetl.git -b develop
cd onetl
Create a virtualenv and install dependencies:
make venv-install
Install pre-commit hooks:
prek install --install-hooks
Check that pre-commit hooks run:
prek run
Note
You can skip running tests if only documentation is changed.
Create a virtualenv and install dependencies:
make venv-install
Build the image for running tests:
docker-compose build
Start all containers with dependencies:
docker-compose --profile all up -d
Or start only a limited set of dependencies, e.g. MongoDB:
docker-compose --profile mongodb up -d
Run tests:
docker-compose run --rm onetl pytest
Any additional arguments are passed to pytest:
docker-compose run --rm onetl pytest -m mongodb -lsx -vvvv --log-cli-level=INFO
You can also start an interactive bash session in the test container and run tests from there:
docker-compose run --rm onetl bash
pytest -m mongodb -lsx -vvvv --log-cli-level=INFO
See the logs of the test container:
docker-compose logs -f onetl
Stop all containers and remove created volumes:
docker-compose --profile all down -v
Tests can also be run locally, outside of the test container.
Warning
To run HDFS tests locally, add the following line to your hosts file, e.g. /etc/hosts (the file path depends on the OS):
# HDFS server returns container hostname as connection address, causing error in DNS resolution
127.0.0.1 hdfs
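A minimal sketch for adding that entry only once, so that running it again does not duplicate the line. It assumes a Unix-like system with a POSIX shell and sudo access. The function name add_hdfs_host and its optional file argument (defaulting to /etc/hosts) are hypothetical; the argument exists only so the logic can be tried on a scratch file first.

```shell
# Hypothetical helper: append "127.0.0.1 hdfs" to a hosts file unless an
# identical line is already present. The file defaults to /etc/hosts.
add_hdfs_host() {
    hosts_file="${1:-/etc/hosts}"
    entry="127.0.0.1 hdfs"
    # -x matches the whole line, -F treats the entry as a fixed string
    if grep -qxF "$entry" "$hosts_file" 2>/dev/null; then
        return 0
    fi
    if [ -w "$hosts_file" ]; then
        # File is writable by the current user: append directly
        printf '%s\n' "$entry" >> "$hosts_file"
    else
        # Otherwise append via sudo (may prompt for a password)
        printf '%s\n' "$entry" | sudo tee -a "$hosts_file" > /dev/null
    fi
}
```

Calling add_hdfs_host without arguments edits /etc/hosts itself.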
Note
To run Oracle tests, install Oracle Instant Client
and pass its path via the ONETL_ORA_CLIENT_PATH environment variable,
e.g. ONETL_ORA_CLIENT_PATH=/path/to/client64/lib.
You may also need to add the same path to the LD_LIBRARY_PATH environment variable.
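For example, in a POSIX shell both variables can be set for the current session before running tests. The directory is the placeholder from the example above. ORACLE_CLIENT_DIR is only a local helper variable in this sketch; the tests do not read it.

```shell
# Placeholder: replace with the actual Instant Client directory on your machine
ORACLE_CLIENT_DIR=/path/to/client64/lib

# Client path expected by the Oracle tests
export ONETL_ORA_CLIENT_PATH="$ORACLE_CLIENT_DIR"

# Prepend the same directory to LD_LIBRARY_PATH, keeping any existing value
export LD_LIBRARY_PATH="$ORACLE_CLIENT_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```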
Note
To run Greenplum tests, you should:
- Download the VMware Greenplum connector for Spark.
- Either move it to ~/.ivy2/jars/, or pass its file path to CLASSPATH.
- Set the environment variable ONETL_GP_PACKAGE_VERSION=local.
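The steps above can be sketched as a shell function. It assumes the connector jar has already been downloaded; its exact file name depends on the connector version, so the path is passed in as an argument. The name setup_greenplum_connector and its optional second argument (the target directory, defaulting to ~/.ivy2/jars) are hypothetical. The function must run in the same shell session that later runs the tests, so the exported variable stays visible.

```shell
# Hypothetical helper: make a downloaded Greenplum connector jar available
# to local tests.
# $1 - path to the downloaded connector jar
# $2 - optional target directory, defaults to ~/.ivy2/jars
setup_greenplum_connector() {
    jar_path="$1"
    ivy_jars="${2:-$HOME/.ivy2/jars}"

    # Option 1: copy the jar into the local Ivy jars directory
    mkdir -p "$ivy_jars" && cp "$jar_path" "$ivy_jars/" || return 1

    # Option 2 (alternative to option 1): put the jar on CLASSPATH instead
    # export CLASSPATH="$jar_path${CLASSPATH:+:$CLASSPATH}"

    # Tell the tests to use the locally provided connector
    export ONETL_GP_PACKAGE_VERSION=local
}
```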
Start all containers with dependencies:
docker-compose --profile all up -d
Or start only a limited set of dependencies, e.g. MongoDB:
docker-compose --profile mongodb up -d
Run core tests:
make test-core
Run tests for specific connections:
make test-spark PYTEST_ARGS="-m mongodb"
make test-no-spark PYTEST_ARGS="-m ftp"
Additional arguments in PYTEST_ARGS are passed to pytest:
make test-spark PYTEST_ARGS="-m mongodb -lsx -vvvv --log-cli-level=INFO"
Stop all containers and remove created volumes:
docker-compose --profile all down -v
Note
You can skip building the documentation if the source code behavior remains the same.
Create a virtualenv and install dependencies:
make venv-install
Build the documentation using Sphinx:
cd docs
make html
Then open docs/_build/index.html in a browser.
Commit and push your changes:
git commit -m "Commit message"
git push
Then open the GitHub interface and create a pull request. Please follow the guide in the PR body template.
After the pull request is created, it gets a corresponding number, e.g. 123 (pr_number).
onETL uses towncrier
for changelog management.
To submit a change note about your PR, add a text file into the docs/changelog/next_release folder. It should contain an explanation of what applying this PR will change in the way end-users interact with the project. One sentence is usually enough but feel free to add as many details as you feel necessary for the users to understand what it means.
Use the past tense for the text in your fragment because, combined with others, it will be a part of the "news digest" telling the readers what changed in a specific version of the library since the previous version.
You should also use
reStructuredText syntax for highlighting code (inline or block),
linking parts of the docs or external sites.
If you wish to sign your change, feel free to add -- by
:github:user:`github-username` at the end (replace github-username
with your own!).
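A minimal sketch of creating such a fragment from the repository root. It assumes the pull request got the hypothetical number 123 and that the change is a new feature. The text reuses one of this guide's examples: past tense, reStructuredText markup for code, and an optional signature.

```shell
# Hypothetical PR number: use the number GitHub assigned to your pull request
PR_NUMBER=123

mkdir -p docs/changelog/next_release
# Quoted 'EOF' keeps the reStructuredText backticks from being interpreted by the shell
cat > "docs/changelog/next_release/${PR_NUMBER}.feature.rst" <<'EOF'
Added support of ``timeout`` in ``S3`` connector -- by :github:user:`github-username`
EOF
```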
Finally, name your file following the convention that Towncrier
understands: it should start with the number of an issue or a
PR followed by a dot, then add a patch type, like feature,
doc, misc etc., and add .rst as a suffix. If you
need to add more than one fragment, you may add an optional
sequence number (delimited with another period) between the type
and the suffix.
In general, the name follows the <pr_number>.<category>.rst pattern,
where the categories are:
- feature: Any new feature
- bugfix: A bug fix
- improvement: An improvement
- doc: A change to the documentation
- dependency: Dependency-related changes
- misc: Changes internal to the repo like CI, test and build changes
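The naming convention can be checked with a small shell function before pushing. It accepts <pr_number>.<category>.rst with an optional .<sequence> before the suffix. The function name is_valid_fragment_name is hypothetical, and its category list simply mirrors this guide. The towncrier configuration in pyproject.toml remains the authoritative source.

```shell
# Hypothetical helper: succeed if a fragment file name follows the
# <pr_number>.<category>[.<sequence>].rst convention.
# Categories are copied from this guide; pyproject.toml may define more.
is_valid_fragment_name() {
    printf '%s\n' "$1" | grep -Eq \
        '^[0-9]+\.(feature|bugfix|improvement|doc|dependency|misc)(\.[0-9]+)?\.rst$'
}
```

For example, it can be called on the base name of each file in docs/changelog/next_release.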
A pull request may have more than one of these components, for example a code change may introduce a new feature that deprecates an old feature, in which case two fragments should be added. It is not necessary to make a separate documentation fragment for documentation changes accompanying the relevant code changes.
Examples of fragment contents:
Added a ``:github:user:`` role to Sphinx config -- by :github:user:`someuser`
Fixed behavior of ``WebDAV`` connector -- by :github:user:`someuser`
Added support of ``timeout`` in ``S3`` connector
-- by :github:user:`someuser`, :github:user:`anotheruser` and :github:user:`otheruser`
Tip
See pyproject.toml for all available categories
(tool.towncrier.type).
If a pull request does not need a change note, just add the ci:skip-changelog label to it.
Note
This is for repo maintainers only
Before making a release from the develop branch, follow these steps:
- Check out the develop branch and update it to the actual state:
git checkout develop
git pull -p
- Back up NEXT_RELEASE.rst:
cp "docs/changelog/NEXT_RELEASE.rst" "docs/changelog/temp_NEXT_RELEASE.rst"
- Build the release notes with Towncrier:
VERSION=$(cat onetl/VERSION)
towncrier build "--version=${VERSION}" --yes
- Rename the changelog file to the release version number:
mv docs/changelog/NEXT_RELEASE.rst "docs/changelog/${VERSION}.rst"
- Remove the towncrier marker line above the version number heading in the ${VERSION}.rst file:
awk '!/^.*towncrier release notes start/' "docs/changelog/${VERSION}.rst" > temp && mv temp "docs/changelog/${VERSION}.rst"
- Update the changelog index:
awk -v version="${VERSION}" '/DRAFT/{print;print " " version;next}1' docs/changelog/index.rst > temp && mv temp docs/changelog/index.rst
- Restore the NEXT_RELEASE.rst file from the backup:
mv "docs/changelog/temp_NEXT_RELEASE.rst" "docs/changelog/NEXT_RELEASE.rst"
- Commit and push the changes to the develop branch:
git add .
git commit -m "Prepare for release ${VERSION}"
git push
- Merge the develop branch into master, WITHOUT squashing:
git checkout master
git pull
git merge develop
git push
- Add a git tag to the latest commit in the master branch:
git tag "$VERSION"
git push origin "$VERSION"
- Update the version in the develop branch after the release:
git checkout develop
NEXT_VERSION=$(echo "$VERSION" | awk -F. '/[0-9]+\./{$NF++;print}' OFS=.)
echo "$NEXT_VERSION" > onetl/VERSION
git add .
git commit -m "Bump version"
git push