
contributing

deena-b edited this page Aug 14, 2019 · 1 revision

How to contribute

Introduction

The DCL project is a mixture of data science, script writing, and (hopefully eventually) software package development.

We aim to keep the barriers to becoming a contributor low, which means that we aim to have a tutorial for EVERYTHING, even for things where some people might just say "google it". In some cases, our tutorials will provide links to webpages that describe what to do or give more details.

If you see an area where a tutorial doesn't exist or could be improved, please add an issue to the overview repo. Your issue should state what you were trying to do, where you got stuck, and what you plan to add.

Join us on Github

Outline

There are a number of steps to contributing using git (version control software) and GitHub (a cloud-based hosting service for git repositories). The major steps are:

  1. Set up a GitHub account (your remote repos will live there, in the cloud)
  2. Fork a repository (repo for short), e.g. mitolin, from the DCL GitHub account to your own account, which becomes your remote "origin"
    1. Pictures of the fork button can be found on GitHub Help pages
  3. Using a terminal on your 'local' computer, navigate to a directory where you want to keep your new DCL repo
  4. Use the command line to clone your forked repo from your remote origin account to your local machine, e.g. git clone https://github.com/deena-b/mitolin.git (replace 'deena-b' with your own github username)
  5. Connect your local repo to the upstream remote repo
    1. git remote add upstream https://github.com/deepcelllineage/mitolin.git
    2. If you mistakenly cloned the repo from upstream, rename the remote from origin to upstream and connect your local repo to your remote origin
      1. git remote rename origin upstream
      2. git remote add origin https://github.com/deena-b/mitolin.git
  6. Download the upstream branch information, then view all branches: git fetch upstream, then git branch -a
    1. Note the presence of the following branches:
      1. remotes/upstream/master
      2. remotes/upstream/dev
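The setup steps above can be rehearsed offline before touching GitHub. The script below is a minimal sketch, assuming git 2.28 or newer: two local bare repositories, upstream.git and origin.git, are hypothetical stand-ins for the DCL repo and your fork, and the demo author name and email are placeholders. It clones your "fork", adds upstream, and fetches it so that git branch -a lists remotes/upstream/master and remotes/upstream/dev.

```shell
#!/bin/sh
# Offline rehearsal of the fork / clone / upstream steps.
# ASSUMPTIONS: git 2.28+ (for `git init -b`); upstream.git and origin.git are
# hypothetical local stand-ins for the DCL mitolin repo and your GitHub fork.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# Build a tiny "upstream" repo that has master and dev branches.
git init -q -b master seed
git -C seed commit -q --allow-empty -m "Initial commit"
git -C seed branch dev
git clone -q --bare seed upstream.git

# "Fork" it (the Fork button on GitHub); here it is just another bare copy.
git clone -q --bare upstream.git origin.git

# Clone YOUR fork, so that the remote called origin points at it.
git clone -q "$work/origin.git" mitolin
cd mitolin

# Connect the local repo to upstream.
git remote add upstream "$work/upstream.git"

# Fetch first: git branch -a only lists remote branches git has downloaded.
git fetch -q upstream
git branch -a

# If you cloned upstream by mistake, fix the remotes instead of re-cloning:
#   git remote rename origin upstream
#   git remote add origin <URL of your fork>
```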
  1. If you don't see a local dev branch (one listed without the remotes/ prefix), create one, move onto it, and tell it to track the upstream version, all with a single command: git checkout --track upstream/dev
  2. If a local dev branch already exists, move onto it: git checkout dev
  3. Check your current status: git status
    1. You should see the following:
      1. On branch dev
      2. Your branch is up to date with 'upstream/dev'.
        1. If you do not see the above line, set your branch to track upstream/dev with: git branch -u upstream/dev
      3. nothing to commit, working tree clean
  4. Create and move onto your very own local feature branch: git checkout -b feature_name
    1. Take a minute to think of a good name for your feature branch (naming things in programming is notoriously hard, but don't worry: you will get better with practice, and that's what the DCL project is all about)
      1. Start your branch name with a short word that helps you remember what you plan to work on in this branch, e.g. "distcalc" for the distance calculator tutorial issue
      2. Next, use CAPS to write your initials (use 3 initials!!!), e.g. mine are "DRB"
      3. Your feature should always relate to an issue. If a relevant issue doesn't exist, submit one! End your branch name with "i" followed by the issue number, e.g. "i12".
      4. A full branch name looks like this: "distcalcDRBi12"
  5. Make files
  6. Make a file in the nb directory
    1. This can be a .md or .ipynb file, e.g. from the root of the repo:
      1. touch nb/distcalctut.md
    2. In the first line, write what you aim to accomplish with your new 'feature', e.g.:
      1. "The aim of this feature is to create a tutorial that breaks down BioPython's distance calculations for nucleotides"
  7. Make any other files or folders that you need, e.g. a .sh file in the src/ directory or a dated folder for generated data in the data/gen/nguyen_nc_2018/ directory
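The dev-branch and feature-branch steps above can be sketched as one script. This is a minimal sketch, assuming git 2.28 or newer and a hypothetical local bare repository upstream.git standing in for the DCL repo; to keep it short, the clone names its remote upstream directly with git clone -o upstream instead of also setting up a fork. The branch name distcalcDRBi12 and the file nb/distcalctut.md follow the examples in the steps.

```shell
#!/bin/sh
# Sketch: track upstream/dev locally, then branch off it for one issue.
# ASSUMPTIONS: git 2.28+; upstream.git is a hypothetical local stand-in for the
# DCL repo, and the clone names its remote "upstream" directly (no fork here).
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q -b master seed
git -C seed commit -q --allow-empty -m "Initial commit"
git -C seed branch dev
git clone -q --bare seed upstream.git
git clone -q -o upstream "$work/upstream.git" mitolin
cd mitolin

# Create a local dev branch that tracks upstream/dev, and switch to it.
git checkout -q --track upstream/dev
git status

# Feature branch name: topic word + initials + issue number.
git checkout -q -b distcalcDRBi12

# Start the tutorial file with the aim of the feature.
mkdir -p nb
echo "The aim of this feature is to create a tutorial that breaks down BioPython's distance calculations for nucleotides" > nb/distcalctut.md
git add nb/distcalctut.md
git commit -q -m "State the aim of the distance calculator tutorial (i12)"
```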

  1. When should you add (stage), commit, and make a pull request to merge your branch into the dev branch?

    1. Why stage?
      1. Staging lets you choose what goes into a commit. For example, if you make three changes and only two relate to each other, you can stage and commit those two together, then stage and commit the third change separately
    2. When to commit?
      1. After you make something work
      2. After you make a meaningful change
      3. Mantra: Commit Often, Perfect Later
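    3. How to use git diff
      1. git diff can compare commits, branches, files and more
      2. On its own, git diff shows the changes you have not staged yet; git diff --staged shows staged changes, and git diff HEAD shows all changes since your last commit
      3. git diff branch1 branch2
        1. a space means compare the tips of the two branches
          1. two dots between the branches (git diff branch1..branch2) mean the same thing
          2. three dots (git diff branch1...branch2) compare the shared common ancestor of the two branches with the tip of branch2, i.e. they show only the changes made on branch2 since it split off from branch1
      4. git diff branch1 branch2 -- fileA
        1. just shows differences in fileA between the two branches (the -- separates branch names from file names)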
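    4. Squash your commits
      1. before a PR. There is no git squash command: use an interactive rebase (e.g. git rebase -i upstream/dev) and mark the commits you want to combine as squash or fixup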
    5. git push
      1. to origin, often
      2. to upstream, via a pull request against the dev branch, when you have finished a feature and are ready to merge it
  2. Push your work to your remote origin whenever you get interrupted, regardless of whether it is the end of the day or you need to work on another DCL issue

    1. git status
    2. git add filename
    3. git commit -m "describe what you changed in the file"
    4. (repeat git add ... & git commit ... for each file, or use git commit -a -m "summarize what you did"; note that -a only picks up files git already tracks, so brand-new files still need git add)
    5. You could add (aka stage) and commit changes as you go if you want to keep track of your changes in smaller steps
    6. git push origin feature_branch_name
  3. Return to work

  4. git checkout dev

  5. git fetch upstream

  6. git status

  7. If there are differences, git rebase upstream/dev

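    1. To review the incoming commits, run git log -p dev..upstream/dev before you rebase (use --oneline instead of -p for a short summary); git show on its own displays only the most recent commit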
  8. git checkout your_feature

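  9. Pull any updates into your feature branch, so you are working with the most up-to-date files: git rebase upstream/dev
    1. Rebasing rewrites the commits on your feature branch, so if you already pushed it to origin, the next push needs git push --force-with-lease origin your_feature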

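  10. git rebase will not start while tracked files have uncommitted changes, so if you have uncommitted work you have two choices
    1. git stash, then git rebase upstream/dev, then git stash pop (git rebase --autostash does the stash and pop for you). If the popped changes clash, git marks the conflicts in the files and keeps the stash: fix the files, git add them, then git stash drop
    2. Commit the changes, then rebase and manually go through the conflicts to choose what to keep (git add each fixed file, then git rebase --continue)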
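  11. Done with your feature? Rebase it to the dev branch

    1. git checkout dev
    2. git rebase feature_branch (if your feature branch was first rebased onto the latest dev, this simply fast-forwards dev; git merge --ff-only feature_branch does the same but refuses to rewrite dev if it has commits of its own)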
  12. When do we get to rebase dev to master?

    1. When the group moves from one biggish issue to another. For example, when we feel we fully understand distance calculations with BioPython and have created a cohesive set of notebooks that explain them
  13. If you are interested in contributing, email Deena (deenab7 at gmail dot com) with your GitHub username. Deena will add you as a collaborator with write permissions to the overview repo and as a collaborator with Triage permissions to any other repo you request. Write permissions allow you to push directly to all branches of the overview and overview/wiki repos. Triage permissions allow you to manage issues and PRs (e.g. label, assign, and close them); merging a PR requires write permissions.

  14. Note that github wiki pages are weird! You should clone the overview/wiki from the remote upstream (i.e. git clone https://github.com/deepcelllineage/overview.wiki.git) and push directly there. When I tried to connect my origin overview.wiki it deleted all my upstream history!! If you have a better understanding of how this works, please make an issue in the upstream overview repo to tell us what you know.
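The git diff forms listed in the outline are easiest to tell apart on a toy repository. This sketch (assuming git 2.28 or newer; the branch and file names are made up for the demo) builds a dev branch and a feature branch that have diverged, compares them with a space, two dots, and three dots, and then shows which changes plain git diff, git diff --staged, and git diff HEAD report.

```shell
#!/bin/sh
# Toy repo for comparing git diff forms. ASSUMPTIONS: git 2.28+; all names hypothetical.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q -b dev demo
cd demo
echo "base" > shared.txt
git add shared.txt
git commit -q -m "Base"

# feature adds tut.md; dev separately adds fix.txt, so the branches diverge.
git checkout -q -b feature
echo "tutorial" > tut.md
git add tut.md
git commit -q -m "Add tutorial"
git checkout -q dev
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix on dev"

git diff --name-only dev feature     # tips differ in fix.txt and tut.md
git diff --name-only dev..feature    # two dots: same as the space form
git diff --name-only dev...feature   # three dots: only tut.md (feature's changes since the split)
git diff dev feature -- tut.md       # limit the comparison to one file

# Working tree vs staging area vs last commit.
echo "edit" >> shared.txt
git diff --name-only                 # unstaged changes: shared.txt
git add shared.txt
git diff --name-only                 # empty: nothing unstaged remains
git diff --staged --name-only        # staged changes: shared.txt
git diff --name-only HEAD            # everything since the last commit: shared.txt
```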
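Since there is no git squash command, squashing before a PR is usually done with an interactive rebase (git rebase -i dev, then change pick to squash or fixup in the editor). The sketch below makes that non-interactive so it can run as a script: it records follow-up edits with git commit --fixup and lets git rebase -i --autosquash fold them into the commit they fix, with GIT_SEQUENCE_EDITOR=: accepting the generated todo list unchanged. This is one way to squash, not necessarily the one the DCL group uses; the repository, branch, and file names are hypothetical, and git 2.28 or newer is assumed.

```shell
#!/bin/sh
# Squash follow-up commits into one before a PR. ASSUMPTIONS: git 2.28+; names hypothetical.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q -b dev demo
cd demo
git commit -q --allow-empty -m "Base"

git checkout -q -b distcalcDRBi12
echo "aim" > distcalctut.md
git add distcalctut.md
git commit -q -m "Add tutorial aim"
target=$(git rev-parse HEAD)

# Small follow-up edits, each recorded as "fixup! Add tutorial aim".
echo "step 1" >> distcalctut.md
git commit -q -a --fixup="$target"
echo "step 2" >> distcalctut.md
git commit -q -a --fixup="$target"

git log --oneline dev..HEAD   # three commits on the feature branch

# --autosquash turns the fixup! commits into "fixup" actions; ":" is a no-op editor.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash dev

git log --oneline dev..HEAD   # one commit, containing all three edits
```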
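The "return to work" and "done with your feature" steps can be rehearsed end to end with local stand-ins. In this sketch (assumptions: git 2.28 or newer; upstream.git, origin.git and a second clone called other are hypothetical local repositories playing the DCL repo, your fork, and another contributor), dev is brought up to date, the feature branch is rebased onto upstream/dev, and the rewritten branch is pushed with git push --force-with-lease, since a plain push of rebased commits is rejected. For the last step it uses git merge --ff-only in place of the git rebase feature_branch shown in the outline: when dev has no commits of its own the result is the same, and --ff-only stops with an error instead of rewriting dev if it does.

```shell
#!/bin/sh
# Sync dev, rebase a feature branch, push it, then fast-forward dev.
# ASSUMPTIONS: git 2.28+; upstream.git / origin.git / other are hypothetical stand-ins.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q -b master seed
git -C seed commit -q --allow-empty -m "Initial commit"
git -C seed branch dev
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git origin.git
git clone -q "$work/origin.git" mitolin
cd mitolin
git remote add upstream "$work/upstream.git"
git fetch -q upstream
git checkout -q --track upstream/dev

# Feature work, pushed to your fork.
git checkout -q -b distcalcDRBi12
echo "aim" > distcalctut.md
git add distcalctut.md
git commit -q -m "Add tutorial aim"
git push -q -u origin distcalcDRBi12

# Meanwhile, another contributor's change lands on upstream dev.
git clone -q "$work/upstream.git" "$work/other"
git -C "$work/other" checkout -q dev
echo "notes" > "$work/other/README.md"
git -C "$work/other" add README.md
git -C "$work/other" commit -q -m "Another contributor's change"
git -C "$work/other" push -q origin dev

# Return to work: bring dev up to date with upstream.
git checkout -q dev
git fetch -q upstream
git status                            # reports that dev is behind upstream/dev
git log --oneline dev..upstream/dev   # review incoming commits before rebasing
git rebase -q upstream/dev

# Replay the feature branch on top of the updated upstream/dev.
git checkout -q distcalcDRBi12
git rebase -q upstream/dev
# The rebase rewrote commits that origin already has, so force, but safely.
git push -q --force-with-lease origin distcalcDRBi12

# Done with the feature: move dev forward to it without rewriting dev.
git checkout -q dev
git merge -q --ff-only distcalcDRBi12
```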
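The stash option in the outline raises the question of what to do when git stash pop cannot apply cleanly. Here is a minimal sketch of that case, with a local dev branch standing in for upstream/dev (a hypothetical simplification) and git 2.28 or newer assumed: the pop reports a conflict, git keeps the stash entry, and you resolve the file, git add it, and drop the stash yourself. git rebase --autostash can do the stash and pop for you, but a conflict is resolved the same way.

```shell
#!/bin/sh
# Stash, rebase, and recover from a conflicting stash pop.
# ASSUMPTIONS: git 2.28+; local branch dev stands in for upstream/dev; names hypothetical.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q -b dev demo
cd demo
echo "line 1" > notes.md
git add notes.md
git commit -q -m "Base"
git branch feature

# dev moves on (as if new commits were fetched from upstream/dev).
echo "dev version" > notes.md
git commit -q -a -m "Update notes on dev"

# Uncommitted work on the feature branch touches the same line.
git checkout -q feature
echo "my version" > notes.md

git stash push -q   # set the work aside: rebase refuses to start otherwise
git rebase -q dev

if ! git stash pop -q; then
  # Conflict: git writes conflict markers into notes.md and KEEPS the stash entry.
  printf 'dev version\nmy version\n' > notes.md   # resolve by hand (here: keep both lines)
  git add notes.md                                 # mark the conflict as resolved
  git stash drop -q                                # discard the stash entry pop kept
fi
```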

Further reading about git

Zvonimir Spajic's three-part Hackernoon series on data storage, branching, and indexing

Indexing/Staging

Git Diff

Git Workflow

Best Practices

1. Join Github

If you don't already have a github account, set one up here.

Fork and clone a repo

The first steps in becoming a contributor (or a user) are to fork and clone a repo.

Your remote settings (such as the origin and upstream URLs) are stored in .git/config. To open that file in your editor, type git config --edit.
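As a quick check of where the remotes end up, the sketch below adds the origin and upstream URLs from the outline to a throwaway repository and reads them back, first with git remote -v and git config --get, then straight from .git/config. It assumes git 2.28 or newer and only writes configuration; nothing is fetched from GitHub.

```shell
#!/bin/sh
# Inspect remote settings without opening an editor. ASSUMPTION: git 2.28+.
set -eu
work=$(mktemp -d)
cd "$work"
git init -q -b master demo
cd demo
git remote add origin https://github.com/deena-b/mitolin.git
git remote add upstream https://github.com/deepcelllineage/mitolin.git

git remote -v                          # each remote with its fetch and push URLs
git config --get remote.upstream.url   # read a single setting
cat .git/config                        # the file that git config --edit opens
```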

The DCL group will aim to use the commonly used git workflow pictured below, with some modifications related to the nature of our project.
