Build robust Python programming skills with a focus on best practices, error handling, and scientific computing fundamentals.
This lesson teaches novice programmers to write modular code to perform data analysis using Python. The emphasis, however, is on teaching language-agnostic programming principles, such as automation with loops and encapsulation with functions; see Best Practices for Scientific Computing and Good enough practices in scientific computing to learn more.
The example used in this lesson analyses a set of 12 files with simulated inflammation data collected from a trial for a new treatment for arthritis. Learners are shown how it is better to automate analysis using functions instead of repeating analysis steps manually.
This workshop focuses on building strong programming foundations through hands-on practice with data analysis.
This lesson is also available in R and MATLAB.
| # | Episode | Time (min) | Question(s) |
|---|---|---|---|
| 1 | Python Fundamentals | 30 | What basic data types can I work with in Python? How can I create a new variable in Python? Can I change the value associated with a variable after I create it? |
| 2 | Analyzing Patient Data | 60 | How can I process tabular data files in Python? |
| 3 | Visualizing Tabular Data | 50 | How can I visualize tabular data in Python? How can I group several plots together? |
| 4 | Storing Multiple Values in Lists | 30 | How can I store many values together? |
| 5 | Repeating Actions with Loops | 30 | How can I do the same operations on many different values? |
| 6 | Analyzing Data from Multiple Files | 20 | How can I do the same operations on many different files? |
| 7 | Making Choices | 30 | How can my programs do different things based on data values? |
| 8 | Creating Functions | 30 | How can I define new functions? What's the difference between defining and calling a function? What happens when I call a function? |
| 9 | Errors and Exceptions | 30 | How does Python report errors? How can I handle errors in Python programs? |
| 10 | Defensive Programming | 30 | How can I make my programs more reliable? |
| 11 | Debugging | 30 | How can I debug my program? |
| 12 | Command-Line Programs | 30 | How can I write Python programs that will work like Unix command-line tools? |
The best way to learn how to program is to do something useful, so this introduction to Python is built around a common scientific task: data analysis.
Our imaginary colleague "Dr. Maverick" has invented a new miracle drug that promises to cure arthritis inflammation flare-ups after only 3 weeks of first taking the medication! Naturally, we wish to see the clinical trial data, and after months of asking, they have finally provided us with a CSV spreadsheet containing it.
The CSV file contains the number of inflammation flare-ups per day for the 60 patients in the initial clinical trial, which lasted 40 days. Each row corresponds to a patient, and each column corresponds to a day in the trial. Once a patient has their first inflammation flare-up, they take the medication and wait a few weeks for it to take effect and reduce flare-ups.
To see how effective the treatment is we would like to:
- Calculate the average inflammation per day across all patients.
- Plot the result to discuss and share with colleagues.
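The first goal, the average inflammation per day, is a column-wise mean over a patients × days grid. Below is a minimal sketch using only built-in Python. It assumes the data is already held as a list of rows of integers. The function name `daily_mean` and the tiny sample grid are hypothetical and for illustration only:

```python
def daily_mean(data):
    """Return the mean inflammation for each day (column) across all patients (rows)."""
    # zip(*data) transposes the grid, yielding one tuple of values per day
    return [sum(day) / len(day) for day in zip(*data)]


# Hypothetical 3-patient x 4-day grid, for illustration only
sample = [
    [0, 0, 1, 3],
    [0, 1, 2, 1],
    [0, 1, 1, 3],
]
print(daily_mean(sample))
```

Plotting the result, the second goal, needs a plotting library. This sketch deliberately leaves that step out.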
{alt='3-step flowchart shows inflammation data records for patients moving to the Analysis step where a heat map of provided data is generated moving to the Conclusion step that asks the question, How does the medication affect patients?'}
The data sets are stored in comma-separated values (CSV) format:
- each row holds information for a single patient,
- columns represent successive days.
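Each line is one patient and each comma-separated field is one day, so the file maps naturally onto a list of rows. Below is a minimal sketch that uses Python's built-in `csv` module to turn such text into a grid of integers. It assumes the input is any iterable of text lines, such as an open file. `parse_inflammation` is a hypothetical helper name:

```python
import csv


def parse_inflammation(lines):
    """Parse CSV lines into a list of rows (patients) of ints (days)."""
    # csv.reader yields one list of strings per line; skip blank lines
    return [[int(value) for value in row] for row in csv.reader(lines) if row]


# Two short hypothetical lines standing in for a real data file
rows = parse_inflammation(["0,0,1,3", "0,1,2,1"])
print(len(rows), "patients x", len(rows[0]), "days")  # → 2 patients x 4 days
```

With a real data file, you would pass the open file object in place of the list, e.g. `with open(path, newline="") as f: rows = parse_inflammation(f)`. Here `path` stands for wherever your copy of the data lives.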
The first three rows of our first file look like this:
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
Each number represents the number of inflammation bouts that a particular patient experienced on a given day.
For example, the value "6" in row 3, column 7 of the data above means that the third patient experienced six bouts of inflammation on the seventh day of the clinical study.
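"Row 3, column 7" counts from 1, but Python indexes lists from 0, so the seventh day sits at index 6. Below is a minimal sketch using the third data line shown above. The variable names are illustrative:

```python
# Third line of the sample data shown above (patient 3, days 1-40)
row3 = "0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1"
patient3 = [int(value) for value in row3.split(",")]

# Day 7 is at zero-based index 6
print(patient3[6])    # → 6
print(len(patient3))  # → 40, one value per trial day
```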
In order to analyze this data and report to our colleagues, we'll have to learn a little bit about programming.
:::::::::::::::::::::::::::::::::::::::::: prereq
You need to understand the concepts of files and directories and how to start a Python interpreter before tackling this lesson. This lesson sometimes references Jupyter Notebook, although you can use any Python interpreter mentioned in the Setup.
The commands in this lesson pertain to any officially supported Python version. Newer versions usually have better error messages, so using a newer Python version is recommended if possible.
To get started, follow the directions on the Setup page to download data and install a Python interpreter.
Instructional material from this lesson is made available under the Creative Commons Attribution (CC BY 4.0) license. Except where otherwise noted, example programs and software included as part of this lesson are made available under the MIT license. For more information, see LICENSE.md.
Please cite as:
Azalee Bostroem, Trevor Bekolay, and Valentina Staneva (eds): "Software Carpentry: Programming with Python." Version 2016.06, June 2016, https://github.com/swcarpentry/python-programming-foundations, 10.5281/zenodo.57492.
Software Carpentry is a volunteer project that has been teaching basic computing skills to researchers since 1998.
The Carpentries is a registered 501(c)(3) non-profit organisation based in Delaware, USA. We are a global community teaching foundational computational and data science skills to researchers in academia, industry and government.