This is the initial branch of the main pipeline for data analysis at the Hussaini Lab. It will eventually be forked to Hussaini Lab's GitHub account when it's ready to be used. This repository is designed to be collaboratively developed according to the guidelines laid out below.
There are four main software modules called core, library, scripts and widgets. There are also several additional bridge / firmware modules that are all named by the convention x_{} (e.g. x_io, x_cli, etc.).
In the io layer, there are submodules for reading and writing different data formats, and tools for getting data from users. Additionally, there are two modules for creating the data classes Session and Study. These classes serve as a bridge and formatting bottleneck between the io layer and the core data types. They store data in a standardized format that is easy to convert into the core data types. This format is isomorphic to dictionaries with keys and contents as outlined below (i.e. every io module must generate a subset of the following keys and contents).
- animal
  - animal_id
  - species
  - sex
  - age
  - weight
  - genotype
  - animal_notes
- devices
  - implants
    - implant_id
    - implant_type
      - tetrode
      - tetrode_array_{int}
      - sillicone_shank
      - ...
    - implant_geometry
      - channel_{int}: (x, y, z)
      - ...
    - implant_data
      - sample_rate OR irregular_sampling: True
      - event_times: [float]
      - event_labels: [string]
      - channel_{int}: [float]
      - units: string
  - axona_led_tracker
    - led_tracker_id
    - led_location
    - led_position_data: {time: float, x: float, y: float}
- implants
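As a rough illustration of the format above, a session produced by an io module could look like the nested dictionary below. This is a minimal sketch, not the actual `Session` class: only the key names come from the outline, while the values, the `ALLOWED_TOP_LEVEL_KEYS` set and the `unknown_top_level_keys` helper are hypothetical.

```python
# Hypothetical example of the standardized session format as a plain dict.
# Key names follow the outline above; all values are made up for illustration.
session = {
    "animal": {
        "animal_id": "mouse_001",
        "species": "mouse",
        "sex": "F",
        "age": 12.0,
        "weight": 25.3,
    },
    "devices": {
        "implants": {
            "implant_id": "implant_A",
            "implant_type": "tetrode",
            "implant_geometry": {"channel_1": (0.0, 0.0, 0.0)},
            "implant_data": {
                "sample_rate": 48000.0,
                "event_times": [0.01, 0.35],
                "event_labels": ["spike", "spike"],
                "channel_1": [0.1, -0.2, 0.05],
                "units": "uV",
            },
        },
        "axona_led_tracker": {
            "led_tracker_id": "tracker_1",
            "led_location": "head",
            "led_position_data": {"time": 0.0, "x": 1.5, "y": 2.5},
        },
    },
}

# Assumption: these are the only top-level keys an io module may emit.
ALLOWED_TOP_LEVEL_KEYS = {"animal", "devices"}


def unknown_top_level_keys(data: dict) -> set:
    """Return the top-level keys that are not part of the standardized format."""
    return set(data) - ALLOWED_TOP_LEVEL_KEYS
```

Because every io module only has to produce a subset of the keys, a check like this can reject unexpected keys without requiring that all keys be present.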
The core module contains the canonical data types and functions for managing them. This module has no dependencies other than the version of python being used and numpy. Some of the data structures are inspired by the Neurodata Without Borders project, though the code is not directly based on that project.
The library module contains data types, classes and functions for exploring, analyzing and visualizing the core data types. Each of these functions does only one thing.
The scripts module contains use cases where settings are configured up front and then several functions are called in sequence to perform the desired task. There are three main types of scripts. (1) Batch processing scripts that run on a large number of files. (2) Parameter exploration / optimization scripts that run on a single file. (3) Automations for smaller tasks that require multiple library functions, are repeated often, and should be performed the same way everywhere. The third type of script may be called by the first two.
The widgets module contains framework-agnostic widgets for interactive analysis. Each widget consists of two parts. The first part is a View class that contains all the state of the widget, as well as any text that would be displayed to the user. The second part is a Controller class that contains all the logic for the widget. The Controller class is responsible for updating the View class when the user interacts with the widget and for calling the appropriate functions in the library or scripts module. Each widget is structured as a microservice, so there are no dependencies between the widgets. Any information that is needed by a widget is passed as a parameter to the Controller class. Communication between two widgets (if ever necessary) would be done through separate bridge submodules in the widgets module.
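The View / Controller split described above might look roughly like the sketch below. All names here (`ThresholdView`, `ThresholdController`, `count_above`) are hypothetical and chosen only for illustration; the real widgets may be organized differently.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ThresholdView:
    """Holds all widget state and user-facing text; contains no logic."""
    threshold: float = 0.0
    n_selected: int = 0
    status_text: str = "No data loaded."


class ThresholdController:
    """Holds the widget logic and updates the View on user interaction."""

    def __init__(self, view: ThresholdView, values: Sequence[float],
                 count_above: Callable[[Sequence[float], float], int]):
        # Everything the widget needs is passed in as a parameter.
        self.view = view
        self.values = values
        self.count_above = count_above  # stands in for a library function

    def set_threshold(self, threshold: float) -> None:
        """Handle a user changing the threshold and refresh the View."""
        self.view.threshold = threshold
        self.view.n_selected = self.count_above(self.values, threshold)
        self.view.status_text = f"{self.view.n_selected} values above {threshold}"


def count_above(values: Sequence[float], threshold: float) -> int:
    """Hypothetical stand-in for a function from the library module."""
    return sum(1 for v in values if v > threshold)
```

Neither class imports a GUI framework, so both can be exercised by automated tests; a framework-specific window would only render the `ThresholdView` fields and forward user input to `set_threshold`.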
The x_{} modules contain interfaces for frameworks and APIs. Moreover, they contain adapters, gateways and other glue code for the core and library modules.
The x_cli module contains the command line interfaces for running scripts and launching widgets.
The x_gui module bridges GUI framework(s) for the widgets.
The x_io module contains the functions for reading and writing data from various formats in various frameworks (e.g. databases, os file systems, server APIs, etc.).
If you would like to contribute to the project, please read all the guidelines first.
- Generally follow the PEP 8 style conventions.
- Additionally, please name functions and classes in such a way that there are no names that are substrings of other names. For example, `get_waveform` and `get_waveform_from_file` are not allowed. Instead, use something like `get_waveform` and `waveform_from_file_path`. This is to allow global search and replace when changing the name of a function or class.
- When importing modules:
  - Group all the internal dependencies together at the top under the header "# Internal Dependencies".
  - Group all the external dependencies together below the internal dependencies.
  - Divide them into subgroups under the headers "# A+ Grade External Dependencies," "# A Grade External Dependencies" and "# Other External Dependencies".
  - See the External Dependencies section for more details on the lists.
- TDD is required for contributors.
  - Use `pytest` for unit tests.
  - Store test files for a given module (top folder level) in a folder of the form `{module}_tests`.
  - If there are folder level submodules, create a subfolder named with the pattern `{submodule}_tests`.
  - For each submodule in the module, create a test file and name it `test_{submodule}.py`. This convention is very important because it allows `pytest` to find the tests.
  - Write your test(s) before writing your new function or class.
- A function or class should either have an automated test or contain only code that requires manual testing.
  - e.g. `View` and `Controller` classes in the `widgets` module are segregated from the `Window` classes in the `x_gui` module; the `Window` classes do exactly and only two things: display the state of a `View` object and feed user input to the `Controller`.
- Dependencies between layers flow only one way: `core` < `library` < `scripts` < `widgets` < `x_{}`.
  - e.g. `library` calls only `core`, `scripts` can directly call `library` or `core`, etc.
- Limit interdependency among modules in the same layer. Segregate large scripts or widgets into microservices that are independent of each other (no common databases or global states).
- For function modules, create roughly one public function per module. Name helper functions using an underscore as the first character (e.g. `_helper_function`).
- For class modules, use only a handful of abstract classes per module (preferably no more than one).
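The rule that no name may be a substring of another can be checked mechanically. The helper below is a minimal sketch of such a check; `find_substring_name_clashes` is a hypothetical name and is not part of this repository.

```python
from itertools import permutations


def find_substring_name_clashes(names):
    """Return sorted (short, long) pairs where `short` is a substring of `long`."""
    unique = set(names)
    return sorted(
        (short, long)
        for short, long in permutations(unique, 2)
        if short in long
    )
```

Applied to the examples from the naming guideline, `get_waveform` and `get_waveform_from_file` produce one clash, while `get_waveform` and `waveform_from_file_path` produce none.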
- `core` should not depend on any external libraries except numpy.
- `library` should depend only on `core` and a few A+ safe libraries (see below).
- `scripts` and `widgets` should depend on `library`, `core` and A safe libraries.
- The `x_{}` modules can have dependencies on any framework or firmware that is reasonably well maintained.
  - Whenever possible, a submodule in a given `x_{}` module (e.g. `x_gui.pyside6`) should depend only on the framework listed.
  - Exceptions exist. For example, the `x_cli` module contains submodules that launch widgets using dependencies on `x_gui` submodules. When doing this, try to keep the dependencies as minimal as possible.
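The one-way layering (`core` < `library` < `scripts` < `widgets` < `x_{}`) can be expressed as a small rank comparison. This is only a sketch under stated assumptions: the function names and the dotted module paths in the usage note are hypothetical, and the check treats all `x_{}` modules as a single top layer.

```python
# Layer order taken from the guidelines: core < library < scripts < widgets < x_{}.
LAYER_RANK = {"core": 0, "library": 1, "scripts": 2, "widgets": 3}
X_LAYER_RANK = 4  # assumption: every x_{} module (x_io, x_cli, ...) shares this rank


def layer_rank(module_path: str) -> int:
    """Map a dotted module path such as 'library.maps' to its layer rank."""
    top_level = module_path.split(".")[0]
    if top_level.startswith("x_"):
        return X_LAYER_RANK
    return LAYER_RANK[top_level]


def import_respects_layers(importer: str, imported: str) -> bool:
    """True if `importer` sits in a strictly higher layer than `imported`."""
    return layer_rank(imported) < layer_rank(importer)
```

For example, `import_respects_layers("library.maps", "core.spikes")` holds, while the reverse direction does not. Same-layer imports are flagged too, so the documented `x_cli` to `x_gui` exception would need an explicit allow-list in a real check.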
- numpy
- pandas
- matplotlib
- scipy
- statsmodels
- plotly
- PIL / pillow
- Qt / PySide6
- Axona / TINT file formats (.set, .bin, .X, .cut, .pos, .eeg, .egf)
- .X file extensions denote files where there is an integer after the dot (e.g. .1, .2, .10, etc.)
- Intan file formats (.rhd, .rdh?)
If you'd like to propose adding a library to the A Safe or A+ Safe list, please contact the current maintainer Oliver Shetler at cos2112@cumc.columbia.edu.
- atomic commits---commit every change to every function whenever the relevant test is passing
- only commit the relevant module (and test)
- frequent commits---DO NOT build an entire feature and then commit. This can lead to merge conflicts
- frequent pulls---pull every time before committing and notify partner if there is a merge conflict (tell them to pause on that module)
- separate hard to test parts of a class or function from the easy parts
core is the most fundamental module. It should not depend on any other module. It should only contain classes and functions that pertain to basic data types that could conceivably have been read from a file.
When defining a new class, please consider the following:
- Metadata: Are the data in the class metadata for an object that was used in the experiment?
  - If yes, then the class should be in `core`.
  - If no, then the class might belong in `library`.
- Unit Homogeneity: Does the class contain an attribute that contains one principal data structure? Are all the data within the principal attribute of the same unit type?
  - If yes to both, then the class should be in `core`.
  - If no, then the class might belong in `library`.
When defining a new function, please consider the following:
- Does the function operate on a single data type with homogeneous units?
  - If yes, then the function should be in `core`. It might belong inside the pertinent class.
  - If no, then the function might belong in `library`.
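To make the criteria above concrete, here is a minimal sketch of a class that would plausibly qualify for `core`: it has one principal attribute whose entries all share one unit, and its method operates on that single type without changing units. `PositionTrace` and `shifted` are hypothetical names, and plain tuples are used instead of numpy arrays only to keep the example dependency-free.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PositionTrace:
    """Hypothetical core-style type: one principal attribute, one unit."""
    values: Tuple[float, ...]  # principal data structure
    units: str                 # every entry in `values` shares this unit
    sample_rate: float         # samples per second

    def shifted(self, offset: float) -> "PositionTrace":
        """Return a copy with `offset` (same units) added to every value."""
        return PositionTrace(
            tuple(v + offset for v in self.values),
            self.units,
            self.sample_rate,
        )
```

The `shifted` method stays within a single data type and keeps the units unchanged, which is why it sits inside the class rather than in `library`.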
library is the module that contains the bulk of the code. It should depend only on core and a few A and A+ safe libraries (see above). It contains the data structures and functions that are used to analyze, visualize and derive new data types from the data objects in the core module.
When defining new classes, please consider the following:
- Does the class combine data from multiple core classes?
  - If yes, then the class should be in library.
  - If no, then the class might belong in core.
- Does the class or function perform a non-invertible transformation on a single data type and/or does the operation change the units? (e.g. converting locations to speeds or velocities)
  - If yes, then the class should be in library.
  - If no, then the class might belong in core.
- Does the class store a single object for a single task?
  - If yes, then the class should be in library.
  - If no, then the class might belong in scripts or widgets.
- Does the class contain only methods that do not cause side effects? (e.g. no direct manipulation of data; only creation and deletion of data attributes; e.g. no read/write or display methods)
  - If yes, then the class should be in library.
  - If no, then the class might belong in scripts or widgets.
When defining new functions, please consider the following:
- Does the function perform a single analysis, reformatting or visualization task (without displaying)?
  - If yes, then the function should be in library.
  - If no, then the function might belong in scripts or widgets.
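The example mentioned above, converting locations to speeds, is a typical `library` function: it changes the units and cannot be inverted. The sketch below assumes plain sequences as inputs; `speeds_from_positions` is a hypothetical name and not necessarily how the repository implements this.

```python
import math
from typing import List, Sequence


def speeds_from_positions(times: Sequence[float],
                          xs: Sequence[float],
                          ys: Sequence[float]) -> List[float]:
    """Convert sampled (x, y) locations into speeds between consecutive samples.

    The result has one entry fewer than the input, and the original
    locations cannot be recovered from it.
    """
    if not (len(times) == len(xs) == len(ys)):
        raise ValueError("times, xs and ys must have the same length")
    speeds = []
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        distance = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1])
        speeds.append(distance / dt)
    return speeds
```

The function performs one analysis task, has no side effects and displays nothing, which matches the criteria for `library`.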
Architecture rules:
- Horizontal Segregation: Scripts should not depend on each other.
- Vertical Segregation: Scripts should not depend on `widgets` or any `x_{}` modules.
- No Global States: Scripts should not have any global states.
- No Interfaces: Scripts should not contain any adapters or interfaces.
- One Configuration Dictionary: Scripts should always take in exactly ONE configuration dictionary.
- Data Structures: Aside from one configuration dictionary, scripts should only take in data structures from the `core` and `library` modules.
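A script that follows these rules might be shaped like the sketch below: exactly one configuration dictionary, no global state, and a fixed sequence of function calls. All names are hypothetical, the two helpers stand in for `library` functions, and plain lists stand in for `core` data types.

```python
from typing import Dict, List, Sequence


def _clip(values: Sequence[float], lower: float, upper: float) -> List[float]:
    """Hypothetical stand-in for a library function."""
    return [min(max(v, lower), upper) for v in values]


def _mean(values: Sequence[float]) -> float:
    """Hypothetical stand-in for a library function."""
    return sum(values) / len(values)


def clipped_mean_batch_script(config: Dict[str, float],
                              traces: Sequence[Sequence[float]]) -> List[float]:
    """Batch script sketch: settings come up front in one dict, then steps run in order."""
    lower = config["lower"]
    upper = config["upper"]
    results = []
    for trace in traces:
        clipped = _clip(trace, lower, upper)
        results.append(_mean(clipped))
    return results
```

Since all settings arrive in `config`, the same script can be rerun with different parameters without touching its body, for example `clipped_mean_batch_script({"lower": 0.0, "upper": 1.0}, traces)`.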
When defining new classes or functions in the scripts module, please consider the following:
- Does the class or function perform a sequence of steps that automate a use case?
  - If yes, then the class or function should be in scripts.
  - If no, then the class or function might belong in library or widgets.
- Does the class or function require user input only at the beginning of the use case?
  - If yes, then the class or function should be in scripts.
  - If no, then the class or function might belong in widgets.
When defining new classes in the widgets module, please consider the following:
- Does the class store part of an abstract user interface (e.g. a View or Controller)?
  - If yes, then the class should be in widgets.
  - If no, then the class might belong in scripts or library.
- Does the class contain