T3 chomper

Parse data from Pion Sirius T3 instrument XML files

WARNING HIGHLY WIP

Instructions

Installation

First clone and install the repo

git clone git@github.com:OpenADMET/t3-chomper.git
cd t3-chomper
pip install -e .

Or download and install a whl file from a release

pip install t3_chomper-0.1.5-py3-none-any.whl

Parsing data files

Parse output from the T3 instrument with the t3_extract CLI. The input can be a single file or a directory of t3r result files. Use the --protocol argument to specify whether the data are pKa or logP results. Parsed results are written to a CSV file, or to stdout if --output is omitted.

# Extract pka data from a file named my_data.t3r to pka_output.csv
t3_extract my_data.t3r --protocol pka --output pka_output.csv

# Extract logp data from a directory /home/data/logp_files to logp_output.csv
t3_extract /home/data/logp_files/ --protocol logp --output logp_output.csv

# Extract pka data from a file and write to stdout
t3_extract my_data.t3r --protocol pka

Generating experiment imports

One can also generate CSV import files for creating experiments. For this, one needs a registration file with sample information and a pKa data file with estimated pKas. There are several defined experimental tray layouts which can be listed by viewing the help dialog:

t3_gencsv --help

The registration file should have the following columns (column names are case-insensitive):

| Column | Description |
| --- | --- |
| sample | compound/sample ID (must match pKa data file) |
| well | well in plate |
| mw | molecular weight |
| fw | formula weight |
| mg | mass in mg |

The pKa data file can be in one of two formats:

Short pKa data file format:

The short format includes one row for each compound and one column for all pKas per compound. It uses the same comma-separated format that the T3 instrument uses.

| Column | Description |
| --- | --- |
| sample | compound/sample ID (must match registration file) |
| reformatted_pkas | comma-separated list of pKas and types in ascending order: e.g. "ACID,2.5,BASE,9.3" |
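
As an illustration of the short format, the sketch below splits a reformatted_pkas string into (type, value) pairs. The function name `parse_reformatted_pkas` is hypothetical and is not part of t3_chomper; it only assumes the alternating TYPE,value layout described above.

```python
def parse_reformatted_pkas(text):
    """Split a string like 'ACID,2.5,BASE,9.3' into [('ACID', 2.5), ('BASE', 9.3)].

    Hypothetical helper for illustration; not the package's own parser.
    """
    tokens = [t.strip() for t in text.split(",") if t.strip()]
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected TYPE,value pairs, got: {text!r}")
    pairs = []
    # Even positions hold the type, odd positions hold the numeric value.
    for pka_type, value in zip(tokens[0::2], tokens[1::2]):
        pka_type = pka_type.upper()
        if pka_type not in {"ACID", "BASE"}:
            raise ValueError(f"unknown pKa type: {pka_type!r}")
        pairs.append((pka_type, float(value)))
    return pairs
```

A malformed string, such as one with a type but no value, raises a ValueError rather than being silently truncated.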

Long pKa data file format:

The long format includes one row for each pKa. Compounds with multiple pKas will appear in multiple rows.

| Column | Description |
| --- | --- |
| sample | compound/sample ID (must match registration file) |
| pka_value | value of pka |
| pka_type | type of pka ("acid" or "base") |

Filtering/Limiting Entries

To limit which entries are used in the experiment, pass a filter file with the --filter-file argument. The filter file needs only one column, sample, whose values match samples in the registration file. Only samples found in the filter file progress to the subsequent steps for generating experiment files.

Available experiment layouts

| TrayFormat | Description |
| --- | --- |
| fastuvpska | 47 samples per tray, with one calibration |
| phmetric | 24 samples per tray, with 1x cleanup step after each |
| uvmetric | 24 samples per tray, calibration before each sample |
| logp | 16 samples per tray, with 2x cleanup steps after each |

Files for each tray will be generated in the output directory provided with the --output argument.
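
To estimate how many tray files to expect, the per-tray capacities in the table can be turned into a small calculation. This is a sketch only: it assumes samples fill trays in order up to the listed capacity, which may not match how t3_gencsv actually assigns samples, and the names `SAMPLES_PER_TRAY`, `trays_needed`, and `split_into_trays` are hypothetical.

```python
import math

# Capacities taken from the layout table above.
SAMPLES_PER_TRAY = {"fastuvpska": 47, "phmetric": 24, "uvmetric": 24, "logp": 16}


def trays_needed(n_samples, protocol):
    """Number of trays required if each tray is filled to capacity (assumption)."""
    return math.ceil(n_samples / SAMPLES_PER_TRAY[protocol])


def split_into_trays(samples, protocol):
    """Chunk a list of sample IDs into consecutive tray-sized groups."""
    capacity = SAMPLES_PER_TRAY[protocol]
    return [samples[i:i + capacity] for i in range(0, len(samples), capacity)]
```

For example, 100 samples in the fastuvpska layout would need three trays under this assumption, since two trays hold only 94.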

Examples

# Generate csv experiment import files with fastuvpska format
t3_gencsv --regi <registration file> --pka <pKa data file> --protocol fastuvpska --output <new_pka_experiment_dir>

# Generate csv experiment import files with logp format (solvent is required)
t3_gencsv --regi <registration file> --pka <pKa data file> --protocol logp --logp-solvent octanol --output <new_logp_experiment_dir>

# Generate csv experiment import files with phmetric tray format and only include samples listed in the filter file
t3_gencsv --regi <registration file> --filter-file <filter_file> --pka <pKa data file> --protocol phmetric --output <new_phmetric_experiment_dir>

# Generate csv with custom concentration (20 mM) and volume (10 µL)
t3_gencsv --regi <registration file> --pka <pKa data file> --protocol fastuvpska --concentration 20 --volume 10 --output <output_dir>

# Generate logP experiment with toluene solvent
t3_gencsv --regi <registration file> --pka <pKa data file> --protocol logp --logp-solvent toluene --output <output_dir>

Optional Parameters

--concentration FLOAT: Sample concentration in mM (default: 10.0)
--volume FLOAT: Sample volume in µL (default: 5.0)
--logp-solvent [octanol|toluene|cyclohexane|chloroform]: Solvent for logP protocol (required when --protocol is logp)
--sample-col TEXT: Name of sample/ID column for joining files (default: "sample")
--filter-file FILE: CSV file with sample names to include (filters the registration file)

Notes:

--concentration and --volume are not used for logP protocol
--logp-solvent is required when --protocol is logp - the command will fail with a validation error if not provided
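
The interaction between these options can be summarized in a few lines of Python. This is a minimal sketch of the rules stated above, not the CLI's actual validation code; `check_options` and `LOGP_SOLVENTS` are hypothetical names.

```python
LOGP_SOLVENTS = ("octanol", "toluene", "cyclohexane", "chloroform")


def check_options(protocol, logp_solvent=None, concentration=10.0, volume=5.0):
    """Return the options that apply to a protocol, or raise if they are invalid.

    Hypothetical helper mirroring the documented rules:
    - logp requires a solvent and ignores concentration/volume
    - other protocols use concentration (mM) and volume (uL)
    """
    if protocol == "logp":
        if logp_solvent not in LOGP_SOLVENTS:
            raise ValueError(
                "--logp-solvent is required for the logp protocol; "
                f"choose one of {', '.join(LOGP_SOLVENTS)}"
            )
        return {"protocol": protocol, "logp_solvent": logp_solvent}
    return {
        "protocol": protocol,
        "concentration_mm": float(concentration),
        "volume_ul": float(volume),
    }
```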


## PowerShell Scripts for Windows Users

The `powershell_scripts/` folder contains Windows PowerShell scripts that provide GUI wrappers and utilities for working with T3 instrument data. These scripts are designed to simplify common workflows on Windows systems.

### Available Scripts

#### `create_experiment_folder.ps1`
Creates a standardized experiment folder structure for organizing T3 data.

**What it does:**
- Prompts for an Experiment ID (format: ST3-<id>)
- Creates a base experiment folder at `C:\New PION Data\<Experiment ID>`
- Creates standard subfolders:
  - `Unprocessed` - Raw, unprocessed T3R files
  - `QC Fail` - Files that failed quality control
  - `For logP` - Files prepared for logP analysis
  - `For UV-metric pKa` - Files for UV-metric pKa experiments
  - `For pH-metric pKa` - Files for pH-metric pKa experiments
  - `Input files` - Registration and pKa data files
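
For users who are not on Windows, the same folder layout can be sketched in Python. This is an illustrative equivalent, not a port of the PowerShell script; the function name `create_experiment_folders` is hypothetical, and the base directory is passed in rather than fixed to `C:\New PION Data`.

```python
from pathlib import Path

# Subfolder names as listed above.
SUBFOLDERS = [
    "Unprocessed",
    "QC Fail",
    "For logP",
    "For UV-metric pKa",
    "For pH-metric pKa",
    "Input files",
]


def create_experiment_folders(experiment_id, base_dir):
    """Create <base_dir>/<experiment_id> with the standard subfolders."""
    root = Path(base_dir) / experiment_id
    for name in SUBFOLDERS:
        # exist_ok makes the call safe to repeat for an existing experiment.
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```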

**Usage:**
```powershell
.\powershell_scripts\create_experiment_folder.ps1
```

#### `parse_t3r_filenames.ps1`
Parses T3R filenames following a standard naming convention and exports metadata to CSV.

**What it does:**
- Opens a folder browser to select a directory containing `.t3r` files
- Parses filenames matching the pattern `DATE_SAMPLE_ID_ASSAY_NAME.t3r`
- Extracts three fields: `date`, `sample_id`, `assay_name`
- Exports parsed data to `parsed_files_<timestamp>.csv`
- Reports skipped files that don't match the expected pattern

**Usage:**

.\powershell_scripts\parse_t3r_filenames.ps1
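
The filename convention can be illustrated with a regular expression in Python. The exact pattern used by the PowerShell script is not shown here, so this sketch makes an assumption: the date and sample ID contain no underscores, and everything after the second underscore is the assay name. `T3R_NAME` and `parse_t3r_filename` are hypothetical names.

```python
import re

# Assumption: date and sample ID contain no underscores; the assay name may.
T3R_NAME = re.compile(
    r"^(?P<date>[^_]+)_(?P<sample_id>[^_]+)_(?P<assay_name>.+)\.t3r$",
    re.IGNORECASE,
)


def parse_t3r_filename(name):
    """Return a dict with date, sample_id, assay_name, or None if there is no match."""
    match = T3R_NAME.match(name)
    return match.groupdict() if match else None
```

Files that return None correspond to the skipped files the script reports.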

#### `run_extract.ps1`
GUI wrapper for the `t3_extract` command-line tool.

**What it does:**
- Opens a folder browser to select the data location
- Presents a dropdown to choose the protocol (`pka` or `logp`)
- Executes `t3_extract.exe` with the selected parameters
- Captures stdout, stderr, and exit code to a timestamped log file
- Shows a success/error dialog with an option to view the log

**Equivalent command line:**

t3_extract --protocol <selected_protocol> --output <protocol>_output.csv "<selected_folder>"
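
The capture-and-log behavior described for this wrapper can be approximated in Python with `subprocess.run`. This is a sketch of the general technique, not the PowerShell script's logic; the function name `run_and_log` and the `run_<timestamp>.log` file name are assumptions.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def run_and_log(cmd, log_dir):
    """Run a command, then write its stdout, stderr, and exit code to a timestamped log."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = Path(log_dir) / f"run_{stamp}.log"  # hypothetical naming scheme
    log_path.write_text(
        f"command: {cmd}\n"
        f"exit code: {result.returncode}\n"
        f"--- stdout ---\n{result.stdout}\n"
        f"--- stderr ---\n{result.stderr}\n",
        encoding="utf-8",
    )
    return result.returncode, log_path
```

A call such as `run_and_log(["t3_extract", "--protocol", "pka", "--output", "pka_output.csv", "data_dir"], ".")` would mirror the command line above, assuming t3_extract is on the PATH.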

**Usage:**

.\powershell_scripts\run_extract.ps1

#### `run_gencsv.ps1`
GUI wrapper for the `t3_gencsv` command-line tool.

**What it does:**
- Opens a file browser to select the registration CSV file (`--regi`)
- Opens a file browser to select the pKa data CSV file (`--pka`)
- Optionally prompts for a filter file to limit samples (`--filter-file`)
- Presents a dialog to choose the protocol (`fastuvpska`, `phmetric`, `uvmetric`, `logp`)
- For non-logP protocols, allows customization of:
  - Concentration (mM), default: 10
  - Volume (µL), default: 5
  - These fields are disabled for the logP protocol
- For the logP protocol, allows selection of a solvent:
  - Choices: octanol, toluene, cyclohexane, chloroform
  - Validation: solvent selection is required for the logP protocol
  - The dialog cannot be closed until a valid solvent is selected or the user cancels
  - This field is disabled for non-logP protocols
- Automatically sets the output folder to `<regi_folder>/<protocol>`
- Executes `t3_gencsv.exe` with the selected parameters
- Captures output and creates a timestamped log file
- Shows a success/error dialog with an option to view the log

**Equivalent command line:**

# Non-logP protocol
t3_gencsv --regi "<selected_regi>" --pka "<selected_pka>" --protocol fastuvpska --concentration 10 --volume 5 --output "<regi_folder>/<protocol>"

# logP protocol with solvent
t3_gencsv --regi "<selected_regi>" --pka "<selected_pka>" --protocol logp --logp-solvent toluene --output "<regi_folder>/<protocol>"

# With filter file
t3_gencsv --regi "<selected_regi>" --pka "<selected_pka>" --filter-file "<selected_filter>" --protocol <selected_protocol> --concentration <value> --volume <value> --output "<regi_folder>/<protocol>"

**Usage:**

.\powershell_scripts\run_gencsv.ps1

### Notes for Windows Users

- All scripts assume the default data location is `C:\New PION Data`
- Scripts use Windows Forms for GUI dialogs (file browsers, message boxes, dropdown menus)
- Log files are created with timestamps for tracking execution history
- The scripts expect `.exe` versions of the CLI tools (`t3_extract.exe`, `t3_gencsv.exe`) to be available in the system PATH

CSV File Column Requirements and Validation

This section documents the exact column requirements for CSV input files used by t3_gencsv. The validation logic is implemented in t3_chomper/formatters.py.

Registration File Column Requirements

All registration files require these base columns (case-insensitive):

sample - Unique compound/sample identifier (used for joining with pKa file)
well - Well position in the plate
mw - Molecular weight

Note: You can customize the sample column name using --sample-col parameter (default: "sample")

Protocol-specific additional requirements:

| Protocol | Additional Required Columns | Notes |
| --- | --- | --- |
| `fastuvpska` | None | Only base columns required |
| `uvmetric` | None | Only base columns required |
| `phmetric` | `fw`, `mg` | Formula weight and mass (solid samples) |
| `logp` | `fw`, `mg` | Formula weight and mass (solid samples) |
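
A registration file header can be checked against these requirements before running t3_gencsv. The sketch below encodes the table as a dictionary and compares column names case-insensitively; it is an illustration of the rules, not the code in formatters.py, and `EXTRA_COLUMNS` and `missing_columns` are hypothetical names.

```python
# Additional columns per protocol, from the table above.
EXTRA_COLUMNS = {
    "fastuvpska": (),
    "uvmetric": (),
    "phmetric": ("fw", "mg"),
    "logp": ("fw", "mg"),
}


def missing_columns(header, protocol, sample_col="sample"):
    """Return required columns absent from a header, ignoring case."""
    present = {name.strip().lower() for name in header}
    required = [sample_col.lower(), "well", "mw", *EXTRA_COLUMNS[protocol]]
    return [col for col in required if col not in present]
```

An empty list means the header satisfies the protocol's requirements.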

Validation behavior:

Column names are automatically converted to lowercase for matching
Missing required columns will cause the program to exit with an error message
The error message will specify which column is missing and in which file (see formatters.py:94-96, 208-211)

pKa Data File Column Requirements

The pKa data file always requires:

sample - Compound/sample identifier (must match registration file, or use --sample-col to specify a different column name)

The pKa file can be provided in two formats, which are automatically detected:

Format 1: Short Format (Preferred)

sample - Sample identifier
reformatted_pkas - Comma-separated string in the format: "TYPE1,value1,TYPE2,value2,..." where TYPE is either ACID or BASE
Example: "ACID,2.86,BASE,9.64"

Format 2: Long Format (Automatically Converted)

If the reformatted_pkas column is not present, the code attempts to convert from long format.

Required columns for long format:

sample - Sample identifier (or custom name via --sample-col)
pka_value - Numeric pKa value
pka_type - Type of pKa, either "acid" or "base" (case-insensitive)

Conversion behavior:

Long format will be automatically converted to short format (see formatters.py:119-123)
Multiple pKa values for the same compound should appear on separate rows
Converted values are sorted by ascending pKa value
Types are converted to uppercase in the output string
The conversion is done by convert_long_pka_df() function (formatters.py:13-51)
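
The conversion rules above (group by sample, sort ascending, uppercase the types) can be sketched with the standard library. This is not the `convert_long_pka_df()` implementation, which operates on dataframes; `long_to_short` is a hypothetical stand-in that works on a list of row dictionaries, and the way numeric values are rendered as text is an assumption.

```python
from collections import defaultdict


def long_to_short(rows, id_col="sample", pka_col="pka_value", pka_type_col="pka_type"):
    """Collapse long-format rows into {sample: 'TYPE1,value1,TYPE2,value2,...'}."""
    grouped = defaultdict(list)
    for row in rows:
        missing = [c for c in (id_col, pka_col, pka_type_col) if c not in row]
        if missing:
            raise ValueError(f"missing required column(s): {', '.join(missing)}")
        # Store the value first so that sorting orders by ascending pKa.
        grouped[row[id_col]].append((float(row[pka_col]), str(row[pka_type_col]).upper()))
    return {
        sample: ",".join(f"{pka_type},{value}" for value, pka_type in sorted(pairs))
        for sample, pairs in grouped.items()
    }
```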

Validation behavior:

Missing required columns in long format conversion will raise a ValueError (formatters.py:35-38)
Rows with missing pKa data after merging with registration file are dropped with a warning (formatters.py:131-138)
The warning includes the sample IDs that were dropped

Filter File Column Requirements (Optional)

If using the --filter-file option to limit which samples are processed:

Required column:

sample - Sample identifiers to include (or custom name via --sample-col)

Validation behavior:

Only samples present in the filter file will be processed from the registration file (formatters.py:106)
If no matches are found between the filter file and registration file, the program exits with an error (formatters.py:108-111)
A log message reports how many rows remain after filtering (formatters.py:112)
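
The filtering behavior can be sketched as follows. This is an illustration using plain dictionaries rather than the package's own code; `filter_registration` is a hypothetical name, and it raises a ValueError where the real program is described as exiting with an error.

```python
import logging

logger = logging.getLogger(__name__)


def filter_registration(regi_rows, filter_rows, sample_col="sample"):
    """Keep only registration rows whose sample ID appears in the filter rows."""
    wanted = {row[sample_col] for row in filter_rows}
    kept = [row for row in regi_rows if row[sample_col] in wanted]
    if not kept:
        raise ValueError("no samples in the filter file match the registration file")
    logger.info("%d rows remain after filtering", len(kept))
    return kept
```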

General Validation Notes

- Case insensitivity: All column name matching is case-insensitive. Files are read and column names are immediately converted to lowercase (formatters.py:76, 100, 115, 206)
- Column name customization: The sample ID column name defaults to "sample" but can be changed via the --sample-col parameter
- Merging logic: The registration and pKa files are joined using a left join on the sample column, meaning:
  - All rows from the registration file are kept
  - Rows without matching pKa data receive null values
  - Null pKa rows are then dropped with a warning
- Automatically added columns: The following columns are automatically added to the merged dataframe:
  - concentration_mm - Sample concentration in mM (from the --concentration parameter, default: 10.0)
  - volume_ul - Sample volume in µL (from the --volume parameter, default: 5.0)
  - These columns will have the same value for all records
- Error messages: The code provides informative error messages that specify:
  - Which column is missing
  - Which file has the problem
  - For filter files: whether no matches were found
- Default column names: When using convert_long_pka_df() programmatically, the default parameter names are:
  - id_col="sample"
  - pka_col="pka_value"
  - pka_type_col="pka_type"
  - reformatted_pka_col="reformatted_pkas"
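
The merging logic and the automatically added columns can be illustrated without a dataframe library. The sketch below performs the equivalent of a left join followed by dropping rows with no pKa data; it is not the code in formatters.py, and `merge_with_pkas` is a hypothetical name that takes a list of registration rows and a mapping from sample ID to a reformatted_pkas string.

```python
import warnings


def merge_with_pkas(regi_rows, pka_by_sample, sample_col="sample",
                    concentration=10.0, volume=5.0):
    """Attach pKa strings to registration rows; drop and report rows without pKa data."""
    merged, dropped = [], []
    for row in regi_rows:
        pkas = pka_by_sample.get(row[sample_col])
        if pkas is None:
            # Left-join miss: the registration row has no pKa entry.
            dropped.append(row[sample_col])
            continue
        merged.append({
            **row,
            "reformatted_pkas": pkas,
            "concentration_mm": concentration,  # same value for every record
            "volume_ul": volume,
        })
    if dropped:
        warnings.warn(f"dropped samples with no pKa data: {', '.join(dropped)}")
    return merged, dropped
```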

Code References

Registration file validation: formatters.py:87-96
pKa file validation: formatters.py:114-123
Filter file validation: formatters.py:99-112
Protocol-specific requirements: formatters.py:192-218, 337-341, 372-376
Long-to-short format conversion: formatters.py:13-51

octantbio/t3-chomper

Folders and files

Latest commit

History

Repository files navigation

T3 chomper

WARNING HIGHLY WIP

Instructions

Installation

Parsing data files

Generating experiment imports

Short pKa data file format:

Long pKa data file format:

Filtering/Limiting Entries

Available experiment layouts

Examples

Optional Parameters

parse_t3r_filenames.ps1

run_extract.ps1

run_gencsv.ps1

Notes for Windows Users

CSV File Column Requirements and Validation

Registration File Column Requirements

pKa Data File Column Requirements

Format 1: Short Format (Preferred)

Format 2: Long Format (Automatically Converted)

Filter File Column Requirements (Optional)

General Validation Notes

Code References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 6

Packages 0

Languages

`parse_t3r_filenames.ps1`

`run_extract.ps1`

`run_gencsv.ps1`

Packages