add H3, H3Conv and Hyena as model architecture #289

Open
GuptaVishu2002 wants to merge 28 commits into master from s4

Conversation

@GuptaVishu2002
Collaborator

This pull request introduces support for the H3, H3Conv, and Hyena model types in both training and sampling scripts, making them configurable through new command-line arguments. It also adds corresponding test coverage and updates dependencies to facilitate these changes.

Model Support and Argument Handling:

  • Added support for H3, H3Conv, and Hyena models in train_models_RNN.py and sample_molecules_RNN.py, including new arguments (bias, use_fast_fftconv, order, filter_order, inner_factor) to configure these models from the command line.

Testing Enhancements:

  • Extended test coverage in test_snakemake_steps.py with dedicated tests for the H3, H3Conv, and Hyena models, and updated existing tests to use the new arguments.

Dependency Updates:

  • Updated pyproject.toml to add safari, hydra-core, and pytorch-lightning as dependencies, supporting the new models and configuration management.

Copilot AI review requested due to automatic review settings February 1, 2026 19:15

Copilot AI left a comment


Pull request overview

Adds support for the H3, H3Conv, and Hyena architectures across the Snakemake-driven training/sampling pipeline, exposing their configuration via new CLI/config parameters and extending the test suite for training.

Changes:

  • Added H3/H3Conv/Hyena model implementations and wired them into train_models_RNN.py and sample_molecules_RNN.py.
  • Extended workflow config and Snakemake rules to pass new model parameters (bias, use_fast_fftconv, order, filter_order, inner_factor).
  • Expanded unit tests to cover training runs for the new model types.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 13 comments.

Summary per file:

  • workflow/config/config_fast.yaml: documents/configures the new model types and their parameters for “fast” runs.
  • workflow/config/config.yaml: documents/configures the new model types and their parameters for standard runs.
  • workflow/Snakefile_data: threads the new model parameters through Snakemake CLI invocations for training/sampling.
  • tests/test_snakemake_steps.py: adds training tests for H3/H3Conv/Hyena and updates existing calls to include the new args.
  • src/clm/models.py: adds H3/H3Conv/Hyena model classes using safari implementations.
  • src/clm/commands/train_models_RNN.py: adds CLI args and model-selection branches for H3/H3Conv/Hyena; forwards the new params.
  • src/clm/commands/sample_molecules_RNN.py: adds CLI args and model-selection branches for H3/H3Conv/Hyena; forwards the new params.
  • requirements.txt: adds safari and pins pytorch-lightning/hydra-core.
  • pyproject.toml: adds safari, hydra-core, and pytorch-lightning as dependencies.


Comment on lines 28 to +29
"s4dd @ git+https://github.com/GuptaVishu2002/s4-for-de-novo-drug-design.git@fix-module-library-packaging",
"safari @ git+https://github.com/GuptaVishu2002/safari.git@fix-setup",

Copilot AI Feb 1, 2026


The dependency safari @ git+https://github.com/GuptaVishu2002/safari.git@fix-setup pulls third-party code directly from a Git repository using a mutable ref (fix-setup), which enables supply-chain attacks if that branch/tag is ever compromised or force-moved. An attacker who gains control of that repo could silently change the code at the same ref and have malicious code executed in any environment that installs this project. To mitigate this, pin Git-based dependencies to immutable commit hashes (or published versions on a trusted index) and periodically update them intentionally, rather than tracking branches/tags.
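As a sketch, pinning the Git dependency to an immutable commit rather than a branch would look like the following in pyproject.toml (the SHA is a placeholder, not a real commit from that repository):

```toml
# Pin Git-based dependencies to a full commit hash instead of a mutable
# branch/tag ref. The hash below is a PLACEHOLDER for illustration only;
# substitute the actual commit you have reviewed and tested.
dependencies = [
    "safari @ git+https://github.com/GuptaVishu2002/safari.git@<full-40-char-commit-sha>",
]
```

Updating the dependency then becomes an explicit, reviewable change of the pinned hash rather than an implicit pull of whatever the branch currently points to.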

Collaborator

@skinnider skinnider left a comment


@GuptaVishu2002 see a couple potential issues below

'--n_ssm {MODEL_PARAMS[n_ssm]} '
'--n_heads {MODEL_PARAMS[n_heads]} '
'--exp_factor {MODEL_PARAMS[exp_factor]} '
f'{"--bias" if MODEL_PARAMS["bias"] else ""} '

@GuptaVishu2002 can you double-check that boolean arguments are parsed correctly by the existing combination of config.yaml -> Snakefile_data, if you haven't already? i.e., can the user definitely set bias to True or False within train_models_RNN?
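For context, a minimal argparse sketch (the flag name matches the PR's --bias; the defaults here are assumptions, not the PR's actual values) showing why the present/absent pattern in Snakefile_data only round-trips cleanly when the argparse default is False:

```python
import argparse

# Sketch of the flag pattern in Snakefile_data: the rule emits "--bias" when
# the config value is truthy and nothing otherwise. With store_true and
# default=False, presence/absence maps cleanly onto True/False.
parser = argparse.ArgumentParser()
parser.add_argument("--bias", action="store_true", default=False)

assert parser.parse_args(["--bias"]).bias is True
assert parser.parse_args([]).bias is False

# If the default were True, the user could never reach bias=False with this
# scheme; BooleanOptionalAction (Python 3.9+) adds an explicit --no-bias.
parser2 = argparse.ArgumentParser()
parser2.add_argument("--bias", action=argparse.BooleanOptionalAction, default=True)
assert parser2.parse_args(["--no-bias"]).bias is False
```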

# )

elif model_type == "H3":
assert (

@GuptaVishu2002 I think there must be an if conditional: missing here, no? Otherwise, why does the H3 require a heldout file? (Is a conditional H3 even implemented? Maybe the assertion should be the opposite, i.e., for all models other than RNN, assert that conditional is not True)
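A minimal sketch of the suggested inverted assertion (the function name and signature are hypothetical, not the PR's actual code):

```python
def check_conditional_support(model_type: str, conditional: bool) -> None:
    # Hypothetical guard: conditional generation is only implemented for the
    # RNN, so every other model type must run with conditional=False.
    if model_type != "RNN":
        assert not conditional, (
            f"conditional sampling is not implemented for {model_type}"
        )

check_conditional_support("RNN", True)   # fine: RNN supports conditional
check_conditional_support("H3", False)   # fine: H3, unconditional
```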

@skinnider
Collaborator

@GuptaVishu2002 I read through the PR more carefully today and noticed some stuff I hadn’t before. Sorry to not catch it the first time, but I think some points would be good to clarify and others seem like they definitely need to be addressed:

  1. I noticed the new attention-based models (H3, Hyena) are set up with pre-layer norm (e.g., see https://github.com/skinniderlab/CLM/blob/s4/src/clm/models.py#L122) but then don’t have a final layer norm before the output (like the Transformer does: https://github.com/skinniderlab/CLM/blob/s4/src/clm/models.py#L1038). Do you think this should be added? I have the impression it’s standard so that the logit head sees normalized inputs.
  2. The check for empty sequences (e.g. https://github.com/skinniderlab/CLM/blob/s4/src/clm/models.py#L1180) is missing for the sample method of H3 and Hyena
  3. H3, Hyena, S4 and (my bad) Transformer all call .eval() in their sample methods, but the training loop calls print_update at every logging step, which in turn calls the sample method. I think this means dropout is effectively disabled for these four models after the first call to print_update. (Conversely, dropout is never being disabled during sampling for the RNNs, but I think we haven’t noticed because we, or at least I, typically don’t use dropout > 0 with the RNNs). This might also affect early stopping because predictions should be more confident with dropout disabled. So I think we need to add some logic to check if the model was training before the call to sample and, if it was, call train(), e.g., at the top of the method:
was_training = self.training
[... rest of the method ...]
if was_training:
    self.train()

... and I'll create a separate issue to address the RNN.

  4. Args for new models (e.g. n_ssm) don’t have default values so will be None if not specified. Is this desired behavior? The RNN class sets sensible defaults, which I think may be preferable.
  5. Minor, but some things that strike me as potentially confusing nomenclature: is it just me, or does n_heads control three very different parameters for the Transformer, H3, and Hyena models? Is it worth renaming/splitting out three separate params? Also, in S4, n_layers is actually controlling the number of S4 blocks, the number of layers within each being set by the layer_config object within the S4 class (i.e., if the user set n_layers=4, the model would actually have 4 blocks = 12 layers, no?)
  6. Minor, but either there seems to be some dead code in the new models or I’m not understanding something. Specifically, if statements that check the dimensions of the batch (https://github.com/skinniderlab/CLM/blob/s4/src/clm/models.py#L144, https://github.com/skinniderlab/CLM/blob/s4/src/clm/models.py#L136). Neither is in the RNN. Are they necessary? Shouldn’t padded.dim() always be 2, and shouldn’t len(batch) always be 2?

I also noticed a few other things that we can address as separate PRs - will create separate issues for those.
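The mode-restoring pattern from point 3 can be sketched without torch; SamplerStub here is a hypothetical minimal stand-in for an nn.Module's train/eval switching, not the repository's actual class:

```python
class SamplerStub:
    """Hypothetical stand-in for an nn.Module's .training flag handling."""

    def __init__(self):
        self.training = True

    def eval(self):
        self.training = False

    def train(self):
        self.training = True

    def sample(self):
        # Remember the mode we started in, switch to eval so dropout is
        # disabled while generating, then restore the original mode so a
        # mid-training call (e.g. from print_update) does not leave the
        # model stuck in eval.
        was_training = self.training
        self.eval()
        sampled = ["CCO"]  # placeholder for generated SMILES
        if was_training:
            self.train()
        return sampled

model = SamplerStub()
model.train()
model.sample()
assert model.training  # mode restored; dropout stays active for training
```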

Vishu Gupta added 2 commits February 21, 2026 17:45
…efault values, update params, remove dead code, correct padding_idx, correct RNN sample
@GuptaVishu2002
Collaborator Author

Made the following changes, as mentioned above:

  1. added LayerNorm for H3 and Hyena at the end (as previously it fed the un-normalised output into output_embedding)
  2. added Empty-sequences guard to H3 and Hyena, similar to other architectures
  3. added was_training pattern for H3, Hyena, S4 and Transformer
  4. default values were derived from the original implementations; they can be changed if needed
  5. changed S4: n_layers -> n_blocks, H3: n_heads -> head_dim, Hyena: n_heads -> n_order_heads, as they had different purpose as compared to n_heads argument which was originally used for Transformer
  6. removed dead code check i.e. if padded.dim() == 2 and if len(batch) == 3 … else padded, lengths = batch
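On point 1, the effect of the added final LayerNorm can be sketched numerically (a NumPy stand-in for nn.LayerNorm with default gamma=1, beta=0; the shapes are illustrative, not the models' actual dimensions):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) axis, mirroring nn.LayerNorm: the
    # output embedding then sees zero-mean, unit-variance activations.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 10, 64))  # (batch, seq_len, d_model)
normed = layer_norm(hidden)
assert np.allclose(normed.mean(axis=-1), 0.0, atol=1e-6)
assert np.allclose(normed.var(axis=-1), 1.0, atol=1e-3)
```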

For #293, fixed padding_idx (tensor → integer) and removed the padding_t

For #294, max_len can now be set in the config file

For #295, added eval()/train() for the RNN classes to solve the dropout issue
