Commit 62cb426

Update readme for new scripts
1 parent 1a8d815 commit 62cb426

1 file changed

+7
-0
lines changed

scripts/README.md

Lines changed: 7 additions & 0 deletions
@@ -69,3 +69,10 @@ Calculate assembly indices using:
We do not package its source code or its executable with our library, but it can be obtained [on GitHub](https://github.com/croningp/assembly_go) if non-self-referential ground truth is desired.
[`assembly_cpp`] is the current state-of-the-art algorithm by Seet et al. (2024) and was provided to us by its authors on the condition that it remains private and is used only for this ground-truth generation.
Otherwise, a release build of `assembly-theory` is created and used.

### `ring-data.py`

Rings present a challenge to any assembly index algorithm because the number of subgraphs grows quickly with the number of rings. This script downloads a file of polybenzenoid hydrocarbons (PBHs), molecules built from fused benzene rings, from the [COMPAS project](https://pubs.rsc.org/en/content/articlehtml/2024/cp/d4cp01027b). We select the molecules with 8 rings or fewer and save them as `.mol` files. This results in 415 molecules saved to `polycyclic_hydrocarbons`, along with metadata.
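
The selection step can be sketched roughly as follows. This is a minimal illustration and not the code in `ring-data.py`: the column names `name`, `smiles`, and `n_rings`, and the helper `select_small_pbhs`, are hypothetical and do not reflect the actual layout of the COMPAS file.

```python
import csv
import io

MAX_RINGS = 8  # threshold stated above


def select_small_pbhs(csv_text, max_rings=MAX_RINGS):
    """Return the rows whose ring count is at most max_rings.

    Assumes a hypothetical CSV with an integer `n_rings` column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["n_rings"]) <= max_rings]


# Tiny in-memory stand-in for the downloaded file (hypothetical columns).
example = """name,smiles,n_rings
naphthalene,c1ccc2ccccc2c1,2
hypothetical_large_pbh,,9
"""

selected = select_small_pbhs(example)
print([row["name"] for row in selected])
```

Each selected row would then be converted to a `.mol` file by whatever cheminformatics toolkit the script uses; that conversion is omitted here.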

### `alkene-sampling.py`
See the description of `ring-data.py` above. This script generates a complementary data set of molecules that have the same numbers of bonds and double bonds as the molecules produced by `ring-data.py`. This serves as a control experiment for understanding the impact of rings on performance characteristics. The molecules are generated through a hierarchical rejection sampler: it first samples randomly labelled trees with a maximum degree of 4, then samples edges to convert to double bonds, and verifies that the resulting labelling does not violate valence rules. The rejection sampling involved is somewhat slow. This results in 415 molecules saved to `data/acyclic_hydrocarbons`, along with a CSV of metadata.
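
A two-level rejection sampler of this kind might look like the sketch below. It is an illustration under stated assumptions, not the implementation in `alkene-sampling.py`: it assumes trees are drawn uniformly via Prüfer sequences, that every atom is carbon (valence 4), and that a failed valence check restarts from a fresh tree. The function names are hypothetical.

```python
import heapq
import random

MAX_VALENCE = 4  # assumption: all atoms are carbon


def prufer_to_edges(seq):
    """Decode a Prüfer sequence into the edges of a labelled tree."""
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        edges.append((heapq.heappop(leaves), v))  # attach smallest leaf
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def sample_tree(n_atoms, rng):
    """Level 1: reject labelled trees whose maximum degree exceeds 4."""
    while True:
        seq = [rng.randrange(n_atoms) for _ in range(n_atoms - 2)]
        edges = prufer_to_edges(seq)
        degree = [0] * n_atoms
        for a, b in edges:
            degree[a] += 1
            degree[b] += 1
        if max(degree) <= MAX_VALENCE:
            return edges


def sample_molecule(n_bonds, n_double, rng, max_tries=10_000):
    """Level 2: pick double bonds, reject if any atom exceeds its valence."""
    if n_bonds < 1 or not 0 <= n_double <= n_bonds:
        raise ValueError("need n_bonds >= 1 and 0 <= n_double <= n_bonds")
    n_atoms = n_bonds + 1  # a tree with n_bonds edges
    for _ in range(max_tries):
        edges = sample_tree(n_atoms, rng)
        double = set(rng.sample(range(n_bonds), n_double))
        orders = [2 if i in double else 1 for i in range(n_bonds)]
        valence = [0] * n_atoms
        for (a, b), order in zip(edges, orders):
            valence[a] += order
            valence[b] += order
        if max(valence) <= MAX_VALENCE:
            return [(a, b, o) for (a, b), o in zip(edges, orders)]
    raise RuntimeError("no valid molecule found within max_tries")


rng = random.Random(0)
bonds = sample_molecule(n_bonds=12, n_double=3, rng=rng)
print(bonds)  # list of (atom, atom, bond_order) triples
```

Both levels discard whole candidates on failure, which is one plausible reason sampling is slow: the acceptance rate drops as the molecule grows and as the fraction of double bonds rises.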
