Mismatching in labels of clusters and transition matrix #83

@martanit

Describe the bug

In SOAPify/Examples/LENS.ipynb, the transition matrix (tmat) labels are wrong whenever the clusters assigned by KMeans are not listed in ascending order (e.g. [C0=0, C2=2, C1=1]).
calculateTransitionMatrix returns a matrix whose rows and columns follow the sorted cluster indices (cluster 0 in column 0, cluster 1 in column 1, cluster 2 in column 2, ...), whereas the labels are assigned in the order in which the clusters are listed (C0 for column 0, C2 for column 1, C1 for column 2).
The problem is fixed by sorting the labels, i.e. by changing:


classifications = SOAPclassification(
    [], prepareData(classifiedFilteredLENS), [f"C{m[0]}" for m in minmax]
)

to:


classifications = SOAPclassification(
    [], prepareData(classifiedFilteredLENS), [f"C{m[0]}" for m in np.sort(minmax, axis=0)]
)

To reproduce the bug, change the random_state parameter in KMeans: this changes the order in which the clusters are assigned and, with the unsorted labels, the exchange probabilities reported for each label.
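
The mismatch can be illustrated without SOAPify. The snippet below is a minimal sketch using only NumPy: `transition_counts`, `traj` and `cluster_order` are hypothetical names invented for this illustration, not part of the SOAPify API, and the helper only assumes (as described above) that row and column k of the matrix refer to cluster index k.

```python
import numpy as np


def transition_counts(traj, n_states):
    """Count transitions between consecutive frames.

    Hypothetical stand-in for a transition-matrix routine: row/column k
    always refers to cluster index k, regardless of any label list.
    """
    counts = np.zeros((n_states, n_states), dtype=int)
    for a, b in zip(traj[:-1], traj[1:]):
        counts[a, b] += 1
    return counts


# Made-up trajectory of cluster indices for a single particle.
traj = np.array([0, 0, 2, 2, 2, 1, 1, 0])

# Order in which the clusters happen to be listed (not ascending).
cluster_order = [0, 2, 1]

counts = transition_counts(traj, n_states=3)

# Labels built in listing order: position 1 is named "C2",
# although row/column 1 of `counts` describes cluster 1.
unsorted_labels = [f"C{c}" for c in cluster_order]

# Labels built from the sorted indices line up with the matrix.
sorted_labels = [f"C{c}" for c in sorted(cluster_order)]

for name, labels in [("unsorted", unsorted_labels), ("sorted", sorted_labels)]:
    print(name)
    for label, row in zip(labels, counts):
        print(f"  {label}: {row.tolist()}")
```

With the unsorted labels, the row printed as "C2" actually holds the transitions out of cluster 1, which is the kind of swap reported here.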
