
Group/cluster size decreases as data increases although the number of similar records remains about the same #1239

@rodrigo748


Hello,

In a dataset of 2.6M records that contains a set of 28 similar records, Zingg split those 28 records across 6 different z_cluster IDs.
Using the same model on a larger dataset of 14.5M records that contains the same 28 similar records plus 4 more (32 in total), Zingg split them across 24 different z_cluster IDs.
Why would Zingg make fewer matches when the total number of records increased (from 2.6M to 14.5M) while the set of similar records stayed almost the same (28 vs. 32)?
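
For anyone who wants to reproduce the counts above, here is a minimal sketch of how the number of distinct z_cluster IDs for a known set of records could be measured. It assumes the Zingg output has been loaded as a list of dictionaries; the `id` field name, the `cluster_breakdown` helper, and the toy rows are hypothetical and only illustrate the check, they are not the real data.

```python
from collections import Counter


def cluster_breakdown(rows, known_ids, id_field="id", cluster_field="z_cluster"):
    """Count how many of the known records fall into each z_cluster.

    `id_field` is an assumed name for the record identifier column;
    `cluster_field` is the z_cluster column mentioned in this issue.
    """
    wanted = set(known_ids)
    return Counter(
        row[cluster_field] for row in rows if row[id_field] in wanted
    )


# Toy rows standing in for Zingg output (made-up values, not the real dataset).
rows = [
    {"id": "a1", "z_cluster": 10},
    {"id": "a2", "z_cluster": 10},
    {"id": "a3", "z_cluster": 11},
    {"id": "b9", "z_cluster": 42},  # unrelated record, ignored by the filter
]

breakdown = cluster_breakdown(rows, ["a1", "a2", "a3"])
# The number of keys is the number of distinct clusters the known set was split into.
print(len(breakdown), dict(breakdown))
```

Running the same check against the 2.6M and the 14.5M outputs, with the same list of record IDs, would show which records stay together in the smaller run and end up alone in the larger one.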

Thank you,
Rodrigo Escamilla


Labels

question (Further information is requested)
