Hello,
In a dataset of 2.6M records that contains a set of 28 similar records, Zingg clustered those 28 records into 6 different z_cluster IDs.
Using the same model on a larger dataset of 14.5M records that contains the exact same 28 similar records plus 4 more (32 in total), Zingg clustered them into 24 different z_cluster IDs.
Why would Zingg find fewer matches when the total number of records increased (from 2.6M to 14.5M) while the set of similar records stayed nearly the same (28 vs. 32)?
Thank you,
Rodrigo Escamilla