The input file would specify the categories, but for all reads within a category (across multiple files), we would search the existing category-bins to see whether any already includes >75% of the read's kmers. If one does, the read is added to that bin; otherwise a new category-bin is made.
This might be too inefficient, since every read has to be compared against the existing bins, but it would add a constraint on the saturation within a bin.
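A minimal sketch of the binning idea, under several assumptions that are not in the note above: reads are plain strings, ">75% of kmers" is taken to mean the fraction of a read's distinct kmers already present in a bin, the first bin that passes the threshold wins, and the names `kmers` and `assign_to_bins` are hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct kmers of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_to_bins(reads: List[str], k: int = 4,
                   threshold: float = 0.75) -> List[Dict]:
    """Greedily assign each read to the first bin that already holds
    more than `threshold` of the read's kmers, else open a new bin.

    Each bin is a dict with the union of its kmers and the indices
    of the reads assigned to it. (Assumed layout, for illustration.)
    """
    bins: List[Dict] = []
    for idx, read in enumerate(reads):
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to compare
        for b in bins:
            shared = len(read_kmers & b["kmers"]) / len(read_kmers)
            if shared > threshold:
                b["kmers"] |= read_kmers  # bin grows with each added read
                b["reads"].append(idx)
                break
        else:
            # no existing bin passed the threshold
            bins.append({"kmers": set(read_kmers), "reads": [idx]})
    return bins


if __name__ == "__main__":
    example = ["ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC"]
    for b in assign_to_bins(example, k=4):
        print(b["reads"], len(b["kmers"]))
```

The inner loop scans every existing bin for every read, so the cost grows with reads × bins, which is the inefficiency concern raised above. The result also depends on the order in which reads arrive, because a bin's kmer set grows as reads are added to it.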