Description
In the get_label function, the nested loop starts at index 1 and ends at num_nbrs, inclusive.
The loop's only goal is to compute the target value of x_new (src in the code), which is based on the target values of its parent points.
As described in #1, the points used in the loop are already not guaranteed to be x_new's parent points.
On top of that, the computation completely omits the nearest neighbor. Since x_new is generally not in X_min, the sorted list of distances between x_new and X_min will very rarely have 0 as its first element. So the first element is not x_new itself: it is a genuine neighbor, in most cases probably a parent point, and as the nearest one it should contribute the most to x_new's target value.
To be concrete about how rarely x_new lands in X_min: with k=3, it would require random.uniform(0, alpha) to return 0 or 1 three times in a row.
src = X_aug[i]
distances = np.linalg.norm(X_min - src, axis=1)
dist_indices_sorted = np.argsort(distances)
# Recompute the distances to the k nearest neighbors of the original point
numerator = 0
denom = 0
# For all k nearest neighbors.
for nbr_indx in range(1, num_nbrs + 1):
    # Get the label of the destination point and its distance
    y_nbr = y_min[dist_indices_sorted[nbr_indx]]
    dist_nbr = distances[dist_indices_sorted[nbr_indx]]
    # band-aid code
    if dist_nbr == 0:
        dist_nbr = alpha  # What? If you are right on top of it, why take the other values into account...?
    # Weighted-label computation.
    numerator += (1 / dist_nbr) * y_nbr
    denom += (1 / dist_nbr)
I'm sure this is a simple oversight: a similar loop is implemented in oversample(), and there the first point does need to be discarded.
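To make the effect concrete, here is a minimal sketch of the weighted-label computation with a configurable starting rank. It is not the repository's code: the function name weighted_label, the start parameter, the final numerator / denom division, and the toy data are all assumptions made for illustration. With start=1 it mirrors the loop quoted above; with start=0 it includes the nearest neighbor.

```python
import numpy as np


def weighted_label(src, X_min, y_min, num_nbrs, alpha, start=0):
    """Inverse-distance-weighted label of src from its neighbors in X_min.

    start is a hypothetical parameter: start=1 skips the nearest
    neighbor (as in the quoted loop), start=0 includes it.
    """
    distances = np.linalg.norm(X_min - src, axis=1)
    order = np.argsort(distances)  # order[0] is the nearest point of X_min
    numerator = 0.0
    denom = 0.0
    for rank in range(start, start + num_nbrs):
        idx = order[rank]
        dist_nbr = distances[idx]
        if dist_nbr == 0:
            dist_nbr = alpha  # same zero-distance guard as the quoted code
        numerator += y_min[idx] / dist_nbr
        denom += 1.0 / dist_nbr
    # Assumption: the label is the ratio of the two accumulators.
    return numerator / denom


# Toy data (made up): src is not a member of X_min.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [9.0, 0.0]])
y_min = np.array([10.0, 20.0, 30.0, 40.0])
src = np.array([0.25, 0.0])

with_nearest = weighted_label(src, X_min, y_min, num_nbrs=2, alpha=0.5, start=0)
without_nearest = weighted_label(src, X_min, y_min, num_nbrs=2, alpha=0.5, start=1)
print(with_nearest, without_nearest)
```

On this toy data the nearest point (distance 0.25, label 10) dominates the result when it is included, and the label shifts noticeably toward the farther points when it is skipped, which is the behavior this issue is about.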