Commit 671fe79

NicolasHug authored and agramfort committed
[MRG] Added docstring checks for ensemble module (scikit-learn#11418)
* Added docstring checks for ensemble module
1 parent 2d7567f commit 671fe79
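
The checks this commit adds are not part of the excerpt shown here. As a rough idea of what a docstring check of this kind could verify, the sketch below compares the order of parameters in a numpydoc `Parameters` section with the function signature, which appears to be the ordering the forest.py hunks restore. The names `documented_params`, `check_param_order`, `fit_ok` and `fit_bad` are hypothetical and are not taken from scikit-learn.

```python
import inspect
import re

# A numpydoc parameter line looks like "name : type description".
_PARAM_LINE = re.compile(r"^(\w+) : ")


def documented_params(doc):
    """Return parameter names in the order listed under 'Parameters'."""
    lines = doc.splitlines()
    names, in_params = [], False
    for i, line in enumerate(lines):
        underline = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if line.strip() and underline and set(underline) == {"-"}:
            # A section title is a non-empty line followed by a dashed underline.
            in_params = line.strip() == "Parameters"
        elif in_params:
            match = _PARAM_LINE.match(line)
            if match:
                names.append(match.group(1))
    return names


def check_param_order(func):
    """Return True if the docstring lists parameters in signature order."""
    expected = [n for n in inspect.signature(func).parameters if n != "self"]
    # inspect.getdoc dedents the docstring, so parameter lines start at column 0.
    return documented_params(inspect.getdoc(func) or "") == expected


def fit_ok(max_depth=None, max_features="auto"):
    """Toy function (hypothetical).

    Parameters
    ----------
    max_depth : integer or None
        Depth limit.

    max_features : int, float, string or None
        Features per split.

    Returns
    -------
    None
    """


def fit_bad(max_depth=None, max_features="auto"):
    """Toy function with parameters documented out of order (hypothetical).

    Parameters
    ----------
    max_features : int, float, string or None
        Features per split.

    max_depth : integer or None
        Depth limit.
    """
```

Here `check_param_order(fit_ok)` passes, while `fit_bad` fails because its docstring lists `max_features` before `max_depth`.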

5 files changed, with 689 additions and 285 deletions

sklearn/ensemble/forest.py

Lines changed: 116 additions & 117 deletions
@@ -374,11 +374,12 @@ def feature_importances_(self):
         return sum(all_importances) / len(self.estimators_)
 
 
-# This is a utility function for joblib's Parallel. It can't go locally in
-# ForestClassifier or ForestRegressor, because joblib complains that it cannot
-# pickle it when placed there.
+def _accumulate_prediction(predict, X, out, lock):
+    """This is a utility function for joblib's Parallel.
 
-def accumulate_prediction(predict, X, out, lock):
+    It can't go locally in ForestClassifier or ForestRegressor, because joblib
+    complains that it cannot pickle it when placed there.
+    """
     prediction = predict(X, check_input=False)
     with lock:
         if len(out) == 1:
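
The pattern in this hunk is a module-level helper that adds each estimator's prediction into a shared buffer while holding a lock. The sketch below illustrates that pattern under simplifying assumptions: it swaps joblib's `Parallel` for the standard library's `ThreadPoolExecutor`, uses plain lists rather than arrays, and omits the `len(out) == 1` branch. The names `accumulate_prediction` and `average_predictions` are illustrative only.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def accumulate_prediction(predict, X, out, lock):
    """Add one predictor's output into the shared buffer ``out``."""
    prediction = predict(X)  # computed outside the lock, so work runs in parallel
    with lock:  # only one thread at a time may update the shared buffer
        for i, value in enumerate(prediction):
            out[i] += value


def average_predictions(predictors, X, n_workers=4):
    """Average the outputs of several predictors using worker threads."""
    total = [0.0] * len(X)
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(accumulate_prediction, predict, X, total, lock)
            for predict in predictors
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return [value / len(predictors) for value in total]
```

For example, averaging a predictor that returns `x` with one that returns `3 * x` over the inputs `[1, 2]` gives `[2.0, 4.0]`.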
@@ -584,7 +585,8 @@ class in a leaf.
                      for j in np.atleast_1d(self.n_classes_)]
         lock = threading.Lock()
         Parallel(n_jobs=n_jobs, verbose=self.verbose, backend="threading")(
-            delayed(accumulate_prediction)(e.predict_proba, X, all_proba, lock)
+            delayed(_accumulate_prediction)(e.predict_proba, X, all_proba,
+                                            lock)
             for e in self.estimators_)
 
         for proba in all_proba:
@@ -691,7 +693,7 @@ def predict(self, X):
         # Parallel loop
         lock = threading.Lock()
         Parallel(n_jobs=n_jobs, verbose=self.verbose, backend="threading")(
-            delayed(accumulate_prediction)(e.predict, X, [y_hat], lock)
+            delayed(_accumulate_prediction)(e.predict, X, [y_hat], lock)
             for e in self.estimators_)
 
         y_hat /= len(self.estimators_)
@@ -763,22 +765,6 @@ class RandomForestClassifier(ForestClassifier):
         "gini" for the Gini impurity and "entropy" for the information gain.
         Note: this parameter is tree-specific.
 
-    max_features : int, float, string or None, optional (default="auto")
-        The number of features to consider when looking for the best split:
-
-        - If int, then consider `max_features` features at each split.
-        - If float, then `max_features` is a fraction and
-          `int(max_features * n_features)` features are considered at each
-          split.
-        - If "auto", then `max_features=sqrt(n_features)`.
-        - If "sqrt", then `max_features=sqrt(n_features)` (same as "auto").
-        - If "log2", then `max_features=log2(n_features)`.
-        - If None, then `max_features=n_features`.
-
-        Note: the search for a split does not stop until at least one
-        valid partition of the node samples is found, even if it requires to
-        effectively inspect more than ``max_features`` features.
-
     max_depth : integer or None, optional (default=None)
         The maximum depth of the tree. If None, then nodes are expanded until
         all leaves are pure or until all leaves contain less than
@@ -811,20 +797,27 @@ class RandomForestClassifier(ForestClassifier):
         the input samples) required to be at a leaf node. Samples have
         equal weight when sample_weight is not provided.
 
+    max_features : int, float, string or None, optional (default="auto")
+        The number of features to consider when looking for the best split:
+
+        - If int, then consider `max_features` features at each split.
+        - If float, then `max_features` is a fraction and
+          `int(max_features * n_features)` features are considered at each
+          split.
+        - If "auto", then `max_features=sqrt(n_features)`.
+        - If "sqrt", then `max_features=sqrt(n_features)` (same as "auto").
+        - If "log2", then `max_features=log2(n_features)`.
+        - If None, then `max_features=n_features`.
+
+        Note: the search for a split does not stop until at least one
+        valid partition of the node samples is found, even if it requires to
+        effectively inspect more than ``max_features`` features.
+
     max_leaf_nodes : int or None, optional (default=None)
         Grow trees with ``max_leaf_nodes`` in best-first fashion.
         Best nodes are defined as relative reduction in impurity.
         If None then unlimited number of leaf nodes.
 
-    min_impurity_split : float,
-        Threshold for early stopping in tree growth. A node will split
-        if its impurity is above the threshold, otherwise it is a leaf.
-
-        .. deprecated:: 0.19
-           ``min_impurity_split`` has been deprecated in favor of
-           ``min_impurity_decrease`` in 0.19 and will be removed in 0.21.
-           Use ``min_impurity_decrease`` instead.
-
     min_impurity_decrease : float, optional (default=0.)
         A node will be split if this split induces a decrease of the impurity
         greater than or equal to this value.
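
The `max_features` rules in the docstring above can be read as a small lookup from the parameter value to a feature count. The following sketch is one way to express them, not scikit-learn's own code: `resolve_max_features` and its `is_classifier` flag are hypothetical names, and truncating to an integer with a floor of 1 is an assumption, since the docstring only states `sqrt(n_features)` and `log2(n_features)`. The flag covers the one documented difference between estimators: the regressor docstrings in this diff map "auto" to `n_features`.

```python
import math


def resolve_max_features(max_features, n_features, is_classifier=True):
    """Translate a ``max_features`` setting into a number of features."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features == "auto":
            # Classifiers document "auto" as sqrt, regressors as n_features.
            if is_classifier:
                return max(1, int(math.sqrt(n_features)))  # floor of 1 is assumed
            return n_features
        if max_features == "sqrt":
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError("unknown max_features string: %r" % max_features)
    if isinstance(max_features, int):
        return max_features  # an int is used as the count directly
    # A float is a fraction of the available features.
    return int(max_features * n_features)
```

With 16 features, "auto" resolves to 4 for a classifier and to 16 for a regressor, while `0.5` resolves to 8.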
@@ -843,6 +836,15 @@ class RandomForestClassifier(ForestClassifier):
 
         .. versionadded:: 0.19
 
+    min_impurity_split : float,
+        Threshold for early stopping in tree growth. A node will split
+        if its impurity is above the threshold, otherwise it is a leaf.
+
+        .. deprecated:: 0.19
+           ``min_impurity_split`` has been deprecated in favor of
+           ``min_impurity_decrease`` in 0.19 and will be removed in 0.21.
+           Use ``min_impurity_decrease`` instead.
+
     bootstrap : boolean, optional (default=True)
         Whether bootstrap samples are used when building trees.
 
@@ -1041,22 +1043,6 @@ class RandomForestRegressor(ForestRegressor):
         .. versionadded:: 0.18
            Mean Absolute Error (MAE) criterion.
 
-    max_features : int, float, string or None, optional (default="auto")
-        The number of features to consider when looking for the best split:
-
-        - If int, then consider `max_features` features at each split.
-        - If float, then `max_features` is a fraction and
-          `int(max_features * n_features)` features are considered at each
-          split.
-        - If "auto", then `max_features=n_features`.
-        - If "sqrt", then `max_features=sqrt(n_features)`.
-        - If "log2", then `max_features=log2(n_features)`.
-        - If None, then `max_features=n_features`.
-
-        Note: the search for a split does not stop until at least one
-        valid partition of the node samples is found, even if it requires to
-        effectively inspect more than ``max_features`` features.
-
     max_depth : integer or None, optional (default=None)
         The maximum depth of the tree. If None, then nodes are expanded until
         all leaves are pure or until all leaves contain less than
@@ -1089,20 +1075,27 @@ class RandomForestRegressor(ForestRegressor):
         the input samples) required to be at a leaf node. Samples have
         equal weight when sample_weight is not provided.
 
+    max_features : int, float, string or None, optional (default="auto")
+        The number of features to consider when looking for the best split:
+
+        - If int, then consider `max_features` features at each split.
+        - If float, then `max_features` is a fraction and
+          `int(max_features * n_features)` features are considered at each
+          split.
+        - If "auto", then `max_features=n_features`.
+        - If "sqrt", then `max_features=sqrt(n_features)`.
+        - If "log2", then `max_features=log2(n_features)`.
+        - If None, then `max_features=n_features`.
+
+        Note: the search for a split does not stop until at least one
+        valid partition of the node samples is found, even if it requires to
+        effectively inspect more than ``max_features`` features.
+
     max_leaf_nodes : int or None, optional (default=None)
         Grow trees with ``max_leaf_nodes`` in best-first fashion.
         Best nodes are defined as relative reduction in impurity.
         If None then unlimited number of leaf nodes.
 
-    min_impurity_split : float,
-        Threshold for early stopping in tree growth. A node will split
-        if its impurity is above the threshold, otherwise it is a leaf.
-
-        .. deprecated:: 0.19
-           ``min_impurity_split`` has been deprecated in favor of
-           ``min_impurity_decrease`` in 0.19 and will be removed in 0.21.
-           Use ``min_impurity_decrease`` instead.
-
     min_impurity_decrease : float, optional (default=0.)
         A node will be split if this split induces a decrease of the impurity
         greater than or equal to this value.
@@ -1121,6 +1114,15 @@ class RandomForestRegressor(ForestRegressor):
 
         .. versionadded:: 0.19
 
+    min_impurity_split : float,
+        Threshold for early stopping in tree growth. A node will split
+        if its impurity is above the threshold, otherwise it is a leaf.
+
+        .. deprecated:: 0.19
+           ``min_impurity_split`` has been deprecated in favor of
+           ``min_impurity_decrease`` in 0.19 and will be removed in 0.21.
+           Use ``min_impurity_decrease`` instead.
+
     bootstrap : boolean, optional (default=True)
         Whether bootstrap samples are used when building trees.
 
@@ -1272,22 +1274,6 @@ class ExtraTreesClassifier(ForestClassifier):
         The function to measure the quality of a split. Supported criteria are
         "gini" for the Gini impurity and "entropy" for the information gain.
 
-    max_features : int, float, string or None, optional (default="auto")
-        The number of features to consider when looking for the best split:
-
-        - If int, then consider `max_features` features at each split.
-        - If float, then `max_features` is a fraction and
-          `int(max_features * n_features)` features are considered at each
-          split.
-        - If "auto", then `max_features=sqrt(n_features)`.
-        - If "sqrt", then `max_features=sqrt(n_features)`.
-        - If "log2", then `max_features=log2(n_features)`.
-        - If None, then `max_features=n_features`.
-
-        Note: the search for a split does not stop until at least one
-        valid partition of the node samples is found, even if it requires to
-        effectively inspect more than ``max_features`` features.
-
     max_depth : integer or None, optional (default=None)
         The maximum depth of the tree. If None, then nodes are expanded until
         all leaves are pure or until all leaves contain less than
@@ -1320,20 +1306,27 @@ class ExtraTreesClassifier(ForestClassifier):
         the input samples) required to be at a leaf node. Samples have
         equal weight when sample_weight is not provided.
 
+    max_features : int, float, string or None, optional (default="auto")
+        The number of features to consider when looking for the best split:
+
+        - If int, then consider `max_features` features at each split.
+        - If float, then `max_features` is a fraction and
+          `int(max_features * n_features)` features are considered at each
+          split.
+        - If "auto", then `max_features=sqrt(n_features)`.
+        - If "sqrt", then `max_features=sqrt(n_features)`.
+        - If "log2", then `max_features=log2(n_features)`.
+        - If None, then `max_features=n_features`.
+
+        Note: the search for a split does not stop until at least one
+        valid partition of the node samples is found, even if it requires to
+        effectively inspect more than ``max_features`` features.
+
     max_leaf_nodes : int or None, optional (default=None)
         Grow trees with ``max_leaf_nodes`` in best-first fashion.
         Best nodes are defined as relative reduction in impurity.
         If None then unlimited number of leaf nodes.
 
-    min_impurity_split : float,
-        Threshold for early stopping in tree growth. A node will split
-        if its impurity is above the threshold, otherwise it is a leaf.
-
-        .. deprecated:: 0.19
-           ``min_impurity_split`` has been deprecated in favor of
-           ``min_impurity_decrease`` in 0.19 and will be removed in 0.21.
-           Use ``min_impurity_decrease`` instead.
-
     min_impurity_decrease : float, optional (default=0.)
         A node will be split if this split induces a decrease of the impurity
         greater than or equal to this value.
@@ -1352,6 +1345,15 @@ class ExtraTreesClassifier(ForestClassifier):
 
         .. versionadded:: 0.19
 
+    min_impurity_split : float,
+        Threshold for early stopping in tree growth. A node will split
+        if its impurity is above the threshold, otherwise it is a leaf.
+
+        .. deprecated:: 0.19
+           ``min_impurity_split`` has been deprecated in favor of
+           ``min_impurity_decrease`` in 0.19 and will be removed in 0.21.
+           Use ``min_impurity_decrease`` instead.
+
     bootstrap : boolean, optional (default=False)
         Whether bootstrap samples are used when building trees.
 
@@ -1522,22 +1524,6 @@ class ExtraTreesRegressor(ForestRegressor):
         .. versionadded:: 0.18
            Mean Absolute Error (MAE) criterion.
 
-    max_features : int, float, string or None, optional (default="auto")
-        The number of features to consider when looking for the best split:
-
-        - If int, then consider `max_features` features at each split.
-        - If float, then `max_features` is a fraction and
-          `int(max_features * n_features)` features are considered at each
-          split.
-        - If "auto", then `max_features=n_features`.
-        - If "sqrt", then `max_features=sqrt(n_features)`.
-        - If "log2", then `max_features=log2(n_features)`.
-        - If None, then `max_features=n_features`.
-
-        Note: the search for a split does not stop until at least one
-        valid partition of the node samples is found, even if it requires to
-        effectively inspect more than ``max_features`` features.
-
     max_depth : integer or None, optional (default=None)
         The maximum depth of the tree. If None, then nodes are expanded until
         all leaves are pure or until all leaves contain less than
@@ -1570,20 +1556,27 @@ class ExtraTreesRegressor(ForestRegressor):
         the input samples) required to be at a leaf node. Samples have
         equal weight when sample_weight is not provided.
 
+    max_features : int, float, string or None, optional (default="auto")
+        The number of features to consider when looking for the best split:
+
+        - If int, then consider `max_features` features at each split.
+        - If float, then `max_features` is a fraction and
+          `int(max_features * n_features)` features are considered at each
+          split.
+        - If "auto", then `max_features=n_features`.
+        - If "sqrt", then `max_features=sqrt(n_features)`.
+        - If "log2", then `max_features=log2(n_features)`.
+        - If None, then `max_features=n_features`.
+
+        Note: the search for a split does not stop until at least one
+        valid partition of the node samples is found, even if it requires to
+        effectively inspect more than ``max_features`` features.
+
     max_leaf_nodes : int or None, optional (default=None)
         Grow trees with ``max_leaf_nodes`` in best-first fashion.
         Best nodes are defined as relative reduction in impurity.
         If None then unlimited number of leaf nodes.
 
-    min_impurity_split : float,
-        Threshold for early stopping in tree growth. A node will split
-        if its impurity is above the threshold, otherwise it is a leaf.
-
-        .. deprecated:: 0.19
-           ``min_impurity_split`` has been deprecated in favor of
-           ``min_impurity_decrease`` in 0.19 and will be removed in 0.21.
-           Use ``min_impurity_decrease`` instead.
-
     min_impurity_decrease : float, optional (default=0.)
         A node will be split if this split induces a decrease of the impurity
         greater than or equal to this value.
@@ -1602,6 +1595,15 @@ class ExtraTreesRegressor(ForestRegressor):
 
         .. versionadded:: 0.19
 
+    min_impurity_split : float,
+        Threshold for early stopping in tree growth. A node will split
+        if its impurity is above the threshold, otherwise it is a leaf.
+
+        .. deprecated:: 0.19
+           ``min_impurity_split`` has been deprecated in favor of
+           ``min_impurity_decrease`` in 0.19 and will be removed in 0.21.
+           Use ``min_impurity_decrease`` instead.
+
     bootstrap : boolean, optional (default=False)
         Whether bootstrap samples are used when building trees.
 
@@ -1765,15 +1767,6 @@ class RandomTreesEmbedding(BaseForest):
         Best nodes are defined as relative reduction in impurity.
         If None then unlimited number of leaf nodes.
 
-    min_impurity_split : float,
-        Threshold for early stopping in tree growth. A node will split
-        if its impurity is above the threshold, otherwise it is a leaf.
-
-        .. deprecated:: 0.19
-           ``min_impurity_split`` has been deprecated in favor of
-           ``min_impurity_decrease`` in 0.19 and will be removed in 0.21.
-           Use ``min_impurity_decrease`` instead.
-
     min_impurity_decrease : float, optional (default=0.)
         A node will be split if this split induces a decrease of the impurity
         greater than or equal to this value.
@@ -1792,8 +1785,14 @@ class RandomTreesEmbedding(BaseForest):
 
         .. versionadded:: 0.19
 
-    bootstrap : boolean, optional (default=True)
-        Whether bootstrap samples are used when building trees.
+    min_impurity_split : float,
+        Threshold for early stopping in tree growth. A node will split
+        if its impurity is above the threshold, otherwise it is a leaf.
+
+        .. deprecated:: 0.19
+           ``min_impurity_split`` has been deprecated in favor of
+           ``min_impurity_decrease`` in 0.19 and will be removed in 0.21.
+           Use ``min_impurity_decrease`` instead.
 
     sparse_output : bool, optional (default=True)
         Whether or not to return a sparse CSR matrix, as default behavior,
