Fairness guide for tabular data modeling using MLModel #2279
base: next
Conversation
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.
The arcgis.learn module includes classes for training machine learning and deep learning models on tabular vector data in the form of a feature layer or a spatially enabled dataframe. Among these, MLModel allows you to use any regression or classification model from scikit-learn.
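To make the pattern concrete, here is a minimal sketch (not the notebook's code): the estimator is referenced by its dotted scikit-learn path, and the estimator name and parameters shown are placeholders.

```python
from arcgis.learn import MLModel

def build_classifier(data):
    # `data` is the TabularDataObject produced by prepare_tabulardata(),
    # covered in the next comment. The estimator is named by its dotted
    # scikit-learn path; keyword arguments are forwarded to the estimator.
    return MLModel(data, "sklearn.ensemble.RandomForestClassifier",
                   n_estimators=100, random_state=42)
```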
added
In this notebook, we demonstrate techniques to identify bias in models with respect to variables that are prone to it, known as sensitive features, along with practical mitigation strategies that reduce unfairness and yield more equitable models. By applying these methods, developers using the arcgis.learn API can create tabular models that make fairer decisions across diverse groups and minimize the risk of discrimination. Currently, both classification and regression modeling are supported for estimating fairness.
updated
First, we create a TabularDataObject using the prepare_tabulardata() method, feed it into MLModel, initialize the model, and fit it. Refer to the Machine learning and deep learning on tabular data documentation for further details.
Note: Only binary classification and regression modeling are supported for estimating fairness, so for classification we must choose a dependent variable with binary classes.
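For reference, a minimal sketch of this step, assuming a hypothetical feature layer item and field names (loan_approved stands in for a binary dependent variable; the parameter names follow the prepare_tabulardata documentation, and the trailing True marks a categorical field):

```python
from arcgis.gis import GIS
from arcgis.learn import prepare_tabulardata

gis = GIS()  # anonymous connection; a portal profile could be used instead

# Hypothetical item id and field names, for illustration only.
layer = gis.content.get("<item_id>").layers[0]

data = prepare_tabulardata(
    layer,
    variable_predict="loan_approved",  # binary dependent variable
    explanatory_variables=["income", "age", ("gender", True)],
)
```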
updated
The data prepared by the prepare_tabulardata method is ready to be passed to MLModel, along with the selected machine learning model for training. Here, for demonstration purposes, the lightgbm.LGBMClassifier model, LightGBM's scikit-learn-compatible classifier, is passed into MLModel along with its parameters.
First, we import the MLModel framework from arcgis.learn, then specify the model to be used and define the necessary parameters as follows:
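Continuing the sketch with the `data` object prepared above; the LightGBM parameter values below are placeholders, not the notebook's tuned settings:

```python
from arcgis.learn import MLModel

# LightGBM's scikit-learn-compatible classifier, referenced by dotted path;
# keyword arguments are forwarded to lightgbm.LGBMClassifier.
model = MLModel(data, "lightgbm.LGBMClassifier",
                n_estimators=200, learning_rate=0.05, random_state=42)
model.fit()
```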
The model report below shows that three of the four indicators have turned green. This suggests that the bias mitigation was successful.
Except for the large RMSE difference, the pre-mitigation report suggests that the base model is already quite fair with respect to gender. While the True fairness flags confirm that the model meets the fairness standards for gender, the large RMSE difference remains an issue. Let us see whether a suitable mitigation strategy can improve it. First, we define the fairness arguments as follows:
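A hedged sketch of what those arguments might look like: mitigation_type and mitigation_constraint are named in this guide, while the dictionary shape and the sensitive_feature key are assumptions, not confirmed API.

```python
# Assumed structure for the fairness arguments; only the mitigation_type and
# mitigation_constraint names (and their values) come from this guide.
fairness_args = {
    "sensitive_feature": "gender",         # the feature to audit and protect
    "mitigation_type": "grid_search",      # or "exponentiated_gradient"
    "mitigation_constraint": "SquareLoss", # or "ZeroOneLoss"
}
```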
added
For regression, the mitigation_type can be set to either grid_search or exponentiated_gradient, and the mitigation_constraint can be set to ZeroOneLoss or SquareLoss.
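For contrast, the other supported combination, written under the same assumed dictionary structure as the sketch above:

```python
# Alternative combination mentioned above (structure still assumed).
fairness_args = {
    "sensitive_feature": "gender",
    "mitigation_type": "exponentiated_gradient",
    "mitigation_constraint": "ZeroOneLoss",
}
```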
Using these arguments, we initialize the regression model with the same algorithm used for the base model and fit it.
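Continuing the sketch: the regressor mirrors the base model's algorithm, and passing the arguments through a fairness_args parameter is an assumption rather than confirmed API.

```python
from arcgis.learn import MLModel

# Same LightGBM algorithm as the base model, now trained with the
# mitigation settings defined above (parameter name assumed).
mitigated_model = MLModel(data, "lightgbm.LGBMRegressor",
                          fairness_args=fairness_args,
                          n_estimators=200, random_state=42)
mitigated_model.fit()
```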
Finally, we check the new fairness score of the mitigated model using the fairness_score function as follows:
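A sketch of that check; the fairness_score method is named above, but the keyword argument is an assumption.

```python
# Re-compute the fairness metrics of the mitigated model with respect
# to gender (keyword name assumed).
scores = mitigated_model.fairness_score(sensitive_feature="gender")
print(scores)
```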
added
Suggested changes made on ReviewNB!
@BP-Ent all suggested changes added, pls check, thanks!
<insert pull request description here>
Checklist
Please go through each entry in the below checklist and mark an 'X' if that condition has been met. Every entry should be marked with an 'X' to get the Pull Request approved.
- [ ] All imports are in the first cell? `arcgis` imports included? Note that in some cases, for samples, it is a good idea to keep the imports next to where they are used, particularly for uncommonly used features that we want to highlight.
- [ ] `GIS` object instantiations are one of the following?
  - `gis = GIS()`
  - `gis = GIS('home')` or `gis = GIS('pro')`
  - `gis = GIS(profile="your_online_portal")`
  - `gis = GIS(profile="your_enterprise_portal")`
- [ ] Any data setup/teardown code is present in `./misc/setup.py` and/or `./misc/teardown.py`?
- [ ] Data downloaded in the notebook is owned by the `api_data_owner` user? For new data, publish with the `api_data_owner` account and change the notebook to first download and unpack the files.
- [ ] All images are embedded as base64 (`<img src="base64str_here">` instead of `<img src="https://some.url">`)? All map widgets contain a static image preview? (Call `mapview_inst.take_screenshot()` to do so)
- [ ] All file paths are constructed in an OS-agnostic way with `os.path.join()`? (Instead of `r"\foo\bar"`, use `os.path.join(os.path.sep, "foo", "bar")`, etc.)
- [ ] Test data for the `Export Training Data Using Deep Learning` tool is published on the geosaurus org (`api_data_owner` account) and added in the notebook using the `gis.content.get` function?
- [ ] Model and raster items are added in the notebook using the `gis.content.get` function? Note: This includes providing the test raster and trained model.