@moonlanderr
Collaborator

<insert pull request description here>


Checklist

Please go through each entry in the checklist below and mark an 'X' if that condition has been met. Every entry should be marked with an 'X' for the Pull Request to be approved.

  • All imports are in the first cell?
    • First block of imports are standard libraries
    • Second block are 3rd party libraries
    • Third block are all arcgis imports? Note that in some cases, for samples, it is a good idea to keep the imports next to where they are used, particularly for uncommonly used features that we want to highlight.
  • All GIS object instantiations are one of the following?
    • gis = GIS()
    • gis = GIS('home') or gis = GIS('pro')
    • gis = GIS(profile="your_online_portal")
    • gis = GIS(profile="your_enterprise_portal")
  • If this notebook requires setup or teardown, did you add the appropriate code to ./misc/setup.py and/or ./misc/teardown.py?
  • If this notebook references any portal items that need to be staged on AGOL/Python API playground, did you coordinate with a Python API team member to stage the item the correct way with the api_data_owner user?
  • If the notebook requires working with local data (such as CSV, FGDB, SHP, Raster files), upload the files as items to the Geosaurus Online Org using api_data_owner account and change the notebook to first download and unpack the files.
  • Code simplified & split out across multiple cells, useful comments?
  • Consistent voice/tense/narrative style? Thoroughly checked for typos?
  • All images used like <img src="base64str_here"> instead of <img src="https://some.url">? All map widgets contain a static image preview? (Call mapview_inst.take_screenshot() to do so)
  • All file paths are constructed in an OS-agnostic fashion with os.path.join()? (Instead of r"\foo\bar", os.path.join(os.path.sep, "foo", "bar"), etc.)
  • Is your code formatted using Jupyter Black? You can use Jupyter Black to format your code in the notebook.
  • If this notebook showcases deep learning capabilities, please go through the following checklist:
    • Are the inputs required for Export Training Data Using Deep Learning tool published on geosaurus org (api data owner account) and added in the notebook using gis.content.get function?
    • Is training data zipped and published as Image Collection? Note: Whole folder is zipped with name same as the notebook name.
    • Are the inputs required for model inferencing published on geosaurus org (api data owner account) and added in the notebook using gis.content.get function? Note: This includes providing test raster and trained model.
    • Are the inferenced results displayed using a webmap widget?
  • IF YOU WANT THIS SAMPLE TO BE DISPLAYED ON THE DEVELOPERS.ARCGIS.COM WEBSITE, ping @jyaistMap so he can add it to the list for the next deploy.
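
The OS-agnostic path item in the checklist above can be sketched as follows. This is a minimal illustration only; the folder and file names are hypothetical and do not refer to files in this pull request.

```python
import os

# Hypothetical folder and file names, used only for illustration.
data_dir = os.path.join("data", "samples")
csv_path = os.path.join(data_dir, "input.csv")

# Anchor a path at the filesystem root without hard-coding a separator.
rooted = os.path.join(os.path.sep, "foo", "bar")

print(csv_path)
print(rooted)
```

Because the separator comes from `os.path`, the same cell runs unchanged on Windows, macOS, and Linux.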

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -0,0 +1,1087 @@
{
Collaborator

@BP-Ent Oct 24, 2025

This guide provides an overview of how to estimate and mitigate bias using MLModel with backbones such as Random Forest, Gradient Boosting, LightGBM, and other scikit-learn-compatible algorithms.


Collaborator Author

updated

@@ -0,0 +1,1087 @@
{
Collaborator

@BP-Ent Oct 24, 2025

The arcgis.learn module includes MLModel classes to train machine learning and deep learning models on tabular vector data in the form of a feature layer or a spatially enabled dataframe. The MLModel uses machine learning algorithms to train models and allows you to use any regression or classification model from scikit-learn.


Collaborator Author

added

@@ -0,0 +1,1087 @@
{
Collaborator

@BP-Ent Oct 24, 2025

In this notebook, we demonstrate techniques to first identify bias in models with respect to sensitive features, that is, features or variables that are prone to bias, and then reduce unfairness by applying appropriate mitigation strategies. By applying these methods, arcgis.learn API developers can create models for tabular data that make fairer decisions across diverse groups and minimize the risk of discrimination. Currently, fairness estimation is supported for both classification and regression models.


Collaborator Author

updated

@@ -0,0 +1,1087 @@
{
Collaborator

@BP-Ent Oct 24, 2025

First, we create a TabularDataObject with the prepare_tabulardata() method, which can be fed into the MLModel. We then initialize the model and fit it. Refer to the Machine learning and deep learning on tabular data documentation for further details.

Note: Only binary classification and regression modeling are supported for estimating fairness. As such, we must choose a dependent variable with binary classes for classification.
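
As a rough illustration of the binary-class requirement in the note above, the following sketch checks whether a candidate dependent variable has exactly two classes before it is used for fairness estimation. The helper `is_binary_target` and the sample columns are hypothetical; they are not part of arcgis.learn.

```python
def is_binary_target(values):
    """Return True if the non-missing values contain exactly two classes."""
    classes = {v for v in values if v is not None}
    return len(classes) == 2


# Hypothetical label columns, used only for illustration.
approved = ["yes", "no", "yes", None, "no"]
risk_level = ["low", "medium", "high", "low"]

print(is_binary_target(approved))    # two classes: suitable
print(is_binary_target(risk_level))  # three classes: not suitable
```

A column that fails this check would need to be recoded into two classes, or a different dependent variable chosen, before estimating fairness for classification.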


Collaborator Author

updated

@@ -0,0 +1,1087 @@
{
Collaborator

@BP-Ent Oct 24, 2025

The data prepared by the prepare_tabulardata method is ready to be passed to the MLModel class, along with the selected machine learning model for training. Here, for demonstration purposes, the lightgbm.LGBMClassifier model, which follows the scikit-learn estimator interface, is passed into MLModel along with its parameters.

First, we import the MLModel framework from arcgis.learn, then we specify the model to be used and define the necessary parameters as follows:


@@ -0,0 +1,1087 @@
{
Collaborator

@BP-Ent Oct 24, 2025

The model report below shows that three of the four indicators have turned green. This suggests that the bias mitigation was successful.


@@ -0,0 +1,1087 @@
{
Collaborator

@BP-Ent Oct 24, 2025

We can see that, except for the large RMSE difference, the report before mitigation suggests that the base model is already quite fair with respect to gender. The fairness flags being True confirm that the model meets the fairness standards for gender, but the large RMSE difference remains an issue. Let us see if applying a suitable mitigation strategy can improve it. First, we will define the fairness arguments as follows:
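
To make the RMSE difference mentioned in this comment concrete, here is a minimal sketch that is separate from the notebook's own cells. It assumes the difference is the gap between the largest and smallest per-group RMSE; how arcgis.learn computes the value in its report is not confirmed here. The helper `rmse_by_group` and the data are hypothetical.

```python
import math
from collections import defaultdict


def rmse_by_group(y_true, y_pred, groups):
    """Compute the RMSE separately for each value of the sensitive feature."""
    squared_errors = defaultdict(list)
    for truth, pred, group in zip(y_true, y_pred, groups):
        squared_errors[group].append((truth - pred) ** 2)
    return {g: math.sqrt(sum(e) / len(e)) for g, e in squared_errors.items()}


# Hypothetical regression targets, predictions, and sensitive feature.
y_true = [10.0, 12.0, 9.0, 20.0, 22.0, 18.0]
y_pred = [11.0, 11.0, 10.0, 24.0, 19.0, 22.0]
gender = ["F", "F", "F", "M", "M", "M"]

per_group = rmse_by_group(y_true, y_pred, gender)

# Assumed definition: gap between the worst and best served group.
rmse_difference = max(per_group.values()) - min(per_group.values())

print(per_group)
print(rmse_difference)
```

A large gap means the model's errors are noticeably bigger for one group than for another, even when other fairness flags pass.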


Collaborator Author

added

@@ -0,0 +1,1087 @@
{
Collaborator

@BP-Ent Oct 24, 2025

For regression, the mitigation_type can be set to either grid_search or exponentiated_gradient, and the mitigation_constraint can be set to ZeroOneLoss or SquareLoss.
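
The allowed combinations described above could be checked with a small helper before fitting. This is only a sketch: `check_regression_mitigation` is a hypothetical function, not part of arcgis.learn, and the exact strings the library accepts are assumed to match the spellings in this comment.

```python
# Values taken from the comment above; exact accepted spellings are an assumption.
ALLOWED_MITIGATION_TYPES = {"grid_search", "exponentiated_gradient"}
ALLOWED_MITIGATION_CONSTRAINTS = {"ZeroOneLoss", "SquareLoss"}


def check_regression_mitigation(mitigation_type, mitigation_constraint):
    """Return a list of problems; an empty list means the pair looks valid."""
    problems = []
    if mitigation_type not in ALLOWED_MITIGATION_TYPES:
        problems.append(f"unsupported mitigation_type: {mitigation_type!r}")
    if mitigation_constraint not in ALLOWED_MITIGATION_CONSTRAINTS:
        problems.append(
            f"unsupported mitigation_constraint: {mitigation_constraint!r}"
        )
    return problems


print(check_regression_mitigation("grid_search", "SquareLoss"))
print(check_regression_mitigation("threshold", "SquareLoss"))
```

Failing fast on an unsupported value gives a clearer message than an error raised partway through training.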


@@ -0,0 +1,1087 @@
{
Collaborator

Using these arguments, we initialize the regression model with the same algorithm as used for the base model and fit the model.


@@ -0,0 +1,1087 @@
{
Collaborator

Finally, we check the new fairness score of the mitigated model using the fairness_score function as follows:


Collaborator Author

added

Collaborator

@BP-Ent left a comment

Suggested changes made on reviewnb!

@moonlanderr
Collaborator Author

@BP-Ent all suggested changes added, pls check, thanks!
