Automated Integration Test Goldens Update from CI #5930

datacommons-robot-author · 2026-01-23T06:19:45Z

This pull request updates the golden files automatically via Cloud Build. Please review the changes carefully. Cloud Build Log

…1d6-8538-02c952939182)

gemini-code-assist · 2026-01-23T06:20:02Z

Summary of Changes

Hello @datacommons-robot-author, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request automatically updates integration test golden files, reflecting recent changes in the backend's query processing and Statistical Variable (SV) matching algorithms. The updates demonstrate refined query interpretation, including the integration of Large Language Models for certain query types, and adjustments to how SVs are identified and scored. These changes ensure that the integration tests accurately reflect the current system behavior.

Highlights

Automated Golden File Updates: This pull request automatically updates integration test golden files, reflecting recent changes in the system's query processing and Statistical Variable (SV) matching logic.
Query Processing Enhancements: The debug_info.json files show updates to how queries are processed, including changes in query_with_places_removed and query_transformations fields, indicating refined parsing and simplification.
Large Language Model (LLM) Integration: For certain queries, an llm_response object has been introduced in the debug logs, demonstrating the system's use of LLMs to interpret query intent and identify relevant metrics.
Divergent Query Handling: Different queries now exhibit distinct behaviors in SV matching and LLM logging, suggesting specialized processing paths based on the query's structure or content.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	`@gemini-code-assist`	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request updates golden files for integration tests.

The changes for the compareobesityvs.poverty test case appear to be an improvement, with better metric detection from the new LLM-based flow. However, there's a potentially problematic value in the debug logs for sv_detection_query_index_types.

More critically, the changes for the numberofpoorhispanicwomenwithphd test case show a significant regression. The new version fails to detect any statistical variables, whereas the previous version was successful. This seems to be caused by an issue in the query processing logic. This regression should be addressed.

gemini-code-assist · 2026-01-23T06:21:19Z

...tion_tests/test_data/detection_api_multivar/numberofpoorhispanicwomenwithphd/debug_info.json

+  "query_with_places_removed": "",
  "sv_matching": {
-    "CosineScore": [
-      0.79858,
-      0.78833,
-      0.77635,
-      0.77406,
-      0.74103,
-      0.73347,
-      0.72283,
-      0.72257,
-      0.69955,
-      0.6995,
-      0.69762,
-      0.69743,
-      0.696,
-      0.69534,
-      0.69452,
-      0.69396,
-      0.69354,
-      0.69084,
-      0.6907,
-      0.68967,
-      0.68895,
-      0.68862,
-      0.68633,
-      0.68592,
-      0.68275,
-      0.68221,
-      0.68156,
-      0.67987,
-      0.67686,
-      0.67631,
-      0.67515,
-      0.6745,
-      0.67257,
-      0.6718,
-      0.66917,
-      0.669,
-      0.66822,
-      0.66781,
-      0.66507,
-      0.66487
-    ],
-    "MultiSV": {
-      "Candidates": [
-        {
-          "AggCosineScore": 0.8315,
-          "DelimBased": false,
-          "Parts": [
-            {
-              "CosineScore": [
-                0.83118,
-                0.831,
-                0.82552,
-                0.81955,
-                0.81456,
-                0.80907,
-                0.80864,
-                0.80744,
-                0.8003,
-                0.79858,
-                0.78782
-              ],
-              "QueryPart": "number of poor hispanic",
-              "SV": [
-                "Count_Person_BelowPovertyLevelInThePast12Months_HispanicOrLatino",
-                "Count_Person_HispanicOrLatino",
-                "Count_Person_Female_AbovePovertyLevelInThePast12Months_HispanicOrLatino",
-                "Count_Person_AbovePovertyLevelInThePast12Months_HispanicOrLatino",
-                "Count_Household_WithoutFoodStampsInThePast12Months_HispanicOrLatino",
-                "Count_Person_Male_AbovePovertyLevelInThePast12Months_BlackOrAfricanAmericanAlone",
-                "Count_Person_Male_BelowPovertyLevelInThePast12Months_HispanicOrLatino",
-                "Count_Person_Female_BelowPovertyLevelInThePast12Months_HispanicOrLatino",
-                "Count_Person_NoHealthInsurance_HispanicOrLatino",
-                "Count_Person_WithDisability_HispanicOrLatino",
-                "Count_Person_15OrMoreYears_Separated_HispanicOrLatino"
-              ]
-            },
-            {
-              "CosineScore": [
-                0.83183,
-                0.80299
-              ],
-              "QueryPart": "women phd",
-              "SV": [
-                "Count_Person_25OrMoreYears_EducationalAttainmentDoctorateDegree_Female",
-                "Count_Person_25OrMoreYears_Female_DoctorateDegree_AsFractionOf_Count_Person_25OrMoreYears_Female"
-              ]
-            }
-          ]
-        },
-        {
-          "AggCosineScore": 0.8259,
-          "DelimBased": false,
-          "Parts": [
-            {
-              "CosineScore": [
-                0.83667,
-                0.79727
-              ],
-              "QueryPart": "number of poor",
-              "SV": [
-                "dc/topic/Poverty",
-                "Count_Person_Rural_BelowPovertyLevelInThePast12Months"
-              ]
-            },
-            {
-              "CosineScore": [
-                0.81515,
-                0.7886,
-                0.77756,
-                0.77521
-              ],
-              "QueryPart": "hispanic women phd",
-              "SV": [
-                "dc/06f0jf8xvzw4f",
-                "Count_Person_Female_HispanicOrLatino",
-                "dc/3w039ndqy7qv1",
-                "dc/topic/HispanicOrLatinoFemalePopulationByAge"
-              ]
-            }
-          ]
-        },
-        {
-          "AggCosineScore": 0.8071,
-          "DelimBased": false,
-          "Parts": [
-            {
-              "CosineScore": [
-                0.84959,
-                0.8335
-              ],
-              "QueryPart": "number of poor hispanic women",
-              "SV": [
-                "Count_Person_Female_BelowPovertyLevelInThePast12Months_HispanicOrLatino",
-                "Count_Person_Female_HispanicOrLatino"
-              ]
-            },
-            {
-              "CosineScore": [
-                0.7647,
-                0.74767,
-                0.73771,
-                0.73628,
-                0.73059
-              ],
-              "QueryPart": "phd",
-              "SV": [
-                "Count_Person_EducationalAttainmentDoctorateDegree",
-                "Count_Person_25OrMoreYears_EducationalAttainmentDoctorateDegree_Female",
-                "Count_Person_25OrMoreYears_DoctorateDegree_AsFractionOf_Count_Person_25OrMoreYears",
-                "Count_Person_25OrMoreYears_EducationalAttainmentDoctorateDegree_Male",
-                "Count_Person_25OrMoreYears_Female_DoctorateDegree_AsFractionOf_Count_Person_25OrMoreYears_Female"
-              ]
-            }
-          ]
-        }
-      ]
-    },
+    "CosineScore": [],
+    "MultiSV": {},
    "Query": "number of poor hispanic women with phd",
-    "SV": [
-      "dc/06f0jf8xvzw4f",
-      "Count_Person_Female_BelowPovertyLevelInThePast12Months_HispanicOrLatino",
-      "Count_Person_Female_HispanicOrLatino",
-      "dc/3w039ndqy7qv1",
-      "dc/topic/HispanicOrLatinoFemalePopulationByAge",
-      "dc/9cqv67nn7pn1b",
-      "Median_Age_Person_Female_HispanicOrLatino",
-      "Count_Person_25OrMoreYears_EducationalAttainmentDoctorateDegree_Female",
-      "Count_Person_Female_AbovePovertyLevelInThePast12Months_HispanicOrLatino",
-      "Count_Person_BelowPovertyLevelInThePast12Months_HispanicOrLatino",
-      "Count_Person_15OrMoreYears_Widowed_HispanicOrLatino",
-      "Count_Person_15OrMoreYears_Divorced_HispanicOrLatino",
-      "Count_Household_HouseholderRaceHispanicOrLatino_SingleMotherFamilyHousehold",
-      "Count_Person_Female_NotHispanicOrLatino",
-      "Count_Student_HispanicOrLatino",
-      "Count_Person_25OrMoreYears_Female_DoctorateDegree_AsFractionOf_Count_Person_25OrMoreYears_Female",
-      "dc/0jtctjm33mgh1",
-      "Count_Person_WithDisability_HispanicOrLatino",
-      "Count_Person_15OrMoreYears_MarriedAndNotSeparated_HispanicOrLatino",
-      "Count_Person_HispanicOrLatino_ResidesInCollegeOrUniversityStudentHousing",
-      "Count_Household_WithoutFoodStampsInThePast12Months_HispanicOrLatino",
-      "Count_Person_HispanicOrLatino",
-      "Count_Person_AbovePovertyLevelInThePast12Months_HispanicOrLatino",
-      "Count_Person_Female_BelowPovertyLevelInThePast12Months_TwoOrMoreRaces",
-      "Count_Person_Male_AbovePovertyLevelInThePast12Months_BlackOrAfricanAmericanAlone",
-      "Count_Person_15OrMoreYears_NeverMarried_HispanicOrLatino",
-      "Count_Person_Male_BelowPovertyLevelInThePast12Months_HispanicOrLatino",
-      "Count_Person_Female_BelowPovertyLevelInThePast12Months",
-      "dc/topic/PovertyByGender",
-      "Count_Person_Female_AbovePovertyLevelInThePast12Months_TwoOrMoreRaces",
-      "Count_Person_HispanicOrLatino_ResidesInNursingFacilities",
-      "Count_Person_NoHealthInsurance_HispanicOrLatino",
-      "dc/5hc4etrfyj9qg",
-      "Count_Person_Male_HispanicOrLatino",
-      "Count_Person_Female_BelowPovertyLevelInThePast12Months_WhiteAlone",
-      "dc/hyfn2tlyz48lb",
-      "Count_Person_25OrMoreYears_EducationalAttainmentSomeCollegeLessThan1Year_Female",
-      "Count_Person_25OrMoreYears_EducationalAttainmentBachelorsDegreeOrHigher_Female",
-      "Count_Person_Female_AbovePovertyLevelInThePast12Months_WhiteAlone",
-      "dc/epw58ne8mytn5"
-    ]
+    "SV": []


This change indicates a significant regression for this test case. The previous version correctly identified numerous statistical variables (SVs), but the new version finds none (SV is empty).

The root cause appears to be that query_with_places_removed is now an empty string. With an empty query, it's expected that no SVs would be matched. This suggests a bug in the upstream query processing logic that strips the entire query.
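
A check along these lines could flag this pattern before a golden is accepted. This is only a sketch, not code from the repository: `check_detection` is a hypothetical helper, and it assumes `query_with_places_removed` and `sv_matching` are top-level keys of debug_info.json, as the diff above suggests.

```python
import json


def check_detection(debug_info: dict) -> list[str]:
    """Return human-readable problems found in one debug_info payload.

    Hypothetical helper; the key layout is assumed from the diff above.
    """
    problems = []
    sv_matching = debug_info.get("sv_matching") or {}
    original_query = sv_matching.get("Query") or ""
    stripped_query = debug_info.get("query_with_places_removed") or ""

    # A non-empty query reduced to an empty string means the whole query
    # was stripped somewhere upstream.
    if original_query.strip() and not stripped_query.strip():
        problems.append("query_with_places_removed is empty for a non-empty query")

    # An empty SV list means no statistical variables were detected.
    if not sv_matching.get("SV"):
        problems.append("sv_matching.SV is empty")

    return problems


# The new golden from this diff, reduced to the fields shown above.
new_golden = json.loads("""
{
  "query_with_places_removed": "",
  "sv_matching": {
    "CosineScore": [],
    "MultiSV": {},
    "Query": "number of poor hispanic women with phd",
    "SV": []
  }
}
""")

for problem in check_detection(new_golden):
    print(problem)
```

Run against the new golden, both checks fire. A payload with a non-empty stripped query and at least one SV returns an empty list.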

gemini-code-assist · 2026-01-23T06:21:19Z

.../integration_tests/test_data/detection_api_multivar/compareobesityvs.poverty/debug_info.json

      "sv_detection_query_index_types": [
-        "base_uae_mem"
-      ],
-      "sv_detection_query_input": "compare obesity vs poverty",
-      "sv_detection_query_stop_words_removal": "obesity poverty"
+        ""
+      ]


The sv_detection_query_index_types is now [""]. Previously, it was ["base_uae_mem"]. Using an array with an empty string as an index type is ambiguous and could be a bug. If the intention is to not use any embeddings index, it would be clearer to use an empty array []. This might indicate an issue where the embeddings index is not being correctly passed or used in the new LLM-based detection flow.
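
To make the distinction between `[""]` and `[]` explicit, a small validation could reject blank index-type entries. The sketch below is illustrative only: `blank_index_type_positions` is a hypothetical name, and it assumes index types are plain strings such as `"base_uae_mem"`.

```python
def blank_index_type_positions(index_types: list) -> list[int]:
    """Return the positions of entries that are not non-empty strings.

    Hypothetical helper: an empty list is treated as valid ("no index"),
    while a blank entry such as "" is reported as suspicious.
    """
    return [
        position
        for position, name in enumerate(index_types)
        if not isinstance(name, str) or not name.strip()
    ]


old_value = ["base_uae_mem"]  # value before this PR
new_value = [""]              # value after this PR

print(blank_index_type_positions(old_value))
print(blank_index_type_positions(new_value))
print(blank_index_type_positions([]))
```

Only the new value is flagged (position 0). The old value and an explicit empty list both pass.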

rohitkumarbhagat · 2026-01-23T11:48:45Z

Surprisingly, a re-run did not detect any diff: http://screen/4HR4e7PSUL6SM8A.
Running again for confirmation: http://screen/5n9snXrP2z3F9oM
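
When comparing the output of two runs, a path-level JSON diff makes it easier to see whether the same fields change each time. The following is a minimal sketch, not part of the test tooling: `json_diff` is a hypothetical helper, lists are compared as whole values, and the sample payloads are trimmed-down stand-ins for debug_info.json.

```python
_MISSING = object()  # sentinel for a key present on only one side


def json_diff(old, new, path=""):
    """Yield (path, old_value, new_value) for every value that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Recurse key by key so the report points at the exact field.
        for key in sorted(set(old) | set(new)):
            yield from json_diff(
                old.get(key, _MISSING), new.get(key, _MISSING), f"{path}/{key}"
            )
    elif old != new:
        yield (path or "/", old, new)


# Trimmed-down stand-ins for the old and new goldens.
old_run = {
    "sv_matching": {
        "Query": "number of poor hispanic women with phd",
        "SV": ["Count_Person_Female_HispanicOrLatino"],
    }
}
new_run = {
    "query_with_places_removed": "",
    "sv_matching": {
        "Query": "number of poor hispanic women with phd",
        "SV": [],
    },
}

changed_paths = [path for path, _, _ in json_diff(old_run, new_run)]
print(changed_paths)
```

If a re-run really produces no diff, `json_diff` over the two outputs yields nothing, which would point to non-determinism in the original run rather than a persistent regression.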

feat: Update goldens from Cloud Build workflow (build 92f9512c-2b85-4…

c7cced6

…1d6-8538-02c952939182)

datacommons-robot-author added the automated-pr label Jan 23, 2026

gemini-code-assist bot reviewed Jan 23, 2026

rohitkumarbhagat requested review from ajaits and clincoln8 January 23, 2026 11:48
