Mixed data type fields in Firestore lead to extra fields downstream in BigQuery #1262
Replies: 6 comments 10 replies
-
If mixed STRING and INTEGER types have reappeared after previous cleanups, there are likely bug(s) saving data with the wrong type. We may need to list each instance, find the root causes (likely in the code base), and fix them. Cleaning up mixed STRING and INTEGER types is easy, since integers can readily be converted to strings and vice versa. I agree that mixed STRING and RECORD (object) types are more difficult. We discussed this previously but didn't reach a consensus on how to clean them up. We also need to find and fix the root causes of these.
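For the STRING/INTEGER case, the conversion could look roughly like the sketch below. It is only an illustration: the function names are hypothetical, and which direction to convert (to integer or to string) would depend on what the data dictionary says for each field.

```python
import re
from typing import Any

# Matches an optional minus sign followed by ASCII digits only.
_INT_PATTERN = re.compile(r"-?[0-9]+")


def to_int_if_numeric(value: Any) -> Any:
    """Convert digit-only strings to int; leave every other value unchanged."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so handle it explicitly.
        return value
    if isinstance(value, str):
        stripped = value.strip()
        if _INT_PATTERN.fullmatch(stripped):
            return int(stripped)
    return value


def to_str_if_int(value: Any) -> Any:
    """Convert ints to their string form; leave every other value unchanged."""
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        return str(value)
    return value
```

Values that cannot be converted safely (for example a string containing letters) are returned as-is, so they can be listed and reviewed by hand rather than silently altered.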
-
The two biospecimen variables are tied to Blood and Urine Accession IDs. As Warren mentioned, we need to figure out where in the code these are being set incorrectly, as well as do a mass data correction update. @jacobmpeters can you please explain what the survey-specific ones look like in the data? Are these nested questions that are supposed to have objects, but some got sent as just strings? The final three lines are a bug that I can clean up. @jacobmpeters for the notifications error code one... are the ones that are strings just instances of empty strings?
-
Related discussions earlier: issue#938.
-
From above: 'The two biospecimen variables are tied to Blood and Urine Accession IDs. As Warren mentioned, we need to figure out where in code these are being set incorrectly as well as do a mass data correction update.' These are scanned into the clinical biospec dashboard by the sites. They exist independently of us (they are assigned by the health care systems); we are capturing what the sites scan and need to accept it. The dictionary says they are 'numeric'. I believe the expected behavior is that when a site scans something with a leading or trailing character, the dashboard removes that character when it stores the data, but I am not certain. We would need someone to help us look at the data and the interface and check what it is doing. Thanks.
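If that expected behavior is confirmed, the stripping step might look something like the sketch below. This is an assumption about what the dashboard is meant to do, not a description of what it currently does; the function name is hypothetical.

```python
import re


def strip_non_digit_edges(scanned: str) -> str:
    """Remove leading and trailing non-digit characters from a scanned value.

    Characters in the middle of the value are left untouched, so a result
    that still contains non-digits can be flagged for manual review.
    """
    return re.sub(r"^[^0-9]+|[^0-9]+$", "", scanned.strip())


def is_numeric_id(value: str) -> bool:
    """True when the value consists of ASCII digits only."""
    return re.fullmatch(r"[0-9]+", value) is not None
```

Checking `is_numeric_id(strip_non_digit_edges(value))` against the stored data would show how many accession IDs would still fail the 'numeric' definition after stripping.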
-
@FrogGirl1123 I think we would benefit from assigning someone from the Analytics Team to develop a simple report that detects mixed data type issues as they arise and gives DevOps the guidance needed to correct them. I think @KELSEYDOWLING7 and I are both too swamped at the moment.
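As a starting point for such a report, here is a rough sketch of the detection logic. It assumes the documents have already been exported as plain Python dicts (for example, from a Firestore export); the function names and the type labels are illustrative only and are not taken from our code base.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set


def type_label(value: Any) -> str:
    """Map a Python value to a coarse, BigQuery-style type label."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, dict):
        return "RECORD"
    if isinstance(value, list):
        return "ARRAY"
    return type(value).__name__


def _collect(doc: Dict[str, Any], prefix: str, seen: Dict[str, Set[str]]) -> None:
    """Record the type label seen at every (nested) field path of one document."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        seen[path].add(type_label(value))
        if isinstance(value, dict):
            _collect(value, path, seen)


def find_mixed_fields(docs: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Return field paths that hold more than one non-null type across docs."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        _collect(doc, "", seen)
    return {
        path: sorted(types - {"NULL"})
        for path, types in seen.items()
        if len(types - {"NULL"}) > 1
    }
```

Running `find_mixed_fields` weekly over the exported documents and diffing the result against the previous week would highlight newly introduced mixed-type fields.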
-
Let's assign @hullingsag. Autumn, please work with Jake to create a weekly report that DevOps can use to correct this issue. @jacobmpeters I don't have the ability to do more than tag Autumn, so please assign her.
-
There is a recurring issue in which fields in Firestore sometimes contain entries of differing data types.
For example, if the data look something like this in Firestore...
...we get a complex table like this...
...instead of something clean like this...
This gets particularly difficult to work with for nested structs.
This SQL query finds all examples of these issues in production:
This is a CSV file with the results: https://nih.app.box.com/folder/312339347849
I would like to correct these mixed data fields in Firestore directly rather than fixing them as they appear in BigQuery. It would also be great to impose some sort of type checking on these fields so that these issues don't keep popping up and unexpectedly disrupting our analytic workflows.
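One possible shape for that type checking is a small validation step that runs before a document is written. The sketch below is only an illustration under assumptions: the schema, the field names in the usage example, and the function name are all hypothetical, and the real check would need to live wherever the writes actually happen.

```python
from typing import Any, Dict, List, Type


def check_types(doc: Dict[str, Any], schema: Dict[str, Type]) -> List[str]:
    """Return a list of type problems; an empty list means the doc passes.

    Fields missing from the doc, or set to None, are skipped rather than
    reported, since this sketch only checks types, not required fields.
    """
    problems: List[str] = []
    for field, expected in schema.items():
        if field not in doc or doc[field] is None:
            continue
        value = doc[field]
        # bool is a subclass of int, so reject it unless bool is expected.
        bool_mismatch = isinstance(value, bool) and expected is not bool
        if bool_mismatch or not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


# Usage with a hypothetical schema (field names are made up for illustration).
example_schema = {"accession_id": int, "survey_answers": dict}
print(check_types({"accession_id": "123", "survey_answers": {}}, example_schema))
```

A write path could refuse the write, or at least log the problems, whenever the returned list is non-empty.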
Does anyone have a good idea about the source of these issues and why they continue to arise? @we-ai @anthonypetersen @JoeArmani
Cleaning these fields up is a priority for our PR2 investigator-facing data warehouse work.