QUA-126: Update "Data Diff" userguide as per the suggestion. #852
base: main
I found the following issues:
- Line 4: "The isReplicaOf check is sunsetting" (column 9): should be "is being sunset" or "is being deprecated" (grammatically correct form).
- Line 58: "The DataDiff report shows:" (column 5): should be "Data Diff report" (consistent spacing with the rest of the document).
- Line 164: "It works automatically you set it up once" (column 27): missing punctuation; should be "It works automatically - you set it up once" (add a dash or em dash).
- Line 166: "It catches problems early before they affect" (column 31): missing punctuation; should be "It catches problems early - before they affect" (add a dash or em dash).
- Line 168: "It gives you peace of mind you can trust" (column 32): missing punctuation; should be "It gives you peace of mind - you can trust" (add a dash or em dash).

The document is comprehensive and well-written with clear examples. The main issues are minor punctuation inconsistencies in the Key Takeaways section and one terminology issue at the beginning.
## Real-Life Example: Online Retail Store

Let me walk you through a complete, real-world scenario:

### The Situation

**Sunshine Electronics** is an online store that sells gadgets. Every night at midnight, their system creates a backup copy of all the day's orders. This backup is used for:

- Creating daily sales reports
- Feeding data to their accounting system
- Analyzing customer trends

### The Problem They Had

One morning, the Sales Manager noticed the daily report showed 1,247 orders, but the warehouse had shipped 1,250 packages. **Where did 3 orders go?**

After investigating, they discovered:

- The backup system had a glitch
- Some orders placed between 11:58 PM and midnight weren't copied over
- This had been happening for weeks
- They had been under-reporting revenue and had incorrect inventory counts

### The Solution: Data Diff

They set up Data Diff to automatically compare their main orders database with the backup every morning.

**Here's what they compared:**

**Original Orders Database:**

| Order ID | Customer Name | Product | Amount | Date |
| :--------- | :------------- | :-------- | :------- | :----------- |
| 10001 | Sarah Johnson | Laptop | $899 | Jan 15, 2025 |
| 10002 | Mike Chen | Headphones | $149 | Jan 15, 2025 |
| 10003 | Emily Davis | Tablet | $399 | Jan 15, 2025 |
| ... | ... | ... | ... | ... |
| 10248 | David Lee | Phone Case | $19 | Jan 15, 2025 |
| 10249 | Anna Brown | USB Cable | $12 | Jan 15, 2025 |
| 10250 | Tom Wilson | Mouse | $29 | Jan 15, 2025 |

**Backup Orders Database:**

| Order ID | Customer Name | Product | Amount | Date |
| :--------| :-------------| :-------| :------| :-----|
| 10001 | Sarah Johnson | Laptop | $899 | Jan 15, 2025 |
| 10002 | Mike Chen | Headphones | $149 | Jan 15, 2025 |
| 10003 | Emily Davis | Tablet | $399 | Jan 15, 2025 |
| ... | ... | ... | ... | ... |
| <span class="text-negative">10248</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> |
| <span class="text-negative">10249</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> |
| <span class="text-negative">10250</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> |

### What Data Diff Discovered

**ALERT GENERATED:**

!!! warning "DIFFERENCE DETECTED!"
    - Original Database: 1,250 orders
    - Backup Database: 1,247 orders
    - Missing Records: 3 orders (IDs: 10248, 10249, 10250)
    - Issue: Orders placed after 11:58 PM not copied

**Technical Anomaly Output:**

!!! info
    - Anomaly Type: Shape
    - Source Records: 1,250
    - Target Records: 1,247
    - Missing Records: 3 (order_ids: 10248, 10249, 10250)

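
The comparison in this example comes down to two steps: match rows by a row identifier (here, Order ID), then flag rows that exist only in the source. The Python sketch below shows that idea only. It is not how Qualytics implements Data Diff. The helper `find_missing_rows` is hypothetical. The sample rows are a small subset copied from the tables above, with the Date column omitted, so the counts are smaller than the 1,250/1,247 totals.

```python
# Sample rows copied from the example tables (Date column omitted for brevity).
source_orders = [
    {"order_id": 10001, "customer": "Sarah Johnson", "product": "Laptop", "amount": 899},
    {"order_id": 10002, "customer": "Mike Chen", "product": "Headphones", "amount": 149},
    {"order_id": 10003, "customer": "Emily Davis", "product": "Tablet", "amount": 399},
    {"order_id": 10248, "customer": "David Lee", "product": "Phone Case", "amount": 19},
    {"order_id": 10249, "customer": "Anna Brown", "product": "USB Cable", "amount": 12},
    {"order_id": 10250, "customer": "Tom Wilson", "product": "Mouse", "amount": 29},
]
# The backup copy lost the last three orders.
target_orders = source_orders[:3]


def find_missing_rows(source, target, key):
    """Hypothetical helper: return source rows whose key value is absent from the target."""
    target_keys = {row[key] for row in target}
    return [row for row in source if row[key] not in target_keys]


missing = find_missing_rows(source_orders, target_orders, key="order_id")
missing_ids = ", ".join(str(row["order_id"]) for row in missing)
print(f"Source Records: {len(source_orders)}")
print(f"Target Records: {len(target_orders)}")
print(f"Missing Records: {len(missing)} (order_ids: {missing_ids})")
```

Holding the target keys in a set makes each lookup constant-time on average. That matters once the tables grow to thousands of rows.
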
Hi @iammuze, I liked this example a lot, but you are not showing how to configure the Data Diff within the Qualytics platform. You don't need to show a screenshot of the entire form, but try to illustrate how you set up each field, such as whether it has a row identifier, passthrough, etc.
I added an enhancement where the missing records are shown in red. We really need to show as much as we can of the anomalies and what has been changed.
For each of the examples you provided, let's create an Arcade video showing how to set up each field. Make sure to show the same columns and anomalies as in the example, too.
## Another Quick Example: Healthcare Clinic

**City Health Clinic** transfers patient appointment data from their scheduling system to their billing system every hour.

**They use Data Diff to check:**

- Patient Name
- Appointment Date
- Doctor Assigned
- Service Type
- Insurance Information

**One day, Data Diff caught this:**

**Scheduling System:**

- Patient: **Robert Martinez**
- Doctor: **Dr. Smith**
- Insurance: **BlueCross Plan A**

**Billing System:**

- Patient: **Robert Martinez**
- Doctor: **Dr. Smith**
- Insurance: **BlueCross Plan B (WRONG!)**

The insurance plan code had changed during the transfer. Without Data Diff, they would have billed the wrong insurance company, leading to:

- Claim rejection
- Payment delays
- Frustrated patient
- Extra work for staff

Data Diff caught it immediately, and they fixed it before any claim was submitted.

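
The orders example looks for missing rows. This example is different: once rows are paired, it compares their values field by field. The minimal Python sketch below illustrates that kind of check. The example does not say how rows are matched, so the row identifier `appointment_id` and its value are hypothetical. `find_value_mismatches` is an illustrative helper, not a Qualytics API.

```python
# Hypothetical row identifier: the example does not name one, so "appointment_id" is assumed.
scheduling_rows = [
    {"appointment_id": "A-1", "patient": "Robert Martinez",
     "doctor": "Dr. Smith", "insurance": "BlueCross Plan A"},
]
billing_rows = [
    {"appointment_id": "A-1", "patient": "Robert Martinez",
     "doctor": "Dr. Smith", "insurance": "BlueCross Plan B"},
]


def find_value_mismatches(source, target, key, fields):
    """Hypothetical helper: pair rows by `key` and list each field whose value differs."""
    target_by_key = {row[key]: row for row in target}
    mismatches = []
    for src in source:
        tgt = target_by_key.get(src[key])
        if tgt is None:
            continue  # Missing rows are a separate check; only paired rows are compared here.
        for field in fields:
            if src[field] != tgt[field]:
                mismatches.append((src[key], field, src[field], tgt[field]))
    return mismatches


mismatches = find_value_mismatches(
    scheduling_rows, billing_rows, key="appointment_id",
    fields=["patient", "doctor", "insurance"],
)
for key_value, field, expected, actual in mismatches:
    print(f"{key_value}: {field} changed from {expected!r} to {actual!r}")
```

Only the insurance field differs between the two rows, so the sketch reports a single mismatch.
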
## Key Takeaways

**Data Diff is like having a careful proofreader** who checks that when you copy important information, nothing goes wrong.

**It works automatically**: you set it up once, and it keeps watching your data 24/7.

**It catches problems early**, before they affect your reports, decisions, or customers.

**It gives you peace of mind**: you can trust that your backups, reports, and transferred data are accurate.

## When Should You Use Data Diff?

Use Data Diff whenever you:

- Copy data from one place to another
- Create backups of important information
- Generate reports from multiple sources
- Transfer data between different systems
- Move data to the cloud
- Export data to partners or vendors

This example is also good, but we need to consider the following:
- Show step by step how users can create the Data Diff within the Qualytics platform, with an explanation of each field to be filled in
- The scheduling system vs. billing system comparison needs to be a table instead of line by line
- The error should be shown in red, indicating what changed
- Make sure to use `!!! info` or `!!! warning` admonitions

@shindiogawa Done with this one: I created a table and marked the errors in red.

docs/checks/data-diff-check.md (Outdated)

!!! info
    - Anomaly Type: Shape
    - Source Records: 1,250
    - Target Records: 1,247
    - Missing Records: 3 (order_ids: 10248, 10249, 10250)

Hey @iammuze, this shape anomaly is not good. We need to present the shape anomaly as a table, the same way we show it in Qualytics. As written, this is going to confuse users: since we are explaining Data Diff, we need to make sure we show what Qualytics Data Diff actually produces.
@shindiogawa I’ve added the Arcade section and explained the complete process of how users can perform data difference checks and detect anomalies using the Scan operation.
…the Entire Process of Adding and Detecting Anomalies.
Overview
This PR is good to go as you mentioned, and I’ve added some additional content as per your request.
Key Changes