Collaborator

@iammuze iammuze commented Sep 29, 2025

Overview

This PR is good to go as you mentioned, and I’ve added some additional content as per your request.

Key Changes

  • Updated: Content

@iammuze iammuze requested a review from RafaelOsiro September 29, 2025 13:06
@iammuze iammuze self-assigned this Sep 29, 2025
@iammuze iammuze added the documentation Improvements or additions to documentation label Sep 29, 2025
Contributor

@RafaelOsiro RafaelOsiro Oct 3, 2025

I found the following issues:

  • Line 4: "The isReplicaOf check is sunsetting"
    Column 9: Should be "is being sunset" or "is being deprecated" (grammatically correct form)

  • Line 58: "The DataDiff report shows:"
    Column 5: Should be "Data Diff report" (consistent spacing with the rest of the document)

  • Line 164: "It works automatically you set it up once"
    Column 27: Missing punctuation - should be "It works automatically - you set it up once" (add dash or em dash)

  • Line 166: "It catches problems early before they affect"
    Column 31: Missing punctuation - should be "It catches problems early - before they affect" (add dash or em dash)

  • Line 168: "It gives you peace of mind you can trust"
    Column 32: Missing punctuation - should be "It gives you peace of mind - you can trust" (add dash or em dash)

The document is comprehensive and well-written with clear examples. The main issues are minor punctuation inconsistencies in the Key Takeaways section and one terminology issue at the beginning.

Comment on lines 83 to 153
## Real-Life Example: Online Retail Store

Let me walk you through a complete, real-world scenario:

### The Situation

**Sunshine Electronics** is an online store that sells gadgets. Every night at midnight, their system creates a backup copy of all the day's orders. This backup is used for:

- Creating daily sales reports
- Feeding data to their accounting system
- Analyzing customer trends

### The Problem They Had

One morning, the Sales Manager noticed the daily report showed 1,247 orders, but the warehouse had shipped 1,250 packages. **Where did 3 orders go?**

After investigating, they discovered:

- The backup system had a glitch
- Some orders placed between 11:58 PM and midnight weren't copied over
- This had been happening for weeks
- They had been under-reporting revenue and had incorrect inventory counts

### The Solution: Data Diff

They set up Data Diff to automatically compare their main orders database with the backup every morning.

**Here's what they compared:**

**Original Orders Database:**

| Order ID | Customer Name | Product | Amount | Date |
| :--------- | :------------- | :-------- | :------- | :----------- |
| 10001 | Sarah Johnson | Laptop | $899 | Jan 15, 2025 |
| 10002 | Mike Chen | Headphones | $149 | Jan 15, 2025 |
| 10003 | Emily Davis | Tablet | $399 | Jan 15, 2025 |
| ... | ... | ... | ... | ... |
| 10248 | David Lee | Phone Case | $19 | Jan 15, 2025 |
| 10249 | Anna Brown | USB Cable | $12 | Jan 15, 2025 |
| 10250 | Tom Wilson | Mouse | $29 | Jan 15, 2025 |

**Backup Orders Database:**

| Order ID | Customer Name | Product | Amount | Date |
| :--------| :-------------| :-------| :------| :-----|
| 10001 | Sarah Johnson | Laptop | $899 | Jan 15, 2025 |
| 10002 | Mike Chen | Headphones | $149 | Jan 15, 2025 |
| 10003 | Emily Davis | Tablet | $399 | Jan 15, 2025 |
| ... | ... | ... | ... | ... |
| <span class="text-negative">10248</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> |
| <span class="text-negative">10249</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> |
| <span class="text-negative">10250</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> | <span class="text-negative">Missing</span> |

### What Data Diff Discovered

**ALERT GENERATED:**

!!! warning "DIFFERENCE DETECTED!"
- Original Database: 1,250 orders
- Backup Database: 1,247 orders
- Missing Records: 3 orders (IDs: 10248, 10249, 10250)
- Issue: Orders placed after 11:58 PM not copied

**Technical Anomaly Output:**

!!! info
- Anomaly Type: Shape
- Source Records: 1,250
- Target Records: 1,247
- Missing Records: 3 (order_ids: 10248, 10249, 10250)
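
The comparison behind this alert can be sketched in a few lines of Python. This is an illustration only, not how Qualytics implements Data Diff: the `find_missing_ids` helper and the in-memory ID lists are assumptions made for the example, and the ID range is hypothetical, chosen so the record counts match the scenario above.

```python
def find_missing_ids(source_ids, target_ids):
    """Return IDs present in the source but absent from the target, sorted."""
    return sorted(set(source_ids) - set(target_ids))


# Hypothetical data: 1,250 source orders, with the last three absent
# from the backup, mirroring the Sunshine Electronics scenario.
source_ids = list(range(9001, 10251))
target_ids = [i for i in source_ids if i not in (10248, 10249, 10250)]

missing = find_missing_ids(source_ids, target_ids)
print(f"Source Records: {len(source_ids):,}")
print(f"Target Records: {len(target_ids):,}")
print(f"Missing Records: {len(missing)} (order_ids: {missing})")
```

Comparing sets of row identifiers catches missing rows regardless of row order, which is why a unique identifier such as the order ID is the natural key for this kind of check.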

Contributor

Hi @iammuze, I liked this example a lot, but you are not showing how to configure Data Diff within the Qualytics platform. You don't need to show a screenshot of the entire form, but you can try to illustrate how you set up each field: whether it has a row identifier, passthrough, etc.

I added an enhancement that shows the missing values in red. We really need to show the anomalies, and what has changed, as clearly as we can.

image

Contributor

For each of the examples you provided, let's create an Arcade video showing how to set up each field... Make sure to show the same columns and anomalies as in the example, too.

Comment on lines 170 to 224
## Another Quick Example: Healthcare Clinic

**City Health Clinic** transfers patient appointment data from their scheduling system to their billing system every hour.

**They use Data Diff to check:**

- Patient Name
- Appointment Date
- Doctor Assigned
- Service Type
- Insurance Information

**One day, Data Diff caught this:**

**Scheduling System:**

- Patient: **Robert Martinez**
- Doctor: **Dr. Smith**
- Insurance: **BlueCross Plan A**

**Billing System:**

- Patient: **Robert Martinez**
- Doctor: **Dr. Smith**
- Insurance: **BlueCross Plan B (WRONG!)**

The insurance plan code had changed during transfer. Without Data Diff, they would have billed the wrong insurance company, leading to:

- Claim rejection
- Payment delays
- Frustrated patient
- Extra work for staff

Data Diff caught it immediately, and they fixed it before any claim was submitted.
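
A field-level comparison like this one can be sketched as follows. This is a minimal illustration, not the Qualytics implementation: the `diff_records` helper is hypothetical, and using the patient name as the row identifier is an assumption made to keep the example short (a real check would key on a unique ID).

```python
def diff_records(source, target, key, fields):
    """Yield (key value, field, source value, target value) for each mismatch."""
    target_by_key = {row[key]: row for row in target}
    for row in source:
        other = target_by_key.get(row[key])
        if other is None:
            continue  # rows missing from the target are a separate kind of difference
        for field in fields:
            if row[field] != other[field]:
                yield row[key], field, row[field], other[field]


scheduling = [{"patient": "Robert Martinez", "doctor": "Dr. Smith",
               "insurance": "BlueCross Plan A"}]
billing = [{"patient": "Robert Martinez", "doctor": "Dr. Smith",
            "insurance": "BlueCross Plan B"}]

mismatches = list(diff_records(scheduling, billing, key="patient",
                               fields=["doctor", "insurance"]))
for patient, field, expected, actual in mismatches:
    print(f"{patient}: {field} changed from {expected!r} to {actual!r}")
```

Only the fields listed in `fields` are compared, so columns that are expected to differ between systems can simply be left out.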

## Key Takeaways

**Data Diff is like having a careful proofreader** who checks that when you copy important information, nothing goes wrong.

**It works automatically** you set it up once, and it keeps watching your data 24/7.

**It catches problems early** before they affect your reports, decisions, or customers.

**It gives you peace of mind** you can trust that your backup, reports, and transferred data are accurate.

## When Should You Use Data Diff?

Use Data Diff whenever you:

- Copy data from one place to another
- Create backups of important information
- Generate reports from multiple sources
- Transfer data between different systems
- Move data to the cloud
- Export data to partners or vendors

Contributor

This example is also good, but we need to address the following:

  1. Show step by step how users can create the Data Diff within the Qualytics platform, with an explanation of each field to be filled in.
  2. The scheduling system vs. billing system comparison needs to be a table instead of line by line.
  3. The error should be shown in red, indicating what changed.
  4. Make sure to use !!! info or !!! warning admonitions.
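
Points 2 and 3 could be produced along these lines. This is only a sketch: the `render_comparison` helper is hypothetical, and it assumes the `text-negative` span class already used for the missing rows in the retail example is the right way to mark a value in red.

```python
def render_comparison(fields, source, target,
                      left="Scheduling System", right="Billing System"):
    """Build a Markdown table, wrapping changed target values in a red span."""
    lines = [f"| Field | {left} | {right} |", "| :--- | :--- | :--- |"]
    for field in fields:
        src, tgt = source[field], target[field]
        if src != tgt:
            tgt = f'<span class="text-negative">{tgt}</span>'
        lines.append(f"| {field} | {src} | {tgt} |")
    return "\n".join(lines)


fields = ["Patient", "Doctor", "Insurance"]
scheduling = {"Patient": "Robert Martinez", "Doctor": "Dr. Smith",
              "Insurance": "BlueCross Plan A"}
billing = {"Patient": "Robert Martinez", "Doctor": "Dr. Smith",
           "Insurance": "BlueCross Plan B"}

table = render_comparison(fields, scheduling, billing)
print(table)
```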

Collaborator Author

@shindiogawa Done: I created a table and marked the errors in red.

Comment on lines 148 to 153
!!! info
- Anomaly Type: Shape
- Source Records: 1,250
- Target Records: 1,247
- Missing Records: 3 (order_ids: 10248, 10249, 10250)

Contributor

Hey @iammuze, this shape anomaly output is not good. We need to present the shape anomaly as a table, the same way it is shown in Qualytics. As written, it is going to confuse users: we are explaining Data Diff, so we need to make sure the output matches what Qualytics Data Diff actually shows.

Collaborator Author

@shindiogawa I’ve added the Arcade section and explained the complete process of how users can perform data difference checks and detect anomalies using the Scan operation.

@iammuze iammuze marked this pull request as draft October 10, 2025 05:16
@iammuze iammuze marked this pull request as ready for review October 28, 2025 12:33
…the Entire Process of Adding and Detecting Anomalies.

Labels

documentation Improvements or additions to documentation

4 participants