In the luxury jewelry industry, pricing and stock management are high-stakes operations. This project applies Inferential Statistics to a dataset of 5,000 diamonds to provide a leading retailer with data-backed ranges for market pricing and expected inventory shipments. By moving beyond simple averages to Confidence Intervals, the business can quantify risk and optimize profitability.
📗 Google Sheet Link: Access the full interactive analysis here
Confidence Intervals for Means (Continuous Data)
The pricing team needs to set competitive yet profitable prices. Relying on a single "average" price is risky due to high variance in the market. We need to establish a 95% confidence range for the true mean price of diamonds overall, as well as for specific quality segments like "Premium" and "Fair" cuts.
- Point Estimation: Calculated the sample mean ($\bar{x}$) and sample standard deviation ($s$) for the full sample and for specific segments.
- Define Confidence: Set a 95% Confidence Level, utilizing a critical Z-score of 1.96.
- Quantify Error: Calculated the Margin of Error using the formula $z \times (s / \sqrt{n})$.
- Establish Bounds: Created the lower and upper limits ($\bar{x} \pm \text{Margin of Error}$) to define the "fair market value" range.
| Segment | x̄ | s | n | z-score | Margin of Error | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|
| All Diamonds | 3862.4 | 3977.6 | 5000 | 1.96 | 110.25 | 3752.2 | 3972.7 |
| Premium | 4524.1 | 4351.4 | 1305 | 1.96 | 236.09 | 4288.1 | 4760.2 |
| Fair | 4333.6 | 3277.9 | 147 | 1.96 | 529.91 | 3803.7 | 4863.5 |
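The intervals in the table above can be recomputed from the summary statistics alone. The following is a minimal Python sketch of that calculation, not the spreadsheet's actual implementation; the function name `mean_confidence_interval` and the `segments` dictionary are illustrative assumptions. Because the inputs are the rounded values from the table, results may differ from the sheet in the last decimal place.

```python
from math import sqrt


def mean_confidence_interval(x_bar, s, n, z=1.96):
    """Return (margin_of_error, lower, upper) for a z-based CI of the mean.

    Hypothetical helper: mirrors the formula z * (s / sqrt(n)) used above.
    """
    margin = z * (s / sqrt(n))
    return margin, x_bar - margin, x_bar + margin


# Summary statistics copied from the table: (sample mean, sample std dev, n)
segments = {
    "All Diamonds": (3862.4, 3977.6, 5000),
    "Premium": (4524.1, 4351.4, 1305),
    "Fair": (4333.6, 3277.9, 147),
}

for name, (x_bar, s, n) in segments.items():
    margin, lower, upper = mean_confidence_interval(x_bar, s, n)
    print(f"{name}: +/-{margin:.2f} -> [{lower:.1f}, {upper:.1f}]")
```

Passing a different `z` (for example 1.645) gives the interval at another confidence level without changing the rest of the calculation.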
Confidence Intervals for Proportions (Categorical Data)
The logistics team expects a new shipment and needs to predict the proportion of high-quality vs. low-quality diamonds. Since most sales come from "Premium" or "Ideal" cuts, we must estimate the percentage of these cuts in the population to optimize storage and marketing.
- Frequency Count: Aggregated the number of "Premium" and "Ideal" diamonds using combined `COUNTIF` logic.
- Sample Proportion ($\hat{p}$): Determined the ratio of specific cuts against the total sample size ($n = 5{,}000$).
- Define Confidence: Set a 90% Confidence Level, utilizing a critical Z-score of 1.645 for logistics planning.
- Proportional Error: Calculated the Margin of Error for proportions using $z \times \sqrt{\hat{p}(1-\hat{p}) / n}$.
- Predictive Bounds: Defined the percentage range the business can expect for incoming stock.
| Cut Category | Count | n | p̂ | z-score | Margin of Error | 90% CI Lower | 90% CI Upper |
|---|---|---|---|---|---|---|---|
| Premium or Ideal | 3316 | 5000 | 0.6632 | 1.645 | 0.0110 | 0.6522 | 0.6742 |
| Fair | 147 | 5000 | 0.0294 | 1.645 | 0.0039 | 0.0255 | 0.0333 |
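The proportion intervals follow the same pattern as the mean intervals, with the standard error replaced by $\sqrt{\hat{p}(1-\hat{p}) / n}$. Below is a small Python sketch under that assumption; the function name `proportion_confidence_interval` is hypothetical and the counts are taken from the table above.

```python
from math import sqrt


def proportion_confidence_interval(count, n, z=1.645):
    """Return (p_hat, margin_of_error, lower, upper) for a z-based CI of a proportion.

    Hypothetical helper: uses z * sqrt(p_hat * (1 - p_hat) / n), as in the steps above.
    """
    p_hat = count / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, p_hat - margin, p_hat + margin


# Counts copied from the table: (matching diamonds, total sample size)
categories = {
    "Premium or Ideal": (3316, 5000),
    "Fair": (147, 5000),
}

for name, (count, n) in categories.items():
    p_hat, margin, lower, upper = proportion_confidence_interval(count, n)
    print(f"{name}: p_hat={p_hat:.4f}, +/-{margin:.4f} -> [{lower:.4f}, {upper:.4f}]")
```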
| Objective | Mathematical Formula | Excel / Google Sheets Implementation |
|---|---|---|
| Margin of Error (Mean) | $z \times (s / \sqrt{n})$ | `=z_score * (stdev / SQRT(n))` |
| Margin of Error (Proportion) | $z \times \sqrt{\hat{p}(1-\hat{p}) / n}$ | `=z_score * SQRT(p_hat * (1 - p_hat) / n)` |
| Categorical Count | $\text{Count}_{\text{Premium}} + \text{Count}_{\text{Ideal}}$ | `=COUNTIF(Range,"Premium") + COUNTIF(Range,"Ideal")` |
| Sample Proportion ($\hat{p}$) | $\hat{p} = \text{Count} / n$ | `=Count_Cell / Sample_Size_Cell` |
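For readers working outside a spreadsheet, the combined `COUNTIF` step and the sample proportion can be approximated in Python. This is only an illustrative sketch: the `cuts` list is a made-up miniature column (the real sheet holds 5,000 rows), and the category labels other than "Premium", "Ideal", and "Fair" are assumptions.

```python
from collections import Counter

# Hypothetical miniature "cut" column; values are for illustration only.
cuts = ["Ideal", "Premium", "Fair", "Ideal", "Good", "Premium", "Very Good", "Ideal"]

# Equivalent of COUNTIF(Range,"Premium") + COUNTIF(Range,"Ideal")
cut_counts = Counter(cuts)
count = cut_counts["Premium"] + cut_counts["Ideal"]

# Equivalent of Count_Cell / Sample_Size_Cell
p_hat = count / len(cuts)

print(count, p_hat)
```

`Counter` returns 0 for labels that never appear, so a missing category does not raise an error, much as `COUNTIF` returns 0 for no matches.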
| Symbol | Meaning |
|---|---|
| z | Z-score corresponding to confidence level |
| s | Sample standard deviation |
| n | Sample size |
| p̂ | Sample proportion |
| Count | Number of observations matching criteria |
- Precision vs. Sample Size: The "Fair" cut price interval is more than twice as wide as the "Premium" interval (a margin of error of 529.91 versus 236.09), even though "Fair" prices have the smaller standard deviation. This is a direct result of the smaller sample size ($n=147$ versus $n=1305$), showing that our pricing strategy for rare or lower-tier diamonds carries higher statistical risk.
- Supply Chain Stability: We can be 90% confident that the true share of high-demand "Premium/Ideal" diamonds lies between 65.22% and 67.42%. Assuming new shipments resemble this sample, roughly two-thirds of incoming inventory should fall in this category, which suggests that our primary revenue engine is highly predictable.
- Pricing Recommendation: Use the upper bound of the "Premium" interval ($4,760) for diamonds with high clarity/color scores, while maintaining the lower bound ($4,288) as the baseline for competitive sales.
- Marketing Focus: Since "Fair" cuts likely represent no more than 3.33% of our stock (the upper bound of the 90% interval), marketing resources should be heavily allocated toward "Premium/Ideal" stories where our volume and pricing certainty are highest.
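The sample-size effect noted in the first insight can be made concrete by rearranging the margin-of-error formula to $n = (z \cdot s / E)^2$, where $E$ is a target margin of error. The sketch below is a rough planning aid, not part of the original analysis: the function name `required_sample_size` is hypothetical, and it assumes the "Fair" standard deviation would stay at about 3277.9 in a larger sample.

```python
from math import ceil


def required_sample_size(s, target_margin, z=1.96):
    """Smallest n for which z * s / sqrt(n) is at most target_margin.

    Hypothetical helper; assumes s is a reasonable estimate for the larger sample.
    """
    return ceil((z * s / target_margin) ** 2)


# How many "Fair" diamonds would be needed to match the "Premium" margin of 236.09?
n_needed = required_sample_size(s=3277.9, target_margin=236.09)
print(n_needed)
```

Under these assumptions the "Fair" segment would need roughly five times its current 147 observations to reach the same pricing precision as "Premium".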
Developed as part of the Applied Statistics for Data Analytics course by DeepLearning.AI.
Ayushi Gajendra
Data Analyst | Former EdTech Co-Founder
- 7+ years of experience in business operations, strategic growth, and entrepreneurial leadership.
- I specialize in bridging the gap between raw data and high-stakes business decisions.
- My goal is to help organizations move beyond "gut feeling" to drive growth through evidence-based strategy.