GitHub - JoyKyalogit/Sales-Performance-Decision-Making-Using-Statistics

SALES PERFORMANCE & DECISION MAKING USING STATISTICS

End-to-End Statistics Project Report

1. INTRODUCTION (Business Context)

The company operates both online and physical retail stores across multiple regions. Management seeks to understand overall sales performance, assess the reliability of insights derived from data, and determine whether a marketing campaign leads to higher revenue per transaction. This analysis uses pure statistical methods, including descriptive statistics, sampling theory, probability laws, and hypothesis testing, to support data-driven decision-making.

2. PART 1: DESCRIPTIVE STATISTICS

2.1 Measures of Central Tendency

Monthly revenue was calculated by aggregating daily transaction revenue by month. The following measures were computed:

Mean monthly revenue
Median monthly revenue
Mode monthly revenue
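A minimal sketch of the monthly aggregation and the three measures, using only Python's standard library. It assumes each transaction is a `(date, revenue)` pair. The helper names are hypothetical, and the five transactions are illustrative values, not rows from the project's dataset.

```python
from collections import defaultdict
from datetime import date
import statistics


def monthly_revenue(transactions):
    """Sum revenue per calendar month; each transaction is a (date, revenue) pair."""
    totals = defaultdict(float)
    for day, revenue in transactions:
        totals[(day.year, day.month)] += revenue
    # Return the monthly totals in chronological order
    return [totals[month] for month in sorted(totals)]


def central_tendency(values):
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode() lists every most-frequent value; for continuous revenue
        # each value usually occurs once, so no single meaningful mode emerges
        "mode": statistics.multimode(values),
    }


# Hypothetical transactions for illustration only
transactions = [
    (date(2023, 1, 5), 120.0), (date(2023, 1, 20), 80.0),
    (date(2023, 2, 3), 150.0),
    (date(2023, 3, 9), 90.0), (date(2023, 3, 28), 400.0),
]
monthly = monthly_revenue(transactions)   # [200.0, 150.0, 490.0]
summary = central_tendency(monthly)       # mean 280.0, median 200.0
```

The single high month (490) pulls the mean well above the median, which is the sensitivity to extreme values described below.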

Interpretation

The mean represents the overall average monthly revenue but is sensitive to extreme values such as promotional spikes or unusually large orders. The median represents the middle value of monthly revenue and is more robust to outliers. The mode is not meaningful in this case because revenue is a continuous variable, and exact repeated values are rare.

Statistical Decision

The median is the most appropriate measure to represent typical monthly revenue because it is less affected by extreme high-revenue months.

2.2 Measures of Dispersion

The following dispersion measures were calculated:

Range
Variance
Standard deviation

Interpretation

The standard deviation measures the typical distance between each month's revenue and the mean monthly revenue, expressed in the same units as revenue. A standard deviation that is high relative to the mean indicates that revenue varies substantially from month to month.
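The three dispersion measures can be sketched with the standard library as below. Whether the notebook uses the sample (n − 1) or population (n) denominator is not stated; this sketch assumes the sample version, and the monthly totals are hypothetical.

```python
import statistics


def dispersion(values):
    """Range, sample variance and sample standard deviation of monthly revenue."""
    return {
        "range": max(values) - min(values),
        # variance()/stdev() divide by n - 1 (sample estimates);
        # pvariance()/pstdev() would treat the months as the whole population
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


# Hypothetical monthly totals for illustration only
result = dispersion([200.0, 150.0, 490.0])
# range 340.0, variance 33700.0, std_dev ≈ 183.6
```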

Statistical Decision

High dispersion suggests sales instability, likely driven by seasonality, promotions, and marketing campaigns.

2.3 Shape of Distribution

A histogram of monthly revenue was plotted.

Findings

The distribution is positively skewed (right-skewed). Most months cluster around moderate revenue levels, while a few months have very high revenue values that form a long right tail. Skewness and kurtosis calculations confirm:

Positive skewness
High kurtosis, indicating heavy tails and the presence of extreme values
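A sketch of the moment-based skewness and excess kurtosis, assuming the biased (population-moment) estimators, which are also what `scipy.stats.skew` and `scipy.stats.kurtosis` return with their default arguments. The sign convention matters for interpretation: a long tail of unusually high months produces positive skewness. The input values are hypothetical.

```python
def moment_shape(values):
    """Moment-based skewness and excess kurtosis (biased estimators)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0   # 0 for a normal distribution
    return skewness, excess_kurtosis


# Hypothetical monthly totals with one unusually high month
skew, kurt = moment_shape([1.0, 2.0, 3.0, 4.0, 100.0])
# skew > 0: the long tail sits on the high-revenue (right) side
```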

Statistical Decision

Because revenue is not normally distributed, median-based summaries are preferred over mean-based summaries for reporting typical performance.

3. PART 2: DATA VISUALIZATION

3.1 Line Chart – Revenue Over Time

The line chart shows:

Overall revenue trends
Seasonal patterns
Revenue spikes during certain periods

Insight: Revenue is not constant over time and shows periods of rapid growth and decline.

3.2 Bar Chart – Revenue by Store Type

The bar chart compares total revenue from:

Online stores
Physical stores

Insight: Physical stores generate higher total revenue, suggesting stronger scalability or higher transaction volume.
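Because total revenue equals the number of transactions times the average revenue per transaction, reporting both alongside the total shows which of the two drives the gap between store types. A minimal sketch; the field names `store_type` and `revenue` and the records are assumptions, not the dataset's actual columns.

```python
from collections import defaultdict


def summarize_by_group(records, key):
    """Total, count and average revenue per category (e.g. store type or region)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        totals[record[key]] += record["revenue"]
        counts[record[key]] += 1
    # total = count x average, so the two factors separate volume from order value
    return {
        group: {"total": totals[group], "count": counts[group],
                "average": totals[group] / counts[group]}
        for group in totals
    }


# Hypothetical transactions for illustration only
sales = [
    {"store_type": "Physical", "revenue": 120.0},
    {"store_type": "Online", "revenue": 95.0},
    {"store_type": "Physical", "revenue": 60.0},
    {"store_type": "Online", "revenue": 40.0},
]
by_type = summarize_by_group(sales, "store_type")
```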

3.3 Box Plot – Revenue by Region

The box plot highlights:

Differences in revenue distribution across regions
Presence of outliers in certain regions

Insight: Some regions are more volatile, while others show stable revenue patterns.
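Box plots conventionally mark as outliers any point beyond 1.5 × IQR from the quartiles (Tukey's rule, also matplotlib's default whisker length). The sketch below applies that rule per region. Libraries use slightly different quartile conventions (Python's `statistics.quantiles` defaults to the "exclusive" method), so borderline points may be classified differently; the regional figures are hypothetical.

```python
import statistics


def tukey_outliers(values):
    """Return values outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical revenue observations per region
regions = {
    "North": [10, 11, 12, 12, 13, 14, 40],
    "South": [20, 21, 22, 22, 23, 24, 25],
}
flagged = {region: tukey_outliers(values) for region, values in regions.items()}
```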

3.4 Scatter Plot – Marketing vs Revenue

The scatter plot shows a positive association between marketing activity and revenue.

Insight: Transactions during marketing campaigns tend to generate higher revenue, motivating formal hypothesis testing.
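The strength of the association in the scatter plot can be summarized with a Pearson correlation. If marketing activity is recorded as a 0/1 campaign flag (an assumption here), the Pearson coefficient is the point-biserial correlation. A correlation only describes association; it does not by itself tell whether the difference could be due to chance, which is what the formal test addresses. The data below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 flag for x this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: 1 = transaction during a campaign, 0 = no campaign
campaign = [0, 0, 0, 1, 1, 1]
revenue = [50.0, 60.0, 55.0, 70.0, 80.0, 75.0]
r = pearson_r(campaign, revenue)   # about 0.93: strong positive association
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.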

4. PART 3: SAMPLING & BIAS

4.1 Population vs Sample

Population: All sales transactions made by the company across all stores and regions

Sample: The 3-year transaction dataset used in the analysis

4.2 Sampling Bias

If only urban stores were sampled, the analysis would suffer from selection bias.

Effect on Conclusions

Average revenue would likely be overestimated if urban stores sell more than rural ones, and rural consumer behavior would be ignored entirely.

Statistical Decision

A stratified random sampling method should be used, ensuring representation from both urban and rural regions.
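A minimal sketch of stratified random sampling with proportional allocation: records are grouped by stratum, and the same fraction is drawn at random from each group, so rural transactions appear in the sample in proportion to their share. The `area` field, the 80/20 split and the 10% fraction are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_key, fraction, seed=0):
    """Draw the same fraction at random from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[stratum_key]].append(record)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # keep at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical transactions: 80 urban and 20 rural
records = (
    [{"area": "urban", "id": i} for i in range(80)]
    + [{"area": "rural", "id": i} for i in range(80, 100)]
)
picked = stratified_sample(records, "area", fraction=0.1)   # 8 urban + 2 rural
```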

5. PART 4: LAW OF LARGE NUMBERS & CENTRAL LIMIT THEOREM

5.1 Law of Large Numbers

Sample means were calculated using increasing sample sizes (n = 10, 50, 100, 500).

Observation

As sample size increases, the sample mean converges toward the population mean.

Statistical Decision

Larger samples produce more reliable estimates of average revenue.
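The Law of Large Numbers experiment can be sketched on a simulated, right-skewed population (log-normal values standing in for revenue; these are not the project's data). Rather than a single sample per size, this version averages the absolute error of the sample mean over many repetitions, which makes the shrinking error easier to see.

```python
import random
import statistics


def lln_errors(population, sizes, repeats=200, seed=1):
    """Average |sample mean - population mean| for each sample size."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    errors = {}
    for n in sizes:
        diffs = [abs(statistics.fmean(rng.sample(population, n)) - mu)
                 for _ in range(repeats)]
        errors[n] = statistics.fmean(diffs)
    return errors


# Simulated right-skewed "revenue" population (hypothetical)
gen = random.Random(42)
population = [gen.lognormvariate(4.0, 0.8) for _ in range(10_000)]
errors = lln_errors(population, sizes=(10, 50, 100, 500))
for n, err in errors.items():
    print(f"n = {n:>3}: mean absolute error = {err:.2f}")
```

The error shrinks roughly in proportion to 1/√n, so quadrupling the sample size approximately halves it.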

5.2 Central Limit Theorem

Two hundred samples of size n = 30 were drawn, and their means were plotted.

Observation

The distribution of sample means approximates a normal distribution, despite the original revenue being skewed.
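A sketch of the same experiment (200 samples of size n = 30) on a simulated log-normal population, which is hypothetical and not the project's data. It checks two Central Limit Theorem predictions: the sample means are far less skewed than the raw values, and their spread is close to σ/√n.

```python
import random
import statistics


def sampling_distribution(population, n=30, draws=200, seed=7):
    """Means of `draws` random samples of size n (without replacement)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]


def skewness(values):
    """Moment-based skewness: positive means a long right tail."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(x - m) ** 2 for x in values])
    m3 = statistics.fmean([(x - m) ** 3 for x in values])
    return m3 / m2 ** 1.5


gen = random.Random(42)
population = [gen.lognormvariate(4.0, 0.8) for _ in range(10_000)]  # skewed, hypothetical
means = sampling_distribution(population)

predicted = statistics.pstdev(population) / 30 ** 0.5   # CLT: sigma / sqrt(n)
observed = statistics.stdev(means)
print(f"skewness: population {skewness(population):.2f}, sample means {skewness(means):.2f}")
print(f"spread of sample means: observed {observed:.2f}, predicted {predicted:.2f}")
```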

Statistical Decision

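This supports the use of t-tests for inference on mean revenue: the t-test relies on the approximate normality of sample means rather than of individual transactions, and with samples of this size the Central Limit Theorem makes that approximation reasonable.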

6. PART 5: HYPOTHESIS TESTING

Business Question

Does running a marketing campaign increase average revenue per transaction?

6.1 Hypotheses

Null Hypothesis (H₀): Mean revenue (campaign) = Mean revenue (no campaign)

Alternative Hypothesis (H₁): Mean revenue (campaign) > Mean revenue (no campaign)

This is a one-tailed test with:

Confidence level = 95%
Alpha = 0.05

6.2 Statistical Test

An independent samples t-test was conducted.
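The report does not say whether the pooled-variance (Student) or Welch version of the test was used. The sketch below assumes Welch's version, which does not require equal variances in the two groups. With only the standard library, the p-value here is a large-sample normal approximation; for the exact t-based p-value, `scipy.stats.ttest_ind(campaign, no_campaign, equal_var=False, alternative="greater")` performs the same one-tailed Welch test. The two tiny groups are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def welch_t_test_greater(campaign, no_campaign):
    """One-tailed Welch t-test of H1: mean(campaign) > mean(no_campaign).

    Returns (t, df, p_approx). p_approx uses the standard normal upper tail,
    which is close to the exact t-distribution p-value only when df is large.
    """
    m1, m2 = statistics.fmean(campaign), statistics.fmean(no_campaign)
    v1 = statistics.variance(campaign) / len(campaign)
    v2 = statistics.variance(no_campaign) / len(no_campaign)
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(campaign) - 1)
                           + v2 ** 2 / (len(no_campaign) - 1))
    p_approx = 1.0 - NormalDist().cdf(t)   # upper tail only: one-tailed test
    return t, df, p_approx


# Hypothetical revenue per transaction, with and without a campaign
t, df, p = welch_t_test_greater([10.0, 12.0, 14.0], [8.0, 9.0, 10.0])
# Reject H0 at alpha = 0.05 when p < 0.05 (with df this small, prefer the exact t p-value)
```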

Result

p-value < 0.05

Statistical Decision

Reject the null hypothesis.

Interpretation

There is statistically significant evidence that marketing campaigns increase average revenue per transaction.

7. PART 6: ERRORS & INTERPRETATION

Type I Error

Concluding that marketing campaigns increase revenue when they actually do not.

Business Impact: Unnecessary marketing expenditure.

Type II Error

Failing to detect a real increase in revenue due to marketing.

Business Impact: Missed growth opportunities.

8. PART 7: EFFECT SIZE & POWER

8.1 Effect Size

Cohen's d indicates a medium to large effect size.

Interpretation

The marketing campaign effect is not only statistically significant but also practically meaningful.
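Cohen's d expresses the difference in means in units of the pooled standard deviation, and Cohen's conventional benchmarks treat about 0.2 as small, 0.5 as medium and 0.8 as large. A minimal sketch, assuming the pooled-SD version of d; the groups are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation of the two groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / math.sqrt(pooled_var)


def effect_label(d):
    """Map |d| to Cohen's conventional benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical revenue per transaction, with and without a campaign
d = cohens_d([10.0, 12.0, 14.0], [8.0, 9.0, 10.0])
print(f"d = {d:.2f} ({effect_label(d)})")
```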

8.2 Power Discussion

A statistically non-significant result would not prove that a campaign has no effect. A real effect can go undetected (a Type II error) when:

Sample size is small
Revenue variability is high

Statistical Decision

Collecting more data would increase statistical power and confidence in decisions.
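Both conditions can be made concrete with the standard normal approximation for the power of a one-tailed, two-sample comparison with n transactions per group: power ≈ 1 − Φ(z₁₋α − d·√(n/2)). Higher revenue variability lowers d (the mean difference divided by the standard deviation), and a smaller n lowers √(n/2); either one reduces power. This is an approximation that slightly overstates power for small samples; exact t-based calculations are available in tools such as statsmodels' `TTestIndPower`.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a one-tailed two-sample test for effect size d."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    shift = d * (n_per_group / 2) ** 0.5
    return 1 - NormalDist().cdf(z_crit - shift)


for n in (10, 25, 50, 100):
    print(f"d = 0.5, n = {n:>3} per group: power = {approx_power(0.5, n):.2f}")
```

Under this approximation, a medium effect (d = 0.5) needs roughly 50 transactions per group to reach the customary 80% power.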

9. BUSINESS RECOMMENDATIONS

Use median revenue for performance reporting
Plan for revenue volatility caused by promotions and seasonality
Continue and scale marketing campaigns
Monitor ROI to control marketing costs
Improve sampling methods to avoid bias
Collect more data for long-term strategic planning
