In this article, we'll walk through statistical significance, what it means for your direct mail campaigns, and how we calculate it.
Statistical Significance 101
If data is statistically significant, the result likely reflects a real effect rather than chance.
In ecommerce/retail, statistical significance helps marketers determine whether results from an initiative (such as a direct mail campaign) are attributable to the initiative itself (e.g., the creative, copy, or assets) or merely due to chance.
What Statistical Significance Tells You
Statistical significance is purely about reliability and confidence in the results, not the quality of the results from a business perspective. It tells you:
- How confident you can be that the underlying pattern would show up consistently in similar campaigns
- Whether the campaign results reflect a real pattern or could be due to chance
- The mathematical reliability of your campaign results
What Statistical Significance Does NOT Tell You
Statistical significance says nothing about:
- Whether your campaign was successful or profitable
- Whether the performance difference is large or small
- Whether you should scale or stop the campaign
- Your return on ad spend (ROAS) or incremental ROAS (iROAS)
For a marketing initiative to be considered statistically significant, the number of recipients must be large enough to rule out the possibility that positive results are merely due to chance (random variation).
- We use a common and robust statistical analysis called a two-proportion Z-test to evaluate this. More on this below.
Requirements for a Campaign to be Considered “Statistically Significant”
Only campaigns that have holdout groups will be evaluated for statistical significance.
Campaigns must meet the following requirements:
- At least 1,000 recipients in the treatment group
- At least 1,000 recipients in the holdout group
- Data that is at least 50% mature (at least halfway through the attribution window)
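The eligibility rules above can be sketched as a simple check. This is an illustrative sketch only; the function and parameter names are not the product's actual API.

```python
def eligible_for_significance(treatment_size, holdout_size,
                              days_elapsed, attribution_window_days):
    """Return True if a campaign qualifies for significance analysis.

    Illustrative sketch of the three eligibility rules; names are assumptions.
    """
    return (
        treatment_size >= 1000                            # at least 1,000 in treatment
        and holdout_size >= 1000                          # at least 1,000 in holdout
        and days_elapsed >= attribution_window_days / 2   # at least 50% mature
    )
```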
Why Do We Determine Statistical Significance Based on Conversion Rate?
Conversion rate provides the most reliable foundation for statistical significance testing as it directly measures campaign effectiveness as a percentage of recipients who took action.
Using this standardized metric eliminates variability caused by different campaign sizes and order values, creating a consistent baseline for comparison between treatment and holdout groups.
This approach enables marketers to confidently determine whether performance differences represent genuine marketing effects rather than random variation.
How We Display Statistical Significance In-App
If campaign results are statistically significant, blue highlighted text will appear with a tooltip. Notes:
- The Statistically significant indicator appears whether your campaign outperformed OR underperformed the holdout. Statistical significance is about confidence in the results, not whether those results are good or bad for your business.
- If you do not see the Statistically significant indicator, it could be because:
- The campaign didn't meet the requirements for analysis.
- The results are not statistically significant.
- For automations: the selected date range does not show statistical significance (other date ranges may).
By hovering over the tooltip, the following text will appear.
- Note: The percentage will vary based on the campaign.
If results are not statistically significant, it means:
- Confidence is less than 90%.
- The results could be due to random chance; the pattern may not show up consistently in similar campaigns.
- The pattern isn't reliable enough to make definitive conclusions
This doesn't mean your campaign failed or succeeded. It means the mathematical confidence isn't high enough to be certain about the pattern.
Calculating Statistical Significance
When a campaign has at least 1,000 recipients in the treatment group, at least 1,000 in the holdout group, and data that is at least 50% mature, we automatically perform a statistical analysis called a two-proportion Z-test on the results.
Why this analysis?
The two-proportion Z-test assesses the difference between two population proportions. In this case, the proportions are conversion rates, and we are comparing conversion rates between two independent groups (treatment vs. holdout).
Formula:

Z = (p₁ - p₂) / √[p̂(1 - p̂)(1/n₁ + 1/n₂)]
Where:
- p₁ = conversion rate of test group
- p₂ = conversion rate of holdout group
- n₁ = sample size of test group
- n₂ = sample size of holdout group
- p̂ = pooled proportion = (conversions₁ + conversions₂) / (n₁ + n₂)
Formula Breakdown:
1. Numerator (p₁ - p₂)
This measures the difference between the two observed sample proportions.
2. Denominator: √[p̂(1-p̂) (1/n₁ + 1/n₂)]
The denominator estimates the standard error (the standard deviation of the sampling distribution) of the difference between proportions.
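Putting the numerator and denominator together, the Z-score can be computed in a few lines of Python. This is a minimal sketch; the function name is illustrative, not part of the product.

```python
import math

def two_proportion_z(conv1, n1, conv2, n2):
    """Two-proportion Z-test statistic for conversion rates of two independent groups."""
    p1 = conv1 / n1                       # conversion rate of test group
    p2 = conv2 / n2                       # conversion rate of holdout group
    p_pool = (conv1 + conv2) / (n1 + n2)  # pooled proportion p̂
    # Standard error of the difference between proportions
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se
```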
Steps for Calculating Statistical Significance with an Example Campaign
Example Campaign Variables:
- Test Group Size: 2064
- Test Conversions: 75
- Test CVR: 3.63%
- Holdout Group Size: 900
- Holdout Conversions: 40
- Holdout CVR: 4.44%
Step 1: Calculate the pooled proportion
- Total conversions = 75 + 40 = 115
- Total sample size = 2,064 + 900 = 2,964
- Pooled proportion (p̂) = 115 / 2,964 = 0.0388
Step 2: Calculate the standard error
- SE = √[0.0388 × (1 - 0.0388) × (1/2,064 + 1/900)]
- SE = √[0.0373 × (0.00048 + 0.00111)]
- SE = √[0.0373 × 0.00159]
- SE = √0.0000593
- SE = 0.0077
Step 3: Calculate the Z-score
- Z = (0.0363 - 0.0444) / 0.0077
- Z = -0.0081 / 0.0077
- Z = -1.05
Step 4: Find the p-value
- For a two-tailed test with Z = -1.05
- p-value = 0.2938 (approximately 0.29)
Step 5: Determine the confidence level
- The p-value is 0.29 (or 29%)
- The confidence level is calculated as (1 - p-value)
- So the confidence level is (1 - 0.29) = 0.71 or 71%
Step 6: Determine Outcome
- Since the confidence level is 71%, which is below the 90% threshold, the results are not statistically significant.
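The six steps above can be reproduced end to end in Python; `statistics.NormalDist` from the standard library supplies the normal CDF needed for the two-tailed p-value.

```python
from math import sqrt
from statistics import NormalDist

# Example campaign from the steps above
n1, conv1 = 2064, 75   # treatment group size and conversions
n2, conv2 = 900, 40    # holdout group size and conversions

p1, p2 = conv1 / n1, conv2 / n2                    # conversion rates
p_pool = (conv1 + conv2) / (n1 + n2)               # Step 1: pooled proportion
se = sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))   # Step 2: standard error
z = (p1 - p2) / se                                 # Step 3: Z-score
p_value = 2 * NormalDist().cdf(-abs(z))            # Step 4: two-tailed p-value
confidence = 1 - p_value                           # Step 5: confidence level
significant = confidence >= 0.90                   # Step 6: compare to 90% threshold

print(round(z, 2), round(confidence, 2), significant)  # -1.05 0.71 False
```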