This article covers the following:
- Overview
- The Five Levels of Statistical Inference
- How the Statistical Model Impacts Your Report
- FAQs
Overview
Wingify campaign reports surface statistical evidence to support decisions about which variation to roll out or disable. The statistical outputs your report displays depend on the statistical model configured for the campaign: Bayesian or Frequentist. Both models work through the same logical hierarchy from raw data to a final decision, but they express uncertainty and significance differently.
The Five Levels of Statistical Inference
The statistical engine underlying Wingify reports breaks the path from raw data to a decision into five levels. Understanding these levels helps you read any report, regardless of which statistical model is active.
Level 0: Empirical Data
The base level is the raw data from your campaign: how many visitors were assigned to each variation, how many conversions occurred, and the total metric value for continuous metrics like revenue. This data contains no uncertainty; it is a direct count of observed events.
For binary metrics (for example, conversion rate or add-to-cart rate), the data is labeled Unique Conversions/Visitors. For continuous metrics (for example, revenue per visitor), it is labeled Total value (Unique Conversions/Visitors).
Level 1: Expected Average
Wingify uses the empirical data to project the likely range of true averages for each variation. These projections are expressed as distributions, not fixed numbers, because any finite sample carries measurement uncertainty. Wingify models these as normal distributions.
In Bayesian reports, this column is labeled Expected Conversion Rate (binary metrics) or Expected value per visitor (continuous metrics). In Frequentist reports, the equivalent is Conversion Rate (v) with a confidence interval shown below the point estimate.
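As a rough illustration of Level 1 for a binary metric, the sketch below turns Level 0 counts into a normal distribution over the true conversion rate. It assumes a simple normal approximation with a binomial standard error and no prior; the article does not describe Wingify's actual computation, and the function name and sample counts are hypothetical.

```python
import math
from statistics import NormalDist


def expected_rate_distribution(conversions: int, visitors: int) -> NormalDist:
    """Normal approximation to the true conversion rate of one variation.

    Assumption: standard error = sqrt(p * (1 - p) / n). This is a textbook
    approximation, not Wingify's documented method.
    """
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    rate = conversions / visitors
    std_error = math.sqrt(rate * (1 - rate) / visitors)
    return NormalDist(mu=rate, sigma=std_error)


# Hypothetical Level 0 data: 120 unique conversions from 2000 visitors.
baseline = expected_rate_distribution(conversions=120, visitors=2000)

# Central 95% range of the projected true average.
low, high = baseline.inv_cdf(0.025), baseline.inv_cdf(0.975)
print(f"expected rate {baseline.mean:.4f}, 95% range [{low:.4f}, {high:.4f}]")
```

The range narrows as the visitor count grows, which is why early results carry wide intervals.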
Level 2: Expected Improvement
This level calculates the difference between the variation's and the baseline's projected average distributions. The result is an Improvement Distribution that captures both the direction and magnitude of the likely effect.
In Bayesian campaigns, the improvement is shown as a box plot of the improvement posterior in the detail view of the report table. In Frequentist campaigns, it appears as Improvement % (v) with confidence intervals.
Note: Box plot graphs are available for Bayesian campaigns using the Classic Statistical Engine only. They are not displayed for campaigns using the Enhanced Statistical Engine. For Frequentist campaigns, improvement is shown as a percentage with a range.
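A minimal sketch of Level 2, assuming the two Level 1 distributions are independent normals: subtracting them gives another normal whose mean is the difference of the means and whose variance is the sum of the variances. The numbers are hypothetical, and dividing by the baseline mean to get a relative improvement is a simplification that ignores uncertainty in the denominator.

```python
from statistics import NormalDist

# Hypothetical Level 1 distributions for the baseline and one variation.
baseline = NormalDist(mu=0.060, sigma=0.0053)
variation = NormalDist(mu=0.072, sigma=0.0058)

# Difference of two independent normals: means subtract, variances add.
improvement = variation - baseline

# Approximate relative improvement, scaled by the baseline's point estimate.
relative_improvement = improvement / baseline.mean

print(f"absolute: mean={improvement.mean:.4f}, sd={improvement.stdev:.4f}")
print(f"relative: mean={relative_improvement.mean:.1%}, "
      f"sd={relative_improvement.stdev:.1%}")
```

The spread of the improvement distribution is larger than the spread of either input, so an improvement estimate is always less certain than the individual averages it is built from.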
Level 3: Probability of Improvement
At this level, Wingify uses the Improvement Distribution to infer the probability that the true improvement lies beyond the Region of Practical Equivalence (ROPE). ROPE is the range of improvement values too small to be practically meaningful, so its boundary marks the minimum improvement worth acting on.
In Bayesian reports, this column is labeled Decision Probabilities or Probability to be Better. In Frequentist reports, the equivalent is Significance Level, which is the complement of the p-value (1 minus the p-value) and is read as evidence against the null hypothesis.
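To make the Bayesian side of Level 3 concrete, the sketch below reads the probability of a meaningful improvement off a normal improvement distribution as the area above the ROPE boundary. The distribution parameters, the ROPE value, and the function name are hypothetical; this illustrates the idea rather than Wingify's implementation.

```python
from statistics import NormalDist


def probability_beyond_rope(improvement: NormalDist, rope_upper: float) -> float:
    """Area of the improvement distribution above the ROPE upper boundary."""
    return 1.0 - improvement.cdf(rope_upper)


# Hypothetical relative improvement distribution: mean 20%, sd 13%.
improvement = NormalDist(mu=0.20, sigma=0.13)

# With no ROPE, any improvement above zero counts.
p_any = probability_beyond_rope(improvement, rope_upper=0.0)

# With a 1% ROPE, only improvement larger than 1% counts.
p_meaningful = probability_beyond_rope(improvement, rope_upper=0.01)

print(f"P(improvement > 0)  = {p_any:.3f}")
print(f"P(improvement > 1%) = {p_meaningful:.3f}")
```

Widening the ROPE can only lower the reported probability, since less of the distribution lies beyond the boundary.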
Level 4: Decisions
Wingify applies a threshold to the Level 3 metric to determine a winner. The threshold is controlled by the False Positive Rate (FPR) in both Bayesian and Frequentist campaigns. The difference lies only in what the metric is called:
- Bayesian: the metric is called Probability to be Better
- Frequentist: the metric is called Significance Level
When a variation crosses the winner threshold (typically 95%), Wingify declares it better than the baseline and displays a recommendation banner. When a variation falls below the lower threshold (typically 5%), Wingify recommends disabling it.
The winner threshold is shown as a dotted line across the probability bars in the Bayesian Statistics view. In Frequentist campaigns, the same threshold appears as the configured FPR, typically set at 10%.
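The decision rule described above can be sketched as a simple threshold check. The defaults use the typical values quoted in this section (95% and 5%); the function name, the returned labels, and the handling of values exactly on a threshold are assumptions made for illustration.

```python
def decide(metric: float,
           winner_threshold: float = 0.95,
           disable_threshold: float = 0.05) -> str:
    """Map a Level 3 metric (Probability to be Better or Significance Level,
    expressed as a fraction between 0 and 1) to a recommendation.

    Assumption: a value exactly on the winner threshold counts as a winner,
    and a value exactly on the lower threshold is not yet disabled.
    """
    if not 0.0 <= metric <= 1.0:
        raise ValueError("metric must be between 0 and 1")
    if metric >= winner_threshold:
        return "winner"
    if metric < disable_threshold:
        return "disable"
    return "keep collecting"


for value in (0.97, 0.60, 0.03):
    print(f"{value:.2f} -> {decide(value)}")
```

Anything between the two thresholds stays undecided, which is the state most variations are in early in a campaign.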
How the Statistical Model Impacts Your Report
The report interface adapts to the configured statistical model. The table below summarizes the key differences in what you see.
| Report Element | Bayesian Model | Frequentist Model |
|---|---|---|
| Significance column | Decision Probabilities (probability bar showing % chance variation is better) | Significance Level (% significance, compared against the FPR threshold) |
| Improvement column | Expected Improvement (distribution with ROPE reference) | Improvement % (v) with confidence interval range |
| Statistics view key metric | Probability to be Better | Significance Level |
| Winner declaration | Probability exceeds Winner Threshold (for example, 95%) | Significance Level exceeds 1 - FPR threshold |
| Recommendation banner language | "[Variation] is better than baseline ... can be expected with 95% probability of being better" | "Stick to [Variation] Baseline as no variation shows the potential to outperform the baseline" |
Note: To customize which columns appear in each view, or to create your own custom views, click the pencil icon ✏ to the right of the view tabs. For more information and detailed steps, see Navigate and Customize Your Campaign Report.
The five-level statistical hierarchy, from raw data through expected averages, improvement distributions, probability estimates, and final decisions, applies to both statistical models. What changes between Bayesian and Frequentist campaigns is the language and visual form of the outputs at Levels 3 and 4. The Bayesian model gives you a continuously updated probability estimate; the Frequentist model gives you a significance level relative to a null hypothesis. Both are designed to help you make reliable, data-backed decisions while minimizing the risk of acting on random fluctuations.
To ensure your report reflects reliable data, see Achieve Accurate Campaign Results with SmartStats Configuration.
FAQs
What is the difference between "Probability to be Better" and "Significance Level"?
Probability to be Better is a Bayesian metric expressing the direct probability that the variation outperforms the baseline. Significance Level is a Frequentist metric expressing the statistical evidence against the null hypothesis that there is no difference. Both serve as the primary decision indicator in their respective frameworks.
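For the Frequentist side, one common way to obtain a significance level is a two-sided, pooled two-proportion z-test, reporting 1 minus the p-value. The sketch below uses that textbook test as an assumption; this article does not state which test Wingify runs, and the function name and counts are hypothetical.

```python
import math
from statistics import NormalDist


def significance_level(base_conv: int, base_vis: int,
                       var_conv: int, var_vis: int) -> float:
    """Return 1 - p-value from a two-sided pooled two-proportion z-test.

    The null hypothesis is that both variations share one conversion rate.
    """
    pooled = (base_conv + var_conv) / (base_vis + var_vis)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / base_vis + 1 / var_vis))
    if std_error == 0:
        return 0.0  # no variation in the data, so no evidence of a difference
    z = (var_conv / var_vis - base_conv / base_vis) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value


# Hypothetical data: baseline 120/2000, variation 160/2000.
sig = significance_level(120, 2000, 160, 2000)
print(f"significance level = {sig:.3f}")
```

Unlike Probability to be Better, this number is not the probability that the variation is better; it measures how surprising the observed difference would be if there were no true difference.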
Why do some variations show "Collecting Data" in the probability column?
Wingify requires at least 500 visitors and at least one conversion on the baseline before it can compute reliable statistics. Until that threshold is met, the probability columns display Collecting Data to indicate the result is not yet available. You can configure this limit using Observatory Mode. For more information, see Configure Observatory mode in Your Testing Campaigns.
What does "No data yet" mean in a variation row?
No data yet means the variation has not received any visitors or conversions during the selected date range. This typically affects variations added late to a campaign or those that received zero traffic allocation.
Can I switch between Bayesian and Frequentist on a running campaign?
No. You cannot switch the statistical model while a campaign is running.
What is ROPE and how does it affect the probability of improvement?
ROPE (Region of Practical Equivalence) defines the range of improvement values considered too small to be practically meaningful. Only improvement that exceeds the ROPE boundary counts toward the probability of improvement calculation. This prevents declaring winners based on trivially small uplifts.
Need more help?
For more information or further assistance, contact Wingify Support.