Retail data often comes in weekly panel format with uneven performance across locations. This project mirrors real-world business experimentation where you need to compare promotions while accounting for store-level differences and repeated measures. It gave me the opportunity to demonstrate proper handling of clustered data, time structure, and model robustness.
Before estimating the effect of each promotion on weekly sales, I conducted a series of data checks to ensure the validity of the analysis.
The dataset included four weeks of sales data per store, with each store randomly assigned to one of three promotional conditions. The unit of analysis was the store-week, so each store contributed repeated observations that cannot be treated as independent. To account for this within-store correlation, I used standard errors clustered at the store level in all regression models.
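A minimal sketch of what clustering at the store level looks like in practice, assuming statsmodels and a synthetic panel. Only SalesInThousands is a column name taken from this write-up; LocationID, Promotion, and week are assumed names, and the data is simulated purely for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic store-week panel (illustrative only; column names other than
# SalesInThousands are assumptions, not the project's actual schema).
rng = np.random.default_rng(0)
n_stores, n_weeks = 60, 4
stores = pd.DataFrame({
    "LocationID": np.arange(n_stores),
    "Promotion": rng.integers(1, 4, size=n_stores),   # assigned once per store
    "store_effect": rng.normal(0, 5, size=n_stores),  # persistent store-level difference
})

# Repeat each store row once per week to build the panel.
panel = stores.loc[stores.index.repeat(n_weeks)].reset_index(drop=True)
panel["week"] = np.tile(np.arange(1, n_weeks + 1), n_stores)
panel["SalesInThousands"] = (
    50
    + 3 * (panel["Promotion"] == 1)
    + panel["store_effect"]
    + rng.normal(0, 2, size=len(panel))
)

formula = "SalesInThousands ~ C(Promotion)"

# Same model, two covariance estimators: the default (assumes independent rows)
# and cluster-robust (allows correlation among weeks of the same store).
naive = smf.ols(formula, data=panel).fit()
clustered = smf.ols(formula, data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["LocationID"]}
)

print(pd.DataFrame({"naive_se": naive.bse, "clustered_se": clustered.bse}))
```

The coefficient estimates are identical in both fits; only the standard errors change, because clustering alters the covariance estimator and not the point estimates.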
To confirm that the promotion assignment was successfully randomized, I checked:
I explored the relationship between potential covariates and weekly sales:
Using the interquartile range (IQR) method, I identified and removed extreme sales values within each promotion group. This reduced the influence of outliers and improved the symmetry of the outcome distribution.
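The group-wise IQR filter can be sketched as follows. This is an illustration under assumptions: the fence multiplier of 1.5 is the conventional default and is not stated in this write-up, `remove_iqr_outliers` is a hypothetical helper name, and Promotion is an assumed column name.

```python
import pandas as pd

def remove_iqr_outliers(df, value_col="SalesInThousands", group_col="Promotion", k=1.5):
    """Drop rows whose value falls outside [Q1 - k*IQR, Q3 + k*IQR] of their own group."""
    grouped = df.groupby(group_col)[value_col]
    # transform() broadcasts each group's quartile back onto that group's rows.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    keep = df[value_col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]

# Toy data: one extreme value (100) in promotion group 1.
toy = pd.DataFrame({
    "Promotion": [1] * 6 + [2] * 5,
    "SalesInThousands": [10, 11, 12, 13, 14, 100, 50, 52, 54, 56, 58],
})
cleaned = remove_iqr_outliers(toy)
print(len(toy), "->", len(cleaned))
```

Computing the fences within each promotion group, instead of on the pooled data, keeps a group with genuinely higher sales from being trimmed just because it differs from the others.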
To account for right-skew in sales, I applied a log transformation using log1p(SalesInThousands), which computes log(1 + x) and so remains defined at zero. This stabilized variance and reduced the impact of high-end values on model results.
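As a quick illustration of the transformation's effect, the sketch below applies `numpy.log1p` to simulated right-skewed values and compares skewness before and after. The lognormal sample is an assumption standing in for the real sales column.

```python
import numpy as np
import pandas as pd

# Simulated right-skewed sales (illustrative stand-in for the real data).
rng = np.random.default_rng(42)
sales = pd.Series(rng.lognormal(mean=3.8, sigma=0.6, size=2000), name="SalesInThousands")

log_sales = np.log1p(sales)  # log(1 + x)

print("skew raw:", sales.skew())
print("skew log1p:", log_sales.skew())

# expm1 is the exact inverse, useful for reporting results on the original scale.
recovered = np.expm1(log_sales)
```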
I ran a series of OLS regression models, progressively adjusting for:
I tested the impact of three promotional strategies on weekly sales using a series of OLS regression models. Models included controls for market size and store age, and used clustered standard errors to account for repeated measures within each store. Results were evaluated with and without outliers, and using both raw and log-transformed versions of the outcome variable.
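One way to organize this kind of robustness comparison is a small grid over outcome versions and control sets, sketched below with statsmodels. The data is simulated with no true promotion effect, MarketSize, AgeOfStore, LocationID, and Promotion are assumed column names, and the with/without-outliers dimension is left out for brevity.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated panel (illustrative; not the project's data or schema).
rng = np.random.default_rng(1)
n_stores, n_weeks = 90, 4
stores = pd.DataFrame({
    "LocationID": np.arange(n_stores),
    "Promotion": rng.integers(1, 4, size=n_stores),
    "MarketSize": rng.choice(["Small", "Medium", "Large"], size=n_stores),
    "AgeOfStore": rng.integers(1, 25, size=n_stores),
})
df = stores.loc[stores.index.repeat(n_weeks)].reset_index(drop=True)
df["SalesInThousands"] = np.exp(rng.normal(3.8, 0.3, size=len(df)))
df["log_sales"] = np.log1p(df["SalesInThousands"])

# Progressively richer right-hand sides.
specs = {
    "promotion only": "C(Promotion)",
    "+ market size": "C(Promotion) + C(MarketSize)",
    "+ store age": "C(Promotion) + C(MarketSize) + AgeOfStore",
}

rows = []
for outcome in ["SalesInThousands", "log_sales"]:
    for label, rhs in specs.items():
        fit = smf.ols(f"{outcome} ~ {rhs}", data=df).fit(
            cov_type="cluster", cov_kwds={"groups": df["LocationID"]}
        )
        term = "C(Promotion)[T.2]"  # promotion 2 relative to promotion 1
        rows.append({
            "outcome": outcome,
            "spec": label,
            "coef": fit.params[term],
            "clustered_se": fit.bse[term],
        })

results = pd.DataFrame(rows)
print(results)
```

Collecting the same coefficient from every specification in one table makes it easy to see whether the estimated promotion contrast moves when controls or the outcome scale change.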
In the final model using log-transformed sales without outliers:
Results were consistent across:
Plots of weekly sales over time confirmed that promotion effects were stable across the 4-week period. Histograms and boxplots showed a right-skewed distribution of sales, justifying the use of a log transformation. Outliers identified via the IQR method were primarily concentrated in smaller markets and higher sales values.