Airbnb Booking Analysis

SQL Python Tableau
SQL Code | Python Code | Dashboard

Why This Project?

One of the more challenging aspects of data analysis is answering questions when the available data doesn't perfectly align with what you'd ideally want. I chose to focus on Airbnb bookings because the data presented an interesting analytical challenge: distinguishing between listings that are actively booked and those that are simply unavailable due to owner preferences or part-time rental status. This required thoughtful assumptions, proxy measures, and a careful methodological approach.

Background

Many Airbnb listings rely on recent reviews to attract bookings, but it’s unclear whether newer reviews actually drive more reservations. This project estimates the effect of recent reviews on booking outcomes using observational data from Airbnb.

Data

I used a public Airbnb dataset that includes listing-level variables such as:

I also used calendar-level variables such as:

  • date: Date of potential booking
  • available: True/False indicator for availability

Note that the data is a snapshot taken in early March.

    Methodology

    Data decisions:

    6 variables were calculated:

    To control for confounding, I used propensity score matching with the following covariates:

    I estimated the average treatment effect on bookings using matched pairs and checked covariate balance using standardized mean differences (SMDs).
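The pairing step can be illustrated with a minimal sketch. It assumes propensity scores have already been estimated and uses greedy 1:1 nearest-neighbor matching without replacement. The project does not state which matching algorithm or caliper it used, so `match_nearest`, the `caliper` value, and the listing IDs and scores below are all hypothetical.

```python
from typing import Dict, List, Tuple


def match_nearest(
    treated: Dict[str, float],
    control: Dict[str, float],
    caliper: float = 0.05,  # assumed maximum allowed score gap, not from the project
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching on propensity score, without replacement."""
    available = dict(control)  # controls that have not been matched yet
    pairs: List[Tuple[str, str]] = []
    # Visit treated listings from highest to lowest score so the result is reproducible.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        # Closest remaining control by absolute score distance.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Hypothetical propensity scores keyed by listing ID.
treated_scores = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
control_scores = {"c1": 0.78, "c2": 0.52, "c3": 0.10, "c4": 0.57}

pairs = match_nearest(treated_scores, control_scores)
print(pairs)
```

In this toy input, `t3` stays unmatched because no remaining control lies within the caliper. Dropping such listings is the usual trade-off in matching: better balance in exchange for a smaller sample.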

    Results

    Covariate Balance

    Before matching, listings that had recently received a review differed substantially from those that had not. On average, listings with a recent review had nearly twice as many prior booked days (8.73 vs. 4.66), about three times as many reviews (85.51 vs. 28.79), and a much lower price ($266.97 vs. $475.71):

    Covariate Means Before Matching
    Variable                           No Recent Review   Recent Review
    Prior Booked Days                  4.66               8.73
    Total Availability (April 2025)    27.10              27.99
    Price                              $475.71            $266.97
    Review Score (Rating)              4.81               4.84
    Number of Reviews                  28.79              85.51

    After propensity score matching, these covariates were much better balanced across groups. The absolute standardized mean difference (SMD) for every covariate fell below the common threshold of 0.1, which suggests a good match.

    Covariate Means After Matching
    Variable                           No Recent Review   Recent Review
    Prior Booked Days                  9.02               8.73
    Total Availability (April 2025)    27.65              27.99
    Price                              $267.90            $266.97
    Review Score (Rating)              4.85               4.84
    Number of Reviews                  78.59              85.51

    Treatment Effect Estimation

    A logistic regression was used to estimate the effect of recent reviews on the probability of being booked in April 2025 (booked_yes).

    Model Results (Logistic Regression)

    Estimated Risk Difference

    Listings with a recent review had a 15.8 percentage point higher probability of being booked in April 2025 compared to similar listings without a recent review.
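One common way to turn a logistic regression into a risk difference is marginal standardization: predict each listing's booking probability with the treatment switched on and off, then average the difference. The sketch below illustrates that idea only. The project's fitted coefficients are not shown here, so the intercept, the coefficients, and the names `recent_review` and `prior_booked_days` are hypothetical placeholders, and this may not be exactly how the 15.8 point figure was computed.

```python
import math
from typing import Dict, List


def predict_prob(intercept: float, coefs: Dict[str, float], row: Dict[str, float]) -> float:
    """Logistic model: P(booked) = 1 / (1 + exp(-(intercept + sum(coef * x))))."""
    z = intercept + sum(coefs[name] * row[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def risk_difference(
    intercept: float,
    coefs: Dict[str, float],
    rows: List[Dict[str, float]],
    treatment: str = "recent_review",  # hypothetical column name
) -> float:
    """Average over listings of P(booked | treated) - P(booked | untreated)."""
    diffs = []
    for row in rows:
        p_treated = predict_prob(intercept, coefs, {**row, treatment: 1})
        p_control = predict_prob(intercept, coefs, {**row, treatment: 0})
        diffs.append(p_treated - p_control)
    return sum(diffs) / len(diffs)


# Hypothetical coefficients and matched listings, for illustration only.
intercept = -0.5
coefs = {"recent_review": 0.7, "prior_booked_days": 0.05}
listings = [
    {"prior_booked_days": 2},
    {"prior_booked_days": 9},
    {"prior_booked_days": 15},
]

rd = risk_difference(intercept, coefs, listings)
print(f"Estimated risk difference: {rd * 100:.1f} percentage points")
```

Reporting the effect on the probability scale, rather than as an odds ratio, is what makes a statement like "percentage points higher" possible.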

    Covariate Balance (Standardized Mean Differences)

    Covariate                SMD      Treated Mean   Control Mean   Pooled Std Dev
    Prior Booked Days        -0.050   8.73           9.02           5.78
    Total Availability       0.055    27.99          27.65          6.28
    Price                    -0.003   266.97         267.90         292.92
    Review Score (Rating)    -0.030   4.84           4.85           0.27
    Number of Reviews        0.060    85.51          78.59          115.18
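As a quick sanity check, each SMD in the table above can be recomputed as (treated mean - control mean) / pooled standard deviation. The sketch below does this from the rounded values reported in the table, so small differences in the third decimal place are expected. The helper name `smd` is illustrative, and exactly how the pooled standard deviation was computed is not stated in this write-up.

```python
def smd(treated_mean: float, control_mean: float, pooled_sd: float) -> float:
    """Standardized mean difference from summary statistics."""
    return (treated_mean - control_mean) / pooled_sd


# (treated mean, control mean, pooled SD), copied from the balance table.
balance = {
    "Prior Booked Days": (8.73, 9.02, 5.78),
    "Total Availability": (27.99, 27.65, 6.28),
    "Price": (266.97, 267.90, 292.92),
    "Review Score (Rating)": (4.84, 4.85, 0.27),
    "Number of Reviews": (85.51, 78.59, 115.18),
}

THRESHOLD = 0.1  # common rule of thumb for acceptable balance

results = {name: smd(t, c, sd) for name, (t, c, sd) in balance.items()}
for name, value in results.items():
    status = "balanced" if abs(value) < THRESHOLD else "imbalanced"
    print(f"{name:<24}{value:+.3f}  {status}")
```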