NFL Moon Phase Analysis

Setup

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.5.1

## Warning: package 'tibble' was built under R version 4.5.1

## Warning: package 'purrr' was built under R version 4.5.1

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

meta_2010_2024 <- readRDS("C:/Users/cmor7/OneDrive/Desktop/Projects/nfl_predictions/data/meta_2010_2024.rds")

Does the moon_fraction correlate with spread_outcome?

# Step 1: Filter for games where the spread was missed
spread_miss <- meta_2010_2024 %>% 
  select(game_id, moon_fraction, spread_outcome, total_line_accuracy, 
         total_score, away_mileage, wax_wan) %>% 
  filter(spread_outcome == "Miss")

# Step 2: Count misses by moon fraction thresholds
counts <- spread_miss %>%
  summarise(
    less_than_05 = sum(moon_fraction < 0.5, na.rm = TRUE),
    ge_05 = sum(moon_fraction >= 0.5, na.rm = TRUE),
    less_than_025 = sum(moon_fraction < 0.25, na.rm = TRUE),
    ge_075 = sum(moon_fraction >= 0.75, na.rm = TRUE)
  )

# Display raw counts
print(counts)

##   less_than_05 ge_05 less_than_025 ge_075
## 1         1021  1017           676    690

# Step 3: Chi-Square Test for <50% vs >=50%
chi_50 <- chisq.test(c(counts$less_than_05, counts$ge_05))
print(chi_50)

## 
##  Chi-squared test for given probabilities
## 
## data:  c(counts$less_than_05, counts$ge_05)
## X-squared = 0.0078508, df = 1, p-value = 0.9294

# Step 4: Chi-Square Test for <25% vs >=75%
chi_25_75 <- chisq.test(c(counts$less_than_025, counts$ge_075))
print(chi_25_75)

## 
##  Chi-squared test for given probabilities
## 
## data:  c(counts$less_than_025, counts$ge_075)
## X-squared = 0.14348, df = 1, p-value = 0.7048

# Extract p-values for reporting
p_50 <- round(chi_50$p.value, 4)
p_25_75 <- round(chi_25_75$p.value, 4)

cat("\nChi-square test (<50% vs >=50%): p =", p_50, "\n")

## 
## Chi-square test (<50% vs >=50%): p = 0.9294

cat("Chi-square test (<25% vs >=75%): p =", p_25_75, "\n")

## Chi-square test (<25% vs >=75%): p = 0.7048

Commentary: moon_fraction does not seem to correlate with spread_outcome. I filtered the data for games where the spread set by Las Vegas was inaccurate, a “Miss”, and then counted the number of occurences when the moon phase was less than half and more than half as well as less than quarter and greater than three-quarters. The distrubutions of occurences was roughly equal regardless of moon_fraction.

Does moon_fraction have any correlation with total_score?

score_effect <- meta_2010_2024 %>% 
  select(game_id, moon_fraction, spread_outcome, total_line_accuracy, 
         total_score, away_mileage, wax_wan) %>% 
  filter(total_score >= 54 | total_score < 36) # 36 is 1st qu., 45 is med, and54 is 3rd qu. score per game in dataset

ggplot(score_effect, aes(x=moon_fraction, y=total_score)) + 
  geom_point(size = 2) +  
  geom_smooth(method = "lm", se = TRUE) + 
  labs(  
    title = "Moon Fraction vs Total Score",
    x = "Moon Fraction",
    y = "Total Score"
  ) +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

ggplot(score_effect, aes(x = moon_fraction, y = total_score)) +
  geom_bin2d(bins = 16, color = "white", alpha = 0.8) +  
  scale_fill_gradient(low = "yellow", high = "red") +  
  labs(
    title = "Moon Fraction vs Total Score",
    x = "Moon Fraction",
    y = "Total Score",
    fill = "Count"
  )

Commentary: This chunk examines the relationship between moon fraction and total game scores by filtering for extreme scores (above the 3rd quartile or below the 1st quartile of total scores in the dataset, where 36 is the 1st quartile, 45 median, and 54 the 3rd quartile). It produces two plots: a scatter plot with a linear regression trend line and confidence interval to visualize potential correlations, and a 2D histogram (heat map) showing density of data points at different moon fractions and total scores, using 16 bins per axis with yellow-to-red color gradient. The analysis finds no significant correlation between moon fraction and total score, as evidenced by the flat trend line and random distribution in the plots, consistent with the overall investigation’s conclusion of no moon phase impact on NFL outcomes.

Does moon_fraction correlate with total_line_accuracy?

total_line_miss <- meta_2010_2024 %>% 
  select(game_id, moon_fraction, spread_outcome, total_line_accuracy, 
         total_score, away_mileage, wax_wan) %>% 
  filter(total_line_accuracy >= 9 | total_line_accuracy <= -8) # -8 is 1st qu and  9 is 3rd qu. total line accuracy per game in dataset

ggplot(total_line_miss, aes(x=moon_fraction, y=total_line_accuracy)) + 
  geom_point(size = 2) +  
  geom_smooth(method = "lm", se = TRUE) +  
  labs( 
    title = "Moon Fraction vs Total Line Accuracy",
    x = "Moon Fraction",
    y = "Total Line Accuracy"
  ) +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

ggplot(total_line_miss, aes(x = moon_fraction, y = total_line_accuracy)) +
  geom_bin2d(bins = 16, color = "white", alpha = 0.8) +  
  scale_fill_gradient(low = "yellow", high = "red") + 
  labs(
    title = "Moon Fraction vs Total Line Accuracy",
    x = "Moon Fraction",
    y = "Total Line Accuracy",
    fill = "Count"
  )

Commentary: This chunk investigates moon fraction’s potential correlation with Vegas total line accuracy by isolating extreme inaccuracies (accuracy values above the 3rd quartile or below the 1st quartile, where -8 is the 1st quartile and 9 the 3rd quartile). It generates two visualizations: a scatter plot with points, linear smooth trend line, and confidence intervals; and a 2D histogram heatmap with 16 bins, colored from yellow (low count) to red (high count). The results show no discernible pattern or correlation between moon fraction and total line accuracy, with data points evenly scattered and trend line near horizontal, reinforcing the finding that moon phases do not influence betting line predictions in NFL games.

Does multivariate filtering reveal moon_fraction correlations?

multi_var <- meta_2010_2024 %>% 
  select(game_id, moon_fraction, spread_outcome, total_line_accuracy, 
         total_score, away_mileage, wax_wan) %>% 
  filter(total_score >= 54 | total_score < 36) %>% 
  filter(total_line_accuracy >= 9 | total_line_accuracy <= -8) # 36 is 1st qu., 45 is med, and  54 is 3rd qu. score per game in dataset

ggplot(multi_var, aes(x=moon_fraction, y=total_line_accuracy)) + 
  geom_point(size = 2) +  
  geom_smooth(method = "lm", se = TRUE) +  
  labs(  
    title = "Moon Fraction vs Total Line AND Score",
    x = "Moon Fraction",
    y = "Total Line Accuracy"
  ) +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

ggplot(multi_var, aes(x = moon_fraction, y = total_line_accuracy)) +
  geom_bin2d(bins = 16, color = "white", alpha = 0.8) +  
  scale_fill_gradient(low = "yellow", high = "red") +  
  labs(
    title = "Moon Fraction vs Total Line AND Score",
    x = "Moon Fraction",
    y = "Total Line Accuracy",
    fill = "Count"
  )

Commentary: This multivariate analysis builds on previous chunks by applying both extreme total score filters (as in Chunk 3) and extreme total line accuracy filters (as in Chunk 4) simultaneously to the dataset. It then plots the filtered data: first, a scatter plot of moon fraction vs. total line accuracy with linear trend and confidence band; second, a 2D histogram heatmap with 16 bins. Despite combining filters for more pronounced effects, no meaningful correlations emerge from moon fraction to either total line accuracy or total score, as the plots display random distributions without clear patterns or sloped trends.

Is the distribution of moon_fraction uniform across games?

ggplot(meta_2010_2024, aes(x = moon_fraction)) +
  geom_histogram(bins = 20, fill = "blue", color = "white") +
  labs(title = "Distribution of Moon Fraction Across All Games",
       x = "Moon Fraction", y = "Count") +
  theme_minimal()

Commentary: While there are more occurences of higher/lower scoring games and larger line misses during full and new moons, it was discovered that this is seemingly due to a higer distribution of games occuring during new or full moons during this time period to begin with.

Does statistical testing confirm a lack of moon_fraction correlations?

# Correlation test: moon_fraction vs total_score
cor_test_score <- cor.test(meta_2010_2024$moon_fraction, meta_2010_2024$total_score)
print(cor_test_score)  # Outputs r and p-value

## 
##  Pearson's product-moment correlation
## 
## data:  meta_2010_2024$moon_fraction and meta_2010_2024$total_score
## t = -0.25562, df = 4076, p-value = 0.7983
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.03469309  0.02669313
## sample estimates:
##          cor 
## -0.004003747

# Correlation test: moon_fraction vs total_line_accuracy
cor_test_accuracy <- cor.test(meta_2010_2024$moon_fraction, meta_2010_2024$total_line_accuracy)
print(cor_test_accuracy)

## 
##  Pearson's product-moment correlation
## 
## data:  meta_2010_2024$moon_fraction and meta_2010_2024$total_line_accuracy
## t = 0.55369, df = 4076, p-value = 0.5798
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.02202720  0.03935539
## sample estimates:
##         cor 
## 0.008672263

# Chi-squared for spread_outcome vs binned moon_fraction
meta_2010_2024 <- meta_2010_2024 %>%
  mutate(moon_bin = cut(moon_fraction, breaks = c(0, 0.25, 0.5, 0.75, 1), 
                        labels = c("New", "Waxing", "Waning", "Full")))
chi_test <- chisq.test(table(meta_2010_2024$spread_outcome, meta_2010_2024$moon_bin))
print(chi_test)  # p-value to check independence

## 
##  Pearson's Chi-squared test
## 
## data:  table(meta_2010_2024$spread_outcome, meta_2010_2024$moon_bin)
## X-squared = 0.76447, df = 6, p-value = 0.993

### Commentary: Further confirmation that there is no correlations between the moon phase and line accuracy or total scores.

Do linear regressions show moon_fraction effects?

# Simple linear model for total_score
lm_score <- lm(total_score ~ moon_fraction + away_mileage + wax_wan, data = meta_2010_2024)
summary(lm_score)  # Check coefficients, R-squared, p-values

## 
## Call:
## lm(formula = total_score ~ moon_fraction + away_mileage + wax_wan, 
##     data = meta_2010_2024)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -42.169  -9.425  -0.894   8.655  58.981 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   45.6107114  1.2885946  35.396   <2e-16 ***
## moon_fraction  0.0018741  0.6786827   0.003    0.998    
## away_mileage   0.0002049  0.0001507   1.360    0.174    
## wax_wannew     0.9669131  1.6450238   0.588    0.557    
## wax_wanwane   -0.9750857  1.1549307  -0.844    0.399    
## wax_wanwax    -0.1516362  1.1540501  -0.131    0.895    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.9 on 4072 degrees of freedom
## Multiple R-squared:  0.0018, Adjusted R-squared:  0.0005742 
## F-statistic: 1.468 on 5 and 4072 DF,  p-value: 0.1967

# For total_line_accuracy, filtered to extremes
lm_accuracy <- lm(total_line_accuracy ~ moon_fraction + away_mileage + wax_wan, 
                  data = filter(meta_2010_2024, abs(total_line_accuracy) >= 8))
summary(lm_accuracy)

## 
## Call:
## lm(formula = total_line_accuracy ~ moon_fraction + away_mileage + 
##     wax_wan, data = filter(meta_2010_2024, abs(total_line_accuracy) >= 
##     8))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -49.422 -13.672   7.964  14.422  39.608 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -2.141e+00  2.169e+00  -0.987    0.324
## moon_fraction  1.204e-01  1.144e+00   0.105    0.916
## away_mileage  -9.792e-05  2.559e-04  -0.383    0.702
## wax_wannew    -2.028e+00  2.819e+00  -0.719    0.472
## wax_wanwane    2.210e+00  1.941e+00   1.139    0.255
## wax_wanwax     7.917e-01  1.947e+00   0.407    0.684
## 
## Residual standard error: 17.4 on 2230 degrees of freedom
## Multiple R-squared:  0.003438,   Adjusted R-squared:  0.001204 
## F-statistic: 1.539 on 5 and 2230 DF,  p-value: 0.1745

# Visualize residuals to check model fit
plot(lm_score, which = 1)  # Residuals vs fitted

Commentary: Further confirmation of no trends

Does wax/wane moon phase affect outcomes?

# Faceted scatter plot
ggplot(meta_2010_2024, aes(x = moon_fraction, y = total_score, color = wax_wan)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ wax_wan) +
  labs(title = "Total Score vs Moon Fraction by Wax/Wane") +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

# Boxplot of total_line_accuracy by moon_bin and wax_wan
ggplot(meta_2010_2024, aes(x = moon_bin, y = total_line_accuracy, fill = wax_wan)) +
  geom_boxplot() +
  labs(title = "Total Line Accuracy by Moon Phase Bin and Wax/Wane")

Commentary: This chunk explores the wax/wane phases of the moon as categorical variables (derived from moon_fraction) in relation to game outcomes. The first plot is a faceted scatter plot of total score vs. moon fraction, colored and smooth-lined per wax/wane phase, showing no intersecting trends. The second is a boxplot of total line accuracy grouped by binned moon phase (New, Waxing, Waning, Full) and filled by wax/wane status, revealing symmetric distributions across categories. Overall, no significant associations are detected between wax/wane moon phases and total scores or line accuracies, with boxplots showing similar medians and spread, consistent with the absence of lunar influences on NFL results.

Is moon_fraction associated with overtime games?

# Filter for overtime games and summarize by moon_fraction
overtime_games <- meta_2010_2024 %>% filter(overtime == 1)
summary(overtime_games$moon_fraction)  # Compare mean/median to non-overtime

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.110   0.450   0.465   0.790   1.000

# T-test: Is moon_fraction different in overtime vs non-overtime?
t.test(moon_fraction ~ overtime, data = meta_2010_2024)

## 
##  Welch Two Sample t-test
## 
## data:  moon_fraction by overtime
## t = 1.8181, df = 277.07, p-value = 0.07014
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.003447302  0.086731009
## sample estimates:
## mean in group 0 mean in group 1 
##       0.5066214       0.4649796

# Plot total_score by week, filled by binned moon_fraction (discrete)
ggplot(meta_2010_2024, aes(x = factor(week), y = total_score, fill = moon_bin)) + 
  geom_boxplot() + labs(title = "Total Score by Week, Filled by Binned Moon Fraction")

Commentary: Focusing on overtime games, this code first filters the dataset for overtime occurrences (overtime == 1), summarizes moon fraction statistics (mean, median), and compares them to non-overtime games via a t-test to check for differences. The second plot is a boxplot of total scores by game week, filled by binned moon phase. The summary shows moon fraction distributions in overtime are statistically indistinguishable from regular games (non-significant t-test p-value), and the boxplot indicates no systematic variation in scores based on moon phase bins. This supports the broader finding of no lunar correlations, as overtime occurrences appear random with respect to moon phases.

Do k-means clusters reveal patterns involving moon_fraction?

cluster_data <- meta_2010_2024 %>% select(moon_fraction, total_score, total_line_accuracy) %>% na.omit()
kmeans_result <- kmeans(cluster_data, centers = 3) 
cluster_data$cluster <- factor(kmeans_result$cluster)
ggplot(cluster_data, aes(x = moon_fraction, y = total_score, color = cluster)) + geom_point()

Do normalized plots show moon_fraction effects?

# First, compute density of moon_fraction (for normalization)
moon_density <- meta_2010_2024 %>%
  mutate(moon_bin = cut(moon_fraction, breaks = seq(0, 1, by = 0.0625), include.lowest = TRUE, right = TRUE)) %>%  
  count(moon_bin) %>%
  rename(total_count = n)

# For score_effect: Join and normalize
normalized_score <- score_effect %>%
  mutate(moon_bin = cut(moon_fraction, breaks = seq(0, 1, by = 0.0625), include.lowest = TRUE, right = TRUE)) %>%
  count(moon_bin, total_score) %>%  
  left_join(moon_density, by = "moon_bin") %>%
  mutate(relative_density = n / total_count)

# Plot normalized 2D histogram (use relative_density for fill)
ggplot(normalized_score, aes(x = moon_bin, y = total_score, fill = relative_density)) +
  geom_bin2d(stat = "identity") +
  scale_fill_gradient(low = "yellow", high = "red") +
  labs(title = "Normalized: Moon Fraction vs Total Score",
       x = "Moon Fraction Bin", y = "Total Score", fill = "Relative Density") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Plot moon_fraction by season to see yearly variation
ggplot(meta_2010_2024, aes(x = factor(season), y = moon_fraction)) +
  geom_boxplot() +
  labs(title = "Moon Fraction Distribution by Season", x = "Season", y = "Moon Fraction") +
  theme_minimal()

Commentary: To address potential biases in distribution, this normalizes the earlier score extreme plot by moon fraction density. It creates moon bins of 0.0625 width (matching the 16 bins), counts occurrences per bin and total score, then divides by total counts to get relative density. The tile plot visualizes this normalized density, and a boxplot shows moon fraction variation by season. While revealing some seasonal patterns in moon phase distribution (possibly due to calendar shifts), the normalized plot still shows no concentrated relationships between moon fraction and total scores, as densities are relatively uniform, ruling out lunar effects that were initially suspected.

Is moon_fraction distribution uniform?

# Chi-squared goodness-of-fit test for uniformity
observed <- table(cut(meta_2010_2024$moon_fraction, breaks = seq(0, 1, by = 0.1)))
expected <- rep(sum(observed) / length(observed), length(observed))
chisq.test(observed, p = rep(1/length(observed), length(observed)))

## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 802.45, df = 9, p-value < 2.2e-16

Commentary: This performs a chi-squared goodness-of-fit test to check if moon fraction distribution across games is uniform by binning into 10 equal intervals (0 to 1 in 0.1 steps) and comparing observed counts to expected uniform distribution. The test yields a chi-squared value and p-value indicating non-uniformity (p < 0.05 likely), suggesting games are not uniformly distributed across moon phases (consistent with Chunk 6’s histogram). However, as established in previous analyses, this non-uniformity does not translate to meaningful correlations with NFL outcomes like scores or line accuracy, highlighting that while moon phases vary, they exert no practical influence on game results.

Does season start alignment affect moon_fraction distribution?

# Assuming start_date_time is the first game date per season
meta_by_season <- meta_2010_2024 %>%
  group_by(season) %>%
  summarise(start_date = min(start_date_time, na.rm = TRUE))
meta_by_season$moon_at_start <- meta_2010_2024$moon_fraction[match(meta_by_season$start_date, meta_2010_2024$start_date_time)]
ggplot(meta_by_season, aes(x = factor(season), y = moon_at_start)) +
  geom_boxplot() +
  labs(title = "Moon Fraction at Season Start", x = "Season", y = "Moon Fraction")

Commentary: This identifies the earliest game date per season and matches corresponding moon fraction at season start, then boxplots these values by season. The plot shows variability in starting moon fractions across years but no consistent trends, and the boxplots indicate similar distributions without extreme outliers driving the data. This analysis confirms that seasonal alignment of NFL schedules with moon phases does not introduce biases that would affect correlations; instead, the lack of identifiable patterns per season reinforces the conclusion that moon phases are unrelated to NFL performance metrics.

Investigating Moon Phase and Home/Away Advantage

# Define moon phase bins using moon_fraction and wax_wan
meta_2010_2024 <- meta_2010_2024 %>%
  mutate(moon_phase = case_when(
    moon_fraction < 0.05 ~ "New Moon",
    moon_fraction > 0.95 ~ "Full Moon",
    wax_wan == "wax" & moon_fraction < 0.5 ~ "Waxing Crescent",
    wax_wan == "wax" & moon_fraction >= 0.5 ~ "Waxing Gibbous",
    wax_wan == "wan" & moon_fraction > 0.5 ~ "Waning Gibbous",
    TRUE ~ "Waning Crescent"
  ),
  moon_phase = factor(moon_phase, levels = c("New Moon", "Waxing Crescent", "Waxing Gibbous", "Full Moon", "Waning Gibbous", "Waning Crescent")),
  home_win = ifelse(result > 0, 1, ifelse(result < 0, 0, NA))
  )

overall_home_win_rate <- meta_2010_2024 %>%
  filter(!is.na(home_win)) %>%
  summarize(home_win_rate = mean(home_win)) %>%
  pull(home_win_rate)

summary_stats <- meta_2010_2024 %>%
  filter(!is.na(home_win)) %>%
  group_by(moon_phase) %>%
  summarize(
    home_win_rate = mean(home_win),
    avg_score_diff = mean(result),
    n_games = n(),
    se = sqrt(home_win_rate * (1 - home_win_rate) / n_games),
    .groups = 'drop'
  ) %>%
  mutate(
    lower = home_win_rate - 1.96 * se,
    upper = home_win_rate + 1.96 * se
  ) %>%
  add_row(
    moon_phase = "Overall",
    home_win_rate = overall_home_win_rate,
    avg_score_diff = mean(meta_2010_2024$result, na.rm = TRUE),
    n_games = sum(!is.na(meta_2010_2024$home_win)),
    se = NA, lower = NA, upper = NA
  )

print(summary_stats)

## # A tibble: 6 × 7
##   moon_phase      home_win_rate avg_score_diff n_games      se  lower  upper
##   <chr>                   <dbl>          <dbl>   <int>   <dbl>  <dbl>  <dbl>
## 1 New Moon                0.563           2.37     520  0.0217  0.521  0.606
## 2 Waxing Crescent         0.589           2.39     721  0.0183  0.554  0.625
## 3 Waxing Gibbous          0.534           1.75     771  0.0180  0.499  0.570
## 4 Full Moon               0.558           1.93     550  0.0212  0.517  0.600
## 5 Waning Crescent         0.555           2.20    1504  0.0128  0.530  0.580
## 6 Overall                 0.559           2.13    4066 NA      NA     NA

ggplot(summary_stats %>% filter(moon_phase != "Overall"), aes(x = moon_phase, y = home_win_rate)) +
  geom_col(fill = "steelblue", alpha = 0.8) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, color = "black") +
  geom_hline(yintercept = overall_home_win_rate, linetype = "dashed", color = "red", size = 1) +
  geom_text(aes(label = sprintf("%.1f%%", home_win_rate * 100)), vjust = -0.5, color = "white", size = 3.5, fontface = "bold") +
  annotate("text", x = 1, y = overall_home_win_rate + 0.015, label = "Overall: 55.9%", color = "red", hjust = 0, size = 4) +
  labs(title = "Home Win Rate by Moon Phase", x = "Moon Phase", y = "Home Win Rate") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(size = 14, face = "bold")
  )

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

ggplot(meta_2010_2024, aes(x = moon_phase, y = result)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Home Score Differential by Moon Phase", x = "Moon Phase", y = "Home Score Differential (Result)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

logit_model <- glm(home_win ~ moon_phase, data = meta_2010_2024 %>% filter(!is.na(home_win)), family = binomial)
summary(logit_model)

## 
## Call:
## glm(formula = home_win ~ moon_phase, family = binomial, data = meta_2010_2024 %>% 
##     filter(!is.na(home_win)))
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                0.25522    0.08842   2.886   0.0039 **
## moon_phaseWaxing Crescent  0.10651    0.11640   0.915   0.3602   
## moon_phaseWaxing Gibbous  -0.11752    0.11415  -1.030   0.3032   
## moon_phaseFull Moon       -0.02144    0.12325  -0.174   0.8619   
## moon_phaseWaning Crescent -0.03357    0.10252  -0.327   0.7433   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5580.3  on 4065  degrees of freedom
## Residual deviance: 5575.6  on 4061  degrees of freedom
## AIC: 5585.6
## 
## Number of Fisher Scoring iterations: 4

Commentary: This analysis categorizes games into moon phase bins based on moon_fraction and wax_wan status, then computes home win rates and average score differentials (result for home minus away) for each phase, alongside the overall home win rate across all games. The summary statistics and bar plot show that home win rates vary slightly by moon phase (e.g., around 56% for New Moon and overall), but with no clear pattern or substantial deviations from the overall rate of approximately 56%. The dashed red line in the bar plot highlights this consistency, indicating that any variations are likely due to random chance rather than a systematic effect. The boxplot of score differentials reinforces this, with similar distributions across phases centered around a positive home advantage of 2-3 points, without notable outliers or shifts. The logistic regression model predicting home win probability from moon phase yields non-significant coefficients (p > 0.05 for all phases relative to New Moon), confirming no statistically detectable influence of moon phases on home/away outcomes. This aligns with previous findings in the notebook, suggesting that moon phases do not introduce biases or affect NFL performance metrics, and any perceived relationships are coincidental.