Calculate Age-Standardised Rates with Confidence Intervals

This function calculates age-standardised rates by direct standardisation, together with confidence intervals. It can be used for both incidence rates (events over time) and prevalence rates (cross-sectional data). The gamma method is the default for the age-standardised confidence interval because it cannot produce negative bounds and is widely used in the epidemiological literature.

Usage

calculate_asr_direct(
  .df,
  conf_level = 0.95,
  multiplier = 1e+05,
  ci_method = "gamma",
  warn_small_cases = TRUE
)

Arguments

.df

A data frame containing age-specific case counts, population data, and standard population weights. Must contain the following columns:

age_group: Age group labels
events: Number of events/cases in each age group (integer). For incidence: new cases over time. For prevalence: individuals with condition at a point in time.
person_years: Person-years of follow-up (incidence) or population size (prevalence) in each age group (numeric). For prevalence studies, use the total number of individuals examined/surveyed.
standard_pop: Standard population weights for each age group (numeric)
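
As a rough illustration of the expected input shape, the following sketch builds a small data frame with the four required columns and runs a pre-check on it. The helper check_asr_input() and the example values are hypothetical and are not part of this package; calculate_asr_direct() may validate its input differently.

```r
# Hypothetical pre-check; not part of calculate_asr_direct()
required_cols <- c("age_group", "events", "person_years", "standard_pop")

check_asr_input <- function(df) {
  # Report any required column that is absent
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Counts, denominators and weights must be numeric and non-negative
  stopifnot(
    is.numeric(df$events), all(df$events >= 0),
    is.numeric(df$person_years), all(df$person_years >= 0),
    is.numeric(df$standard_pop), all(df$standard_pop >= 0)
  )
  invisible(TRUE)
}

# Illustrative values only
example_df <- data.frame(
  age_group = c("0-39", "40-69", "70+"),
  events = c(10L, 40L, 30L),
  person_years = c(50000, 30000, 8000),
  standard_pop = c(55000, 35000, 10000)
)
check_asr_input(example_df)
```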

conf_level

Confidence level for confidence intervals (default: 0.95)

multiplier

Multiplier for rate expression (default: 100000 for rates per 100,000). Use 100 for prevalence studies to express results as percentages.

ci_method

Character string specifying the confidence interval calculation method. Options are "gamma" (default) or "byars". The gamma method uses the gamma distribution approach (consistent with epitools), while Byar's method uses Byar's approximation with the Dobson adjustment (consistent with PHEindicatormethods). Both methods are statistically valid, and they typically give very similar intervals.
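
For orientation, Byar's approximation with the Dobson adjustment can be sketched in base R as follows. This is a minimal sketch of the textbook method, not the internal code of calculate_asr_direct(); the function name byars_dobson_ci() is hypothetical, and the formula is undefined when the total number of events is zero.

```r
# Hypothetical helper illustrating Byar's approximation with Dobson adjustment
byars_dobson_ci <- function(events, person_years, standard_pop,
                            conf_level = 0.95) {
  w <- standard_pop / sum(standard_pop)          # weights scaled to sum to 1
  asr <- sum(w * events / person_years)          # directly standardised rate
  var_asr <- sum(w^2 * events / person_years^2)  # Poisson variance of the ASR
  o <- sum(events)                               # total observed events
  z <- qnorm(1 - (1 - conf_level) / 2)

  # Byar's approximate Poisson limits for the total event count
  o_lower <- o * (1 - 1 / (9 * o) - z / (3 * sqrt(o)))^3
  o_upper <- (o + 1) * (1 - 1 / (9 * (o + 1)) + z / (3 * sqrt(o + 1)))^3

  # Dobson adjustment: rescale the count limits to the ASR scale
  scale <- sqrt(var_asr / o)
  c(asr = asr,
    lower = asr + scale * (o_lower - o),
    upper = asr + scale * (o_upper - o))
}

byars_ci <- byars_dobson_ci(
  events = c(5, 25, 150, 300, 80),
  person_years = c(20000, 25000, 22000, 15000, 3000),
  standard_pop = c(35000, 25000, 20000, 15000, 5000)
)
round(1e5 * byars_ci, 1)  # rate and limits per 100,000
```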

warn_small_cases

Logical. If TRUE (default), warns when any age group has fewer than 5 cases (including zero), which may lead to unstable rate estimates, and when age groups with zero person-years are automatically excluded. Set to FALSE to suppress these data quality warnings. Consider wider age groups for more stable results.

Value

A tibble containing:

crude_rate: Crude rate (incidence or prevalence), unscaled
crude_rate_scaled: Crude rate multiplied by multiplier
crude_ci_lower: Lower confidence interval bound for crude rate
crude_ci_upper: Upper confidence interval bound for crude rate
crude_ci_lower_scaled: Lower crude CI multiplied by multiplier
crude_ci_upper_scaled: Upper crude CI multiplied by multiplier
asr: Age-standardised rate (incidence or prevalence), unscaled
asr_scaled: ASR multiplied by multiplier
asr_ci_lower: Lower confidence interval bound for ASR
asr_ci_upper: Upper confidence interval bound for ASR
asr_ci_lower_scaled: Lower ASR CI multiplied by multiplier
asr_ci_upper_scaled: Upper ASR CI multiplied by multiplier
conf_level: Confidence level used
total_events: Total number of events across all age groups
total_person_years: Total person-years across all age groups
age_specific_data: Data frame with age-specific rates and details

Details

The function calculates both crude and age-standardised rates with confidence intervals using established methods:

Crude Rate Confidence Intervals: Uses Wilson score intervals via prop.test(), which is appropriate for proportions/rates and provides better coverage than the normal approximation, especially for small counts.
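
A crude interval of this kind can be reproduced with base R. The sketch below uses the totals from the incidence example (560 events over 85,000 person-years). Note that prop.test() applies a continuity correction by default; whether calculate_asr_direct() keeps that default is not stated here, so treat the call as an assumption rather than the exact internal call.

```r
# Totals from the incidence example: sum(events) and sum(person_years)
total_events <- 560L
total_person_years <- 85000

# One-sample prop.test() returns a Wilson score interval
# (continuity-corrected by default)
crude <- prop.test(x = total_events, n = total_person_years,
                   conf.level = 0.95)

crude_rate <- unname(crude$estimate)   # events / person-years
crude_ci <- as.numeric(crude$conf.int) # lower and upper bound

# Express the rate and its limits per 100,000
round(1e5 * c(rate = crude_rate, lower = crude_ci[1], upper = crude_ci[2]), 1)
```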

Age-Standardised Rate Confidence Intervals: Uses either the gamma distribution method (default) or Byar's method, both standard approaches in epidemiological software. The gamma method naturally prevents negative confidence interval bounds and is appropriate for rate data.
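
The gamma approach can be sketched in base R along the lines of the Fay and Feuer approximation, which is the usual formulation of the gamma interval for directly standardised rates. The function name gamma_asr_ci() is hypothetical, and the code is an illustration under that assumption, not the package's internal implementation.

```r
# Hypothetical helper illustrating a gamma-based CI for a directly
# standardised rate (Fay and Feuer style approximation)
gamma_asr_ci <- function(events, person_years, standard_pop,
                         conf_level = 0.95) {
  alpha <- 1 - conf_level
  w <- standard_pop / sum(standard_pop)   # weights scaled to sum to 1
  asr <- sum(w * events / person_years)   # directly standardised rate
  v <- sum((w / person_years)^2 * events) # Poisson variance of the ASR
  w_max <- max(w / person_years)          # largest per-event weight

  # Lower limit: gamma with mean asr and variance v
  lower <- qgamma(alpha / 2, shape = asr^2 / v, scale = v / asr)
  # Upper limit: gamma shifted by one extra event in the heaviest stratum
  upper <- qgamma(1 - alpha / 2,
                  shape = (asr + w_max)^2 / (v + w_max^2),
                  scale = (v + w_max^2) / (asr + w_max))
  c(asr = asr, lower = lower, upper = upper)
}

gamma_ci <- gamma_asr_ci(
  events = c(5, 25, 150, 300, 80),
  person_years = c(20000, 25000, 22000, 15000, 3000),
  standard_pop = c(35000, 25000, 20000, 15000, 5000)
)
round(1e5 * gamma_ci, 1)  # rate and limits per 100,000
```

Because both limits are gamma quantiles, they cannot fall below zero, which is the property described above.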

The direct standardisation method calculates the ASR as: ASR = Σ(w_i × r_i), where w_i is the standard population weight for age group i (standard_pop scaled to sum to 1) and r_i = events_i / person_years_i is the age-specific rate.
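
The formula can be checked by hand in base R. This sketch reuses the incidence data from the Examples section and assumes the weights are standard_pop scaled to sum to 1, which reproduces the ASR reported there.

```r
# Incidence data from Example 1
events <- c(5, 25, 150, 300, 80)
person_years <- c(20000, 25000, 22000, 15000, 3000)
standard_pop <- c(35000, 25000, 20000, 15000, 5000)

w <- standard_pop / sum(standard_pop)  # w_i: weights scaled to sum to 1
r <- events / person_years             # r_i: age-specific rates

asr_by_hand <- sum(w * r)              # ASR = sum of w_i * r_i
round(asr_by_hand * 1e5, 3)            # per 100,000
#> [1] 603.447
```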

Incidence vs Prevalence:

Incidence rates: Use person-years of follow-up in denominator, typically expressed per 100,000
Prevalence rates: Use total individuals examined in denominator, typically expressed as percentages (set multiplier = 100)

Age Grouping Considerations:

Age groups with zero person-years are automatically excluded from calculation
Zero cases are allowed but may indicate very low incidence or small populations
Age groups with fewer than 5 cases produce less stable rate estimates and wider confidence intervals
Consider wider age groups if many strata have very few cases
The choice of age-group width is a trade-off between statistical stability and age-specific precision
Use warn_small_cases = FALSE to suppress data quality warnings
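
The considerations above can be screened for before calling the function. The helper flag_sparse_strata() below is hypothetical and only mirrors the rules described in this section (zero person-years excluded, fewer than 5 cases flagged); the data are made up for illustration.

```r
# Hypothetical helper that flags strata matching the rules described above
flag_sparse_strata <- function(df, min_cases = 5) {
  # Strata with no denominator cannot contribute a rate
  df$zero_person_years <- df$person_years == 0
  # Strata with a denominator but very few (or zero) cases
  df$small_cases <- !df$zero_person_years & df$events < min_cases
  df
}

# Illustrative values only
sparse_df <- data.frame(
  age_group = c("0-19", "20-39", "40-59", "60+"),
  events = c(0L, 3L, 0L, 40L),
  person_years = c(5000, 8000, 0, 4000),
  standard_pop = c(30000, 30000, 25000, 15000)
)

flagged <- flag_sparse_strata(sparse_df)
flagged[, c("age_group", "zero_person_years", "small_cases")]

# Strata that would remain after dropping zero person-years
kept <- flagged[!flagged$zero_person_years, ]
```

If several strata are flagged as small, merging neighbouring age groups before the analysis is the remedy suggested above.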

References

Breslow, N. E., & Day, N. E. (1987). Statistical methods in cancer research. Volume II: The design and analysis of cohort studies. IARC scientific publications, (82), 1-406.

Examples

# Example 1: Incidence rates (per 100,000)
incidence_data <- data.frame(
  age_group = c("0-19", "20-39", "40-59", "60-79", "80+"),
  events = c(5L, 25L, 150L, 300L, 80L),
  person_years = c(20000, 25000, 22000, 15000, 3000),
  standard_pop = c(35000, 25000, 20000, 15000, 5000)
)

# Calculate age-standardised incidence rate
incidence_result <- calculate_asr_direct(.df = incidence_data)
print(incidence_result$asr_scaled)  # ASR per 100,000
#> [1] 603.447

# Example 2: Prevalence rates (as percentages)
prevalence_data <- data.frame(
  age_group = c("18-29", "30-49", "50-69", "70+"),
  events = c(12L, 45L, 98L, 67L),  # Individuals with condition
  person_years = c(1200, 1500, 1400, 800),  # Individuals surveyed
  standard_pop = c(2435, 3847, 2603, 763)
)

# Calculate age-standardised prevalence (use multiplier = 100 for percentages)
prevalence_result <- calculate_asr_direct(.df = prevalence_data, multiplier = 100)
print(paste("Prevalence:", round(prevalence_result$asr_scaled, 2), "%"))
#> [1] "Prevalence: 4 %"