
This function calculates age-standardised rates by direct standardisation, with confidence intervals. It can be used for both incidence rates (new events over person-time) and prevalence rates (cross-sectional data). The gamma method is the default for the age-standardised rate's confidence interval because it cannot produce negative bounds; it is widely used in epidemiological software such as epitools.

Usage

calculate_asr_direct(
  .df,
  conf_level = 0.95,
  multiplier = 1e+05,
  ci_method = "gamma",
  warn_small_cases = TRUE
)

Arguments

.df

A data frame containing age-specific case counts, population data, and standard population weights. Must contain the following columns:

  • age_group: Age group labels

  • events: Number of events/cases in each age group (integer). For incidence: new cases over time. For prevalence: individuals with condition at a point in time.

  • person_years: Person-years of follow-up (incidence) or population size (prevalence) in each age group (numeric). For prevalence studies, use the total number of individuals examined/surveyed.

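  • standard_pop: Standard population weights for each age group (numeric). The weights are normalised to sum to 1, so raw standard population counts (as in the examples) can be supplied directly.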
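The column requirements above can be checked before calling the function. A minimal base R sketch; `check_asr_input()` is a hypothetical helper, not part of the package, and the non-negativity checks are assumptions about what valid input looks like.

```r
# Hypothetical helper (not part of the package): checks that a data frame has
# the four documented columns and that counts and denominators are non-negative.
check_asr_input <- function(df) {
  required <- c("age_group", "events", "person_years", "standard_pop")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Assumed validity rules: numeric, non-negative values
  stopifnot(is.numeric(df$events), all(df$events >= 0),
            is.numeric(df$person_years), all(df$person_years >= 0),
            is.numeric(df$standard_pop), all(df$standard_pop >= 0))
  invisible(TRUE)
}

incidence_data <- data.frame(
  age_group = c("0-19", "20-39", "40-59", "60-79", "80+"),
  events = c(5L, 25L, 150L, 300L, 80L),
  person_years = c(20000, 25000, 22000, 15000, 3000),
  standard_pop = c(35000, 25000, 20000, 15000, 5000)
)
check_asr_input(incidence_data)            # passes silently
# check_asr_input(incidence_data[, -4])    # error: missing standard_pop
```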

conf_level

Confidence level for confidence intervals (default: 0.95)

multiplier

Multiplier for rate expression (default: 100000 for rates per 100,000). Use 100 for prevalence studies to express results as percentages.

ci_method

Character string specifying the method used for the age-standardised rate's confidence interval. Options are "gamma" (default) or "byars". The gamma method uses the gamma distribution approach (consistent with epitools), while Byar's method uses Byar's approximation with the Dobson adjustment (consistent with PHEindicatormethods). Both methods are statistically valid and usually give very similar intervals.
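The "byars" option can be sketched in base R as follows: a Poisson interval for the total event count from Byar's approximation, rescaled onto the standardised rate with the Dobson et al. (1991) adjustment. This follows the published approach used by PHEindicatormethods; the helper names `byars_lower()`, `byars_upper()` and `dsr_ci_dobson()` are hypothetical, and this is not the package's internal code.

```r
# Byar's approximation to the Poisson CI for an observed count o
byars_lower <- function(o, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  if (o == 0) return(0)
  o * (1 - 1 / (9 * o) - z / (3 * sqrt(o)))^3
}

byars_upper <- function(o, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  (o + 1) * (1 - 1 / (9 * (o + 1)) + z / (3 * sqrt(o + 1)))^3
}

# Dobson adjustment for a directly standardised rate (assumes >= 1 event in total)
dsr_ci_dobson <- function(events, person_years, standard_pop, conf_level = 0.95) {
  w <- standard_pop / sum(standard_pop)          # normalised weights
  dsr <- sum(w * events / person_years)          # point estimate
  var_dsr <- sum(w^2 * events / person_years^2)  # Poisson-based variance
  o <- sum(events)                               # total observed events
  adj <- sqrt(var_dsr / o)                       # maps count scale to rate scale
  c(asr = dsr,
    lower = dsr + adj * (byars_lower(o, conf_level) - o),
    upper = dsr + adj * (byars_upper(o, conf_level) - o))
}

ci <- dsr_ci_dobson(
  events = c(5, 25, 150, 300, 80),
  person_years = c(20000, 25000, 22000, 15000, 3000),
  standard_pop = c(35000, 25000, 20000, 15000, 5000)
)
round(ci * 1e5, 1)   # ASR and 95% CI per 100,000
```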

warn_small_cases

Logical. If TRUE (default), warns when any age group has zero or fewer than 5 cases, which may lead to unstable rate estimates, and when age groups with zero person-years are automatically excluded. Set to FALSE to suppress these data quality warnings. If many age groups have few cases, consider wider age groups for more stable results.

Value

A tibble containing:

  • crude_rate: Crude rate (total events divided by total person-years), before applying multiplier

  • crude_rate_scaled: Crude rate multiplied by multiplier

  • crude_ci_lower: Lower confidence interval bound for crude rate

  • crude_ci_upper: Upper confidence interval bound for crude rate

  • crude_ci_lower_scaled: Lower crude CI multiplied by multiplier

  • crude_ci_upper_scaled: Upper crude CI multiplied by multiplier

  • asr: Age-standardised rate, before applying multiplier

  • asr_scaled: ASR multiplied by multiplier

  • asr_ci_lower: Lower confidence interval bound for ASR

  • asr_ci_upper: Upper confidence interval bound for ASR

  • asr_ci_lower_scaled: Lower ASR CI multiplied by multiplier

  • asr_ci_upper_scaled: Upper ASR CI multiplied by multiplier

  • conf_level: Confidence level used

  • total_events: Total number of events across all age groups

  • total_person_years: Total person-years across all age groups

  • age_specific_data: Data frame with age-specific rates and details

Details

The function calculates both crude and age-standardised rates with confidence intervals using established methods:

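Crude Rate Confidence Intervals: Uses Wilson score intervals via prop.test(), treating total events out of total person-years (or individuals surveyed) as a binomial proportion. This matches the binomial model for prevalence data and approximates the Poisson model for incidence data when events are few relative to person-years. It gives better coverage than the normal (Wald) approximation, especially for small counts.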
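A minimal sketch of the crude-rate interval with base R's `prop.test()`, using the data from Example 1. Whether `calculate_asr_direct()` passes `correct = TRUE` (the `prop.test()` default, which adds a continuity correction) or `correct = FALSE` is not stated, so the default here is an assumption.

```r
events <- c(5, 25, 150, 300, 80)
person_years <- c(20000, 25000, 22000, 15000, 3000)

x <- sum(events)         # 560 events in total
n <- sum(person_years)   # 85,000 person-years in total
crude_rate <- x / n

# Wilson score interval (continuity-corrected by default) for x out of n
crude_ci <- prop.test(x, n, conf.level = 0.95)$conf.int

multiplier <- 1e5
c(rate = crude_rate, lower = crude_ci[1], upper = crude_ci[2]) * multiplier
```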

Age-Standardised Rate Confidence Intervals: Uses either the gamma distribution method (default) or Byar's method with the Dobson adjustment, both standard approaches in epidemiological software. The gamma method cannot produce negative confidence interval bounds, because gamma quantiles are never negative, which makes it well suited to rate data.
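The gamma interval can be sketched with base R's `qgamma()`. This follows the Fay and Feuer (1997) formulas as implemented in `epitools::ageadjust.direct()`; `dsr_ci_gamma()` is a hypothetical name, and it is an assumption that `calculate_asr_direct()` computes the interval identically.

```r
dsr_ci_gamma <- function(events, person_years, standard_pop, conf_level = 0.95) {
  alpha <- 1 - conf_level
  w <- standard_pop / sum(standard_pop)          # normalised weights
  dsr <- sum(w * events / person_years)          # point estimate
  var_dsr <- sum(w^2 * events / person_years^2)  # Poisson-based variance
  w_max <- max(w / person_years)                 # largest per-person weight
  # Lower bound: gamma with mean dsr and variance var_dsr
  lower <- qgamma(alpha / 2, shape = dsr^2 / var_dsr, scale = var_dsr / dsr)
  # Upper bound: mean and variance inflated by w_max for conservative coverage
  upper <- qgamma(1 - alpha / 2,
                  shape = (dsr + w_max)^2 / (var_dsr + w_max^2),
                  scale = (var_dsr + w_max^2) / (dsr + w_max))
  c(asr = dsr, lower = lower, upper = upper)
}

ci <- dsr_ci_gamma(
  events = c(5, 25, 150, 300, 80),
  person_years = c(20000, 25000, 22000, 15000, 3000),
  standard_pop = c(35000, 25000, 20000, 15000, 5000)
)
round(ci * 1e5, 1)   # ASR and 95% CI per 100,000
```

Adding the largest single-person weight to both the mean and the variance for the upper bound is what keeps the interval conservative when some age groups have very few events.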

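The direct standardisation method calculates the ASR as ASR = Σ(w_i × r_i), where r_i = events_i / person_years_i is the age-specific rate in age group i and w_i = standard_pop_i / Σ_j standard_pop_j is the standard population weight, normalised so that the weights sum to 1.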
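The formula can be checked by hand in base R with the data from Example 1. This sketch does not call the package; it reproduces the documented asr_scaled value of 603.447 per 100,000.

```r
events <- c(5, 25, 150, 300, 80)
person_years <- c(20000, 25000, 22000, 15000, 3000)
standard_pop <- c(35000, 25000, 20000, 15000, 5000)

r <- events / person_years             # age-specific rates r_i
w <- standard_pop / sum(standard_pop)  # weights w_i, summing to 1
asr <- sum(w * r)                      # direct standardisation

asr * 1e5   # per 100,000
#> [1] 603.447
```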

Incidence vs Prevalence:

  • Incidence rates: Use person-years of follow-up in the denominator, typically expressed per 100,000

  • Prevalence rates: Use the total number of individuals examined in the denominator, typically expressed as percentages (set multiplier = 100)

Age Grouping Considerations:

  • Age groups with zero person-years are automatically excluded from calculation

  • Zero cases are allowed but may indicate very low incidence or small populations

  • Age groups with < 5 cases produce less stable rate estimates and wider confidence intervals

  • Consider wider age groups if many strata have very few cases

  • Choosing age groups is a trade-off between statistical stability (wider groups) and age-specific precision (narrower groups)

  • Use warn_small_cases = FALSE to suppress data quality warnings
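The checks above can be reproduced before standardising, and sparse age groups merged into wider bands. A minimal base R sketch; the data and the band mapping are illustrative, and none of this is part of the package.

```r
# stringsAsFactors = FALSE keeps age_group as character for name-based lookup
df <- data.frame(
  age_group = c("0-9", "10-19", "20-39", "40-59", "60+"),
  events = c(0L, 3L, 25L, 150L, 380L),
  person_years = c(0, 20000, 25000, 22000, 18000),
  standard_pop = c(15000, 20000, 25000, 20000, 20000),
  stringsAsFactors = FALSE
)

# Flag the data quality issues described above
zero_py <- df$age_group[df$person_years == 0]                     # would be excluded
few_events <- df$age_group[df$person_years > 0 & df$events < 5]   # unstable rates
zero_py
few_events

# Merge sparse groups into wider bands (illustrative mapping)
band <- c("0-9" = "0-39", "10-19" = "0-39", "20-39" = "0-39",
          "40-59" = "40-59", "60+" = "60+")
df$age_group <- unname(band[df$age_group])
wide <- aggregate(cbind(events, person_years, standard_pop) ~ age_group,
                  data = df, FUN = sum)
wide
```

Merging before standardising keeps the zero-person-year group's standard population inside the 0-39 band instead of dropping it, and gives that band 28 events.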

References

Breslow, N. E., & Day, N. E. (1987). Statistical methods in cancer research. Volume II: The design and analysis of cohort studies. IARC Scientific Publications, (82), 1-406.

Examples

# Example 1: Incidence rates (per 100,000)
incidence_data <- data.frame(
  age_group = c("0-19", "20-39", "40-59", "60-79", "80+"),
  events = c(5L, 25L, 150L, 300L, 80L),
  person_years = c(20000, 25000, 22000, 15000, 3000),
  standard_pop = c(35000, 25000, 20000, 15000, 5000)
)

# Calculate age-standardised incidence rate
incidence_result <- calculate_asr_direct(.df = incidence_data)
print(incidence_result$asr_scaled)  # ASR per 100,000
#> [1] 603.447

# Example 2: Prevalence rates (as percentages)
prevalence_data <- data.frame(
  age_group = c("18-29", "30-49", "50-69", "70+"),
  events = c(12L, 45L, 98L, 67L),  # Individuals with condition
  person_years = c(1200, 1500, 1400, 800),  # Individuals surveyed
  standard_pop = c(2435, 3847, 2603, 763)
)

# Calculate age-standardised prevalence (use multiplier = 100 for percentages)
prevalence_result <- calculate_asr_direct(.df = prevalence_data, multiplier = 100)
print(sprintf("Prevalence: %.2f%%", prevalence_result$asr_scaled))
#> [1] "Prevalence: 4.00%"