A tidyverse-friendly summary function that summarises a data frame by column type.
my_skim(data, ...)
data: A tibble, or an object that can be coerced into a tibble.
...: Columns to select for skimming. When none are provided, all columns are skimmed.
Works with dplyr::group_by() and the pipe. See the skim documentation for more details. Adapts the skimr::skim() function to include proportion counts for factor variables.
In general, results are more informative if character columns are first converted to factors (see the examples below).
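The conversion step and the idea of a proportion count can be sketched in base R. This is only an illustration under stated assumptions: the data frame `df` and its columns are hypothetical, and `prop.table(table(...))` is one plain way to compute level proportions, not necessarily how my_skim() computes or formats them.

```r
# Hypothetical example data (not part of the package)
df <- data.frame(
  fruit  = c("apple", "pear", "apple", "apple"),
  weight = c(1.2, 0.9, 1.1, 1.3),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor, leaving other columns untouched
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], as.factor)

# Share of rows falling in each factor level: the kind of quantity a
# proportion count for a factor column reports
fruit_props <- prop.table(table(df$fruit))
```

After the conversion, `df` can be passed to my_skim() in the same way as the datasets in the examples.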
# summarise the iris dataset
my_skim(iris)
#> ── Data Summary ────────────────────────
#> Values
#> Name iris
#> Number of rows 150
#> Number of columns 5
#> _______________________
#> Column type frequency:
#> factor 1
#> numeric 4
#> ________________________
#> Group variables None
#>
#> ── Variable type: factor ───────────────────────────────────────────────────────
#>
#>
#> ── Variable type: numeric ──────────────────────────────────────────────────────
#>
# summarise the mtcars dataset by transmission type ("am": 0 = automatic, 1 = manual)
library(magrittr)
mtcars %>%
  dplyr::mutate(
    dplyr::across(
      tidyselect::all_of(c("cyl", "vs", "am", "gear", "carb")),
      as.factor
    )
  ) %>%
  dplyr::group_by(am) %>%
  my_skim()
#> ── Data Summary ────────────────────────
#> Values
#> Name Piped data
#> Number of rows 32
#> Number of columns 11
#> _______________________
#> Column type frequency:
#> factor 4
#> numeric 6
#> ________________________
#> Group variables am
#>
#> ── Variable type: factor ───────────────────────────────────────────────────────
#>
#>
#> ── Variable type: numeric ──────────────────────────────────────────────────────
#>