edar: Convenient Functions for Exploratory Data Analysis

The goal of edar is to provide some convenient functions for common tasks in exploratory data analysis.

Citation

Sou T (2025). edar: Convenient Functions for Exploratory Data Analysis. R package version 0.0.3.9000, https://github.com/soutomas/edar.

citation("edar")
#> To cite package 'edar' in publications use:
#> 
#>   Sou T (2025). _edar: Convenient Functions for Exploratory Data
#>   Analysis_. R package version 0.0.3.9000, <https://github.com/soutomas/edar>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {edar: Convenient Functions for Exploratory Data Analysis},
#>     author = {Tomas Sou},
#>     year = {2025},
#>     note = {R package version 0.0.3.9000},
#>     url = {https://github.com/soutomas/edar},
#>   }

Installation

You can install the development version of edar from GitHub with:

# install.packages("pak")
pak::pak("soutomas/edar")

Example

A common first step in exploratory data analysis is to generate a quick summary of the variables in a dataset.

library(edar)

# Data 
dat = mtcars |> dplyr::mutate(vs=factor(vs), am=factor(am))

# Summary for continuous variables in a data frame. 
dat |> summ_by()
#> Dropped: vs am
#> Adding missing grouping variables: `name`
#> # A tibble: 9 × 10
#>   name      n   nNA   Mean    Med      SD   Min    P25    P75    Max
#>   <chr> <int> <int>  <dbl>  <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1 carb     32     0   2.81   2      1.62   1      2      4      8   
#> 2 cyl      32     0   6.19   6      1.79   4      4      8      8   
#> 3 disp     32     0 231.   196.   124.    71.1  121.   326    472   
#> 4 drat     32     0   3.60   3.70   0.535  2.76   3.08   3.92   4.93
#> 5 gear     32     0   3.69   4      0.738  3      3      4      5   
#> 6 hp       32     0 147.   123     68.6   52     96.5  180    335   
#> 7 mpg      32     0  20.1   19.2    6.03  10.4   15.4   22.8   33.9 
#> 8 qsec     32     0  17.8   17.7    1.79  14.5   16.9   18.9   22.9 
#> 9 wt       32     0   3.22   3.32   0.978  1.51   2.58   3.61   5.42

# Summary of a selected variable after grouping.
dat |> summ_by("mpg",vs)
#> Adding missing grouping variables: `vs`
#> # A tibble: 2 × 10
#>   vs        n   nNA  Mean   Med    SD   Min   P25   P75   Max
#>   <fct> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0        18     0  16.6  15.6  3.86  10.4  14.8  19.1  26  
#> 2 1        14     0  24.6  22.8  5.38  17.8  21.4  29.6  33.9
dat |> summ_by("mpg",vs,am)
#> Adding missing grouping variables: `vs`, `am`
#> # A tibble: 4 × 11
#> # Groups:   vs [2]
#>   vs    am        n   nNA  Mean   Med    SD   Min   P25   P75   Max
#>   <fct> <fct> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0     0        12     0  15.0  15.2  2.77  10.4  14.0  16.6  19.2
#> 2 0     1         6     0  19.8  20.4  4.01  15    16.8  21    26  
#> 3 1     0         7     0  20.7  21.4  2.47  17.8  18.6  22.2  24.4
#> 4 1     1         7     0  28.4  30.4  4.76  21.4  25.0  31.4  33.9

# Summary for categorical variables in a data frame. 
dat |> summ_cat()
#> Dropped: mpg cyl disp hp drat wt qsec gear carb
#> $vs
#>     vs  n percent
#>      0 18  0.5625
#>      1 14  0.4375
#>  Total 32  1.0000
#> 
#> $am
#>     am  n percent
#>      0 19 0.59375
#>      1 13 0.40625
#>  Total 32 1.00000

# Summary for a selected categorical variable.
dat |> summ_cat("vs")
#> Dropped: mpg cyl disp hp drat wt qsec gear carb
#>     vs  n percent
#>      0 18  0.5625
#>      1 14  0.4375
#>  Total 32  1.0000
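
To make the columns in the output above concrete, here is a minimal base-R sketch that computes similar statistics by hand. It is not edar's implementation: the helpers `summ_numeric()` and `count_cat()` are hypothetical names introduced for illustration, and the sketch assumes that `n` counts non-missing values and that `P25`/`P75` use R's default quantile method.

```r
# Hypothetical helper (not part of edar): summary statistics for one numeric vector.
# Assumption: n counts non-missing values; quantiles use R's default (type 7).
summ_numeric <- function(x) {
  c(
    n    = sum(!is.na(x)),
    nNA  = sum(is.na(x)),
    Mean = mean(x, na.rm = TRUE),
    Med  = median(x, na.rm = TRUE),
    SD   = sd(x, na.rm = TRUE),
    Min  = min(x, na.rm = TRUE),
    P25  = unname(quantile(x, 0.25, na.rm = TRUE)),
    P75  = unname(quantile(x, 0.75, na.rm = TRUE)),
    Max  = max(x, na.rm = TRUE)
  )
}

# Hypothetical helper (not part of edar): counts and proportions per level.
count_cat <- function(x) {
  counts <- table(x, useNA = "no")
  data.frame(
    level   = names(counts),
    n       = as.integer(counts),
    percent = as.numeric(counts) / sum(counts)
  )
}

# Same data preparation as in the example, using base R only.
dat <- transform(mtcars, vs = factor(vs), am = factor(am))

# One row per numeric variable, one column per statistic.
num_cols <- names(dat)[vapply(dat, is.numeric, logical(1))]
num_summary <- t(vapply(dat[num_cols], summ_numeric, numeric(9)))

# Summary of mpg within each level of vs.
by_vs <- do.call(rbind, lapply(split(dat$mpg, dat$vs), summ_numeric))

# Counts and proportions for a categorical variable.
vs_counts <- count_cat(dat$vs)

print(round(num_summary, 2))
print(round(by_vs, 2))
print(vs_counts)
```

Writing the summary as one function per vector keeps the grouped case simple: splitting the variable by a factor and applying the same function gives one row per group.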

Results can be viewed directly as a flextable object.

# Show data frame in a flextable object. 
dat |> summ_by("mpg",vs) |> ft()
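
For readers who want to see what rendering a summary as a flextable involves, the sketch below calls the flextable package directly. How `ft()` is implemented is not shown in this document, so this is only an assumed equivalent: it builds a small grouped summary with base R as a stand-in for a `summ_by()` result and skips the rendering step if flextable is not installed.

```r
# Small grouped summary built with base R (a stand-in for a summ_by() result).
mpg_by_vs <- aggregate(mpg ~ vs, data = mtcars, FUN = mean)

# flextable is optional here; tbl stays NULL when the package is not installed.
tbl <- NULL
if (requireNamespace("flextable", quietly = TRUE)) {
  tbl <- flextable::flextable(mpg_by_vs)
  # Show the mean with two decimal places.
  tbl <- flextable::colformat_double(tbl, j = "mpg", digits = 2)
  # Adjust column widths to fit the content.
  tbl <- flextable::autofit(tbl)
}
tbl
```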

It is often helpful to add a label in the output indicating the source file.

# A label indicating the current source file can be easily generated. 
lab = label_src(1)

# A source label can be directly added to the flextable output. 
dat |> summ_cat("am") |> ft(src=1)

# A source label can be easily added to a ggplot object. 
library(ggplot2)
p = ggplot(mtcars, aes(mpg, wt)) + geom_point() 
p |> ggsrc()
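
The idea of a source label can also be sketched without edar. The exact text that `label_src()` produces, and the meaning of its argument, are not shown in this document, so the helper `make_src_label()` below is hypothetical, as are the label format (file path plus date) and the example path. The label is attached to a plot as a caption through ggplot2's `labs()`, which is one plausible way to achieve a similar result, not necessarily what `ggsrc()` does.

```r
# Hypothetical helper (not edar's label_src()): build a label from a
# script path and a date. The label format is an assumption.
make_src_label <- function(path, date = Sys.Date()) {
  paste0("Source: ", path, " (", format(date, "%Y-%m-%d"), ")")
}

# Illustrative path and fixed date so the result is reproducible.
lab <- make_src_label("analysis/eda.R", as.Date("2025-01-31"))
print(lab)
#> [1] "Source: analysis/eda.R (2025-01-31)"

# Attach the label as a plot caption; skipped if ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(mtcars, ggplot2::aes(mpg, wt)) +
    ggplot2::geom_point() +
    ggplot2::labs(caption = lab)
}
```

Keeping the label as a plain string means the same text can be reused in a plot caption, a table footer, or a report.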