The goal of edar is to provide convenient functions for common tasks in exploratory data analysis.
Citation
Sou T (2025). edar: Convenient Functions for Exploratory Data Analysis. R package version 0.0.3.9000, https://github.com/soutomas/edar.
citation("edar")
#> To cite package 'edar' in publications use:
#>
#> Sou T (2025). _edar: Convenient Functions for Exploratory Data
#> Analysis_. R package version 0.0.3.9000, <https://github.com/soutomas/edar>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {edar: Convenient Functions for Exploratory Data Analysis},
#> author = {Tomas Sou},
#> year = {2025},
#> note = {R package version 0.0.3.9000},
#> url = {https://github.com/soutomas/edar},
#> }
Installation
You can install the development version of edar from GitHub with:
# install.packages("pak")
pak::pak("soutomas/edar")
Example
Commonly, we want to generate a quick summary of variables in a dataset.
library(edar)
# Data
dat = mtcars |> dplyr::mutate(vs=factor(vs), am=factor(am))
# Summary for continuous variables in a data frame.
dat |> summ_by()
#> Dropped: vs am
#> Adding missing grouping variables: `name`
#> # A tibble: 9 × 10
#>   name      n   nNA   Mean    Med      SD   Min    P25    P75    Max
#>   <chr> <int> <int>  <dbl>  <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1 carb     32     0   2.81   2      1.62   1      2      4      8
#> 2 cyl      32     0   6.19   6      1.79   4      4      8      8
#> 3 disp     32     0 231.   196.   124.    71.1  121.   326    472
#> 4 drat     32     0   3.60   3.70   0.535  2.76   3.08   3.92   4.93
#> 5 gear     32     0   3.69   4      0.738  3      3      4      5
#> 6 hp       32     0 147.   123     68.6   52     96.5  180    335
#> 7 mpg      32     0  20.1   19.2    6.03  10.4   15.4   22.8   33.9
#> 8 qsec     32     0  17.8   17.7    1.79  14.5   16.9   18.9   22.9
#> 9 wt       32     0   3.22   3.32   0.978  1.51   2.58   3.61   5.42
# Summary of a selected variable after grouping.
dat |> summ_by("mpg",vs)
#> Adding missing grouping variables: `vs`
#> # A tibble: 2 × 10
#>   vs        n   nNA  Mean   Med    SD   Min   P25   P75   Max
#>   <fct> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0        18     0  16.6  15.6  3.86  10.4  14.8  19.1  26
#> 2 1        14     0  24.6  22.8  5.38  17.8  21.4  29.6  33.9
dat |> summ_by("mpg",vs,am)
#> Adding missing grouping variables: `vs`, `am`
#> # A tibble: 4 × 11
#> # Groups: vs [2]
#>   vs    am        n   nNA  Mean   Med    SD   Min   P25   P75   Max
#>   <fct> <fct> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0     0        12     0  15.0  15.2  2.77  10.4  14.0  16.6  19.2
#> 2 0     1         6     0  19.8  20.4  4.01  15    16.8  21    26
#> 3 1     0         7     0  20.7  21.4  2.47  17.8  18.6  22.2  24.4
#> 4 1     1         7     0  28.4  30.4  4.76  21.4  25.0  31.4  33.9
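For readers who want to see what the summary columns correspond to, the following base R sketch reproduces the same kind of grouped table for `mpg` by `vs`. It is not edar's implementation: `summ_stats()` is a hypothetical helper written for this illustration, and it assumes that `n` counts non-missing values and that `P25` and `P75` are quantiles computed with R's default method.

```r
# Hypothetical helper, not part of edar: summarise one numeric vector.
# Assumes n = number of non-missing values and default quantile() type.
summ_stats <- function(x) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  data.frame(
    n    = sum(!is.na(x)),
    nNA  = sum(is.na(x)),
    Mean = mean(x, na.rm = TRUE),
    Med  = stats::median(x, na.rm = TRUE),
    SD   = stats::sd(x, na.rm = TRUE),
    Min  = min(x, na.rm = TRUE),
    P25  = q[1],
    P75  = q[2],
    Max  = max(x, na.rm = TRUE)
  )
}

# Split mpg by the levels of vs, summarise each piece, and stack the rows.
by_vs <- split(mtcars$mpg, mtcars$vs)
res <- do.call(rbind, lapply(by_vs, summ_stats))
res <- data.frame(vs = names(by_vs), res, row.names = NULL)
print(res)
```

The printed values can be compared with the `summ_by("mpg",vs)` output above; any difference would indicate that one of the stated assumptions does not hold.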
# Summary for categorical variables in a data frame.
dat |> summ_cat()
#> Dropped: mpg cyl disp hp drat wt qsec gear carb
#> $vs
#>     vs  n percent
#>      0 18  0.5625
#>      1 14  0.4375
#>  Total 32  1.0000
#>
#> $am
#>     am  n percent
#>      0 19 0.59375
#>      1 13 0.40625
#>  Total 32 1.00000
# Summary for a selected categorical variable.
dat |> summ_cat("vs")
#> Dropped: mpg cyl disp hp drat wt qsec gear carb
#>     vs  n percent
#>      0 18  0.5625
#>      1 14  0.4375
#>  Total 32  1.0000
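The categorical summary is a count of each level, its proportion of the total, and a totals row. A minimal base R sketch of that calculation follows. `count_prop()` is a hypothetical helper for illustration only, and the sketch assumes missing values are excluded from the counts; it makes no claim about how `summ_cat()` is implemented.

```r
# Hypothetical helper, not part of edar: counts and proportions per level,
# followed by a Total row. Missing values are dropped (an assumption).
count_prop <- function(x) {
  tab <- table(x, useNA = "no")
  out <- data.frame(
    level   = names(tab),
    n       = as.integer(tab),
    percent = as.numeric(tab) / sum(tab)
  )
  total <- data.frame(level = "Total", n = sum(out$n), percent = sum(out$percent))
  rbind(out, total)
}

vs_tab <- count_prop(factor(mtcars$vs))
print(vs_tab)
```

Note that the `percent` column holds proportions between 0 and 1, matching the output shown above.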
Results can be viewed directly as a flextable object.
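As a rough illustration of the idea, any summary data frame can be turned into a flextable with the flextable package itself. The sketch below does not use edar's own helper for this; it builds a small stand-in table with base R and calls `flextable::flextable()` and `flextable::autofit()` only if flextable is installed.

```r
# Small summary table built with base R (a stand-in for a summ_by() result).
tab <- aggregate(mpg ~ vs, data = mtcars, FUN = mean)

# flextable is treated as optional here; skip quietly if it is not installed.
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- flextable::flextable(tab)   # convert the data frame to a flextable
  ft <- flextable::autofit(ft)      # size columns to fit their content
}
```

Typing `ft` at an interactive console displays the table, and the same object can be included in an R Markdown document.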
It is often helpful to add a label in the output indicating the source file.
# A label indicating the current source file can be easily generated.
lab = label_src(1)