New plot methods for `check_outliers` (?)

This is regarding the `check_outliers()` paper for the journal _Mathematics_ (easystats/performance#544). I wonder if we should add new plot methods to include in the article submission (deadline is Feb 23). I explain in detail below.

---

# Model-based outliers

For model-based outliers, `see` has an awesome plotting method:

``` r
library(performance)
library(see)

data <- rbind(mtcars[1:4], 42, 55)
model <- lm(disp ~ mpg * hp, data = data)
x <- check_outliers(model, method = "cook")
plot(x)
```

![](https://i.imgur.com/0J4shPp.png)

Created on 2023-01-20 with [reprex v2.0.2](https://reprex.tidyverse.org)

# Multiple methods

For multiple methods, the distance scores have to be standardized if we want to plot them on the same scale, so I think the current solution is satisfactory.

``` r
library(performance)
library(see)

data <- rbind(mtcars[1:4], 42, 55)
x <- check_outliers(data, method = c("zscore_robust", "iqr", "mcd", "lof"))
plot(x)
```

![](https://i.imgur.com/625zsme.png)

Created on 2023-01-20 with [reprex v2.0.2](https://reprex.tidyverse.org)
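To make the standardization step concrete, here is a minimal base R sketch of putting two distance scores on a common 0-1 scale. `rescale01()` is a hypothetical helper, and both distances are computed by hand for illustration; this is not necessarily how `see` or `performance` rescale internally.

``` r
# Hypothetical helper: min-max rescale a vector of distances to [0, 1]
rescale01 <- function(d) {
  rng <- range(d, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(d))) # constant input
  (d - rng[1]) / (rng[2] - rng[1])
}

data <- rbind(mtcars[1:4], 42, 55)

# Two distance scores on very different scales (computed by hand here,
# not extracted from check_outliers()):
# 1. largest absolute robust z-score across columns
robust_z <- sapply(data, function(v) abs(v - median(v)) / mad(v))
d_zscore <- apply(robust_z, 1, max)
# 2. classical (squared) Mahalanobis distance
d_maha <- mahalanobis(data, colMeans(data), cov(data))

# After rescaling, both scores can share one axis
scores <- data.frame(
  obs = seq_len(nrow(data)),
  zscore_robust = rescale01(d_zscore),
  mahalanobis = rescale01(d_maha)
)
tail(scores, 3)
```

The trade-off is that the original thresholds no longer map onto the rescaled axis, which is why the method-specific plots discussed below keep the raw distances.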

# Multivariate methods

For a single multivariate method, I think the current plot is ok-ish. Writing a custom plotting method for each multivariate method could be a lot of work, so I think this is fine. However, the x-axis is hard to read because the observation labels overlap, and this will only get worse with big data sets.

``` r
library(performance)
library(see)

data <- rbind(mtcars[1:4], 42, 55)
x <- check_outliers(data, method = "mcd")
plot(x)
```

![](https://i.imgur.com/N777h9B.png)

Created on 2023-01-20 with [reprex v2.0.2](https://reprex.tidyverse.org)

For the Mahalanobis method specifically, one colleague believes their own custom plot is more useful (and would be happy to see it implemented within `easystats`):

``` r
data <- rbind(mtcars[1:4], 42, 55)
data <- cbind(car = row.names(data), data)

# Plot sorted Mahalanobis distances with a cutoff line and return the
# IDs of the observations whose distance exceeds the cutoff
mahaout <- function(dataset, vars, idvar) {
  # Keep complete cases only (ID column plus the variables of interest)
  maha <- as.data.frame(na.omit(dataset[, c(idvar, vars)]))
  x <- maha[, vars, drop = FALSE]
  maha$values <- mahalanobis(x, colMeans(x), cov(x))
  # Cutoff: 99.9th percentile of the chi-square distribution,
  # with df = number of variables
  crit <- qchisq(0.999, df = length(vars))
  plot(sort(maha$values),
       xlab = "Observations", ylab = "Mahalanobis values")
  abline(h = crit, col = "darkred")
  maha[maha$values > crit, idvar]
}

mahaout(data, vars = names(data[-1]), idvar = "car")
```

![](https://i.imgur.com/WuaqHDT.png)

    #> [1] "34"

Created on 2023-01-20 with [reprex v2.0.2](https://reprex.tidyverse.org)

Should we have something like that for `method = "mahalanobis"` and similar methods? The guiding principle could be: plot the distance of each observation, plus a line at the chosen threshold. If we do this, it might not be that much work, since the actual distances and thresholds are already accessible as attributes, and it would make for very consistent plotting across methods.
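As a rough illustration of that guiding principle, here is a base R sketch of a generic "distances plus threshold line" plot. `plot_distance_threshold()` is a hypothetical helper, not an existing `see` or `performance` function, and the distances are computed directly with `stats::mahalanobis()` rather than read from the `check_outliers()` attributes.

``` r
# Hypothetical helper: plot sorted distances with a horizontal line at the
# threshold, and label only the observations above it
plot_distance_threshold <- function(distance, threshold, ylab = "Distance") {
  labs <- if (is.null(names(distance))) seq_along(distance) else names(distance)
  ord <- order(distance)
  flagged <- distance[ord] > threshold
  plot(
    seq_along(ord), distance[ord],
    col = ifelse(flagged, "darkred", "grey40"), pch = 19,
    xlab = "Observations (sorted by distance)", ylab = ylab
  )
  abline(h = threshold, col = "darkred", lty = 2)
  # Labelling only flagged points keeps the plot readable for large data sets
  if (any(flagged)) {
    text(which(flagged), distance[ord][flagged],
         labels = labs[ord][flagged], pos = 2, cex = 0.8)
  }
  invisible(as.character(labs[distance > threshold]))
}

data <- rbind(mtcars[1:4], 42, 55)
d <- mahalanobis(data, colMeans(data), cov(data)) # named by row
crit <- qchisq(0.999, df = ncol(data))            # same cutoff as mahaout()
flagged <- plot_distance_threshold(d, crit, ylab = "Squared Mahalanobis distance")
flagged
```

Because the helper only needs a distance vector and a threshold, the same function could in principle serve any method that exposes those two pieces.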

## Lakens's Method

Edit: I forgot to add this other example. Alternatively, there is the outlier plotting method from the outliers paper co-authored by Daniel Lakens ([Leys et al. (2019)](https://rips-irsp.com/articles/10.5334/irsp.289)), shown here with the `Routliers` package.

``` r
library(Routliers)

data <- rbind(mtcars[1:4], 42, 55)
res <- outliers_mcd(x = data)
plot_outliers_mcd(res, x = data)
```

![](https://i.imgur.com/9IjtNgT.png)

Created on 2023-01-20 with [reprex v2.0.2](https://reprex.tidyverse.org)


# Univariate methods

Here is another example of mine, this time for univariate outliers. Currently, we get the same generic plot for `method = "zscore_robust"`, for instance.

``` r
library(performance)
library(see)

data <- rbind(mtcars[1:4], 42, 55)
x <- check_outliers(data, method = "zscore_robust")
plot(x)
```

![](https://i.imgur.com/KI5WNC6.png)

Created on 2023-01-20 with [reprex v2.0.2](https://reprex.tidyverse.org)

But I was imagining that something like this might be more useful for z-scores:

``` r
library(rempsyc)

data <- rbind(mtcars[1:4], 42, 55)
plot_outliers(data, response = "mpg", method = "sd", criteria = 3)
```

![](https://i.imgur.com/Rjd0srL.png)

Created on 2023-01-20 with [reprex v2.0.2](https://reprex.tidyverse.org)

We could do something similar for robust z-scores. For several variables, we could also wrap the plots in a panel:

``` r
library(rempsyc)
library(see)
data <- rbind(mtcars[1:4], 42, 55)
plots(lapply(names(data), function(x) {
 plot_outliers(data, response = x, ytitle = x, method = "mad", criteria = 3)
}), n_columns = 2)
```

![](https://i.imgur.com/EaWWXST.png)

Created on 2023-01-20 with [reprex v2.0.2](https://reprex.tidyverse.org)

## Lakens's Method

Edit: I forgot to add this other example. Alternatively, there is the univariate (MAD) outlier plot from the outliers paper co-authored by Daniel Lakens ([Leys et al. (2019)](https://rips-irsp.com/articles/10.5334/irsp.289)), shown here with the `Routliers` package.

``` r
library(Routliers)

data <- rbind(mtcars[1:4], 42, 55)
res <- outliers_mad(x = data$mpg)
plot_outliers_mad(res, x = data$mpg) 
```

![](https://i.imgur.com/WavyMek.png)

Created on 2023-01-20 with [reprex v2.0.2](https://reprex.tidyverse.org)

# Challenges

One possible challenge for univariate methods arises when they are applied to several columns. In that case, the proposed solution will not work, since the rescaled score (0-1) is an aggregate of the scores of each column (for single multivariate methods, this is not a problem by definition). So could we implement this only when a single method and a single column are selected? Unless, of course, we use `lapply` with `see::plots` as in the panel example above.
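A small base R sketch of why the aggregate is a problem: once per-column scores are collapsed into one score per observation, the plot can no longer show which column triggered the flag. The row-wise maximum and the cutoff of 3 are assumptions chosen for illustration; they are not necessarily how `check_outliers()` aggregates or thresholds.

``` r
data <- rbind(mtcars[1:4], 42, 55)

# Robust z-score per column: |x - median| / MAD (one score per cell)
robust_z <- sapply(data, function(v) abs(v - median(v)) / mad(v))

# Aggregating across columns (assumed here: row-wise maximum) gives one
# score per observation, but loses the information about the column
aggregate_score <- apply(robust_z, 1, max)

threshold <- 3 # illustrative cutoff, not necessarily the package default
per_column_flags <- robust_z > threshold

colSums(per_column_flags)        # flagged values in each column
sum(aggregate_score > threshold) # flagged observations overall
```

A per-column threshold plot needs `robust_z` (one panel per column), whereas the current rescaled plot only has something like `aggregate_score` to work with.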
