Creates a matrix displaying variable importance on the diagonal and variable interaction on the off-diagonal.

vivi(
  data,
  fit,
  response,
  gridSize = 50,
  importanceType = "agnostic",
  nmax = 500,
  reorder = TRUE,
  class = 1,
  predictFun = NULL,
  normalized = FALSE,
  numPerm = 4,
  showVimpError = FALSE
)

Arguments

data

Data frame used for fit.

fit

A supervised machine learning model, which understands condvis2::CVpredict

response

The name of the response for the fit.

gridSize

The size of the grid for evaluating the predictions.

importanceType

Used to select the importance metric. By default, an agnostic importance measure is used. If an embedded metric is available, then setting this argument to the importance metric will use the selected importance values in the vivid-matrix. Please refer to the examples given for illustration. Alternatively, set to equal "agnostic" (the default) to override embedded importance measures and return agnostic importance values.

nmax

Maximum number of data rows to consider. Default is 500. Use all rows if NULL.

reorder

If TRUE (default) uses DendSer to reorder the matrix of interactions and variable importances.

class

Category for classification, a factor level, or a number indicating which factor level.

predictFun

Function of (fit, data) to extract numeric predictions from fit. Uses condvis2::CVpredict by default, which works for many fit classes.

normalized

Should Friedman's H-statistic be normalized or not. Default is FALSE.

numPerm

Number of permutations to perform for agnostic importance. Default is 4.

showVimpError

Logical. If TRUE, and numPerm > 1 then a tibble containing the variable names, their importance values, and the standard error for each importance is printed to the console.

Value

A matrix of interaction values, with importance on the diagonal.

Details

If the argument importanceType = 'agnostic', then an agnostic permutation importance (1) is calculated. Friedman's H statistic (2) is used for measuring the interactions. This measure is based on partial dependence curves and relates the interaction strength of a pair of variables to the total effect strength of that variable pair.

References

1: Fisher A., Rudin C., Dominici F. (2018). All Models are Wrong but many are Useful: Variable Importance for Black-Box, Proprietary, or Misspecified Prediction Models, using Model Class Reliance. Arxiv.

2: Friedman, J. H. and Popescu, B. E. (2008). “Predictive learning via rule ensembles.” The Annals of Applied Statistics. JSTOR, 916–54.

Examples


aq <- na.omit(airquality)
f <- lm(Ozone ~ ., data = aq)
m <- vivi(fit = f, data = aq, response = "Ozone") # as expected all interactions are zero
#> Agnostic variable importance method used.
#> Calculating interactions...
viviHeatmap(m)


# Select importance metric
library(randomForest)
#> randomForest 4.7-1.1
#> Type rfNews() to see new features/changes/bug fixes.
#> 
#> Attaching package: ‘randomForest’
#> The following object is masked from ‘package:ranger’:
#> 
#>     importance
rf1 <- randomForest(Ozone~., data = aq, importance = TRUE)
m2 <- vivi(fit = rf1, data = aq, response = 'Ozone',
           importanceType = '%IncMSE') # select %IncMSE as the importance measure
#> %IncMSE importance selected.
#> Calculating interactions...
viviHeatmap(m2)


# \donttest{
library(ranger)
rf <- ranger(Species ~ ., data = iris, importance = "impurity", probability = TRUE)
vivi(fit = rf, data = iris, response = "Species") # returns agnostic importance
#> Agnostic variable importance method used.
#> Calculating interactions...
#>              Petal.Width Petal.Length Sepal.Length Sepal.Width
#> Petal.Width    0.3243165    7.7598903   7.18419678  5.70678283
#> Petal.Length   7.7598903    0.3098453   7.32373953  5.81933853
#> Sepal.Length   7.1841968    7.3237395   0.02232394  4.50969747
#> Sepal.Width    5.7067828    5.8193385   4.50969747  0.01147973
#> attr(,"class")
#> [1] "vivid"  "matrix" "array" 
vivi(fit = rf, data = iris, response = "Species",
     importanceType = "impurity") # returns selected 'impurity' importance.
#> Embedded impurity variable importance method used.
#> Calculating interactions...
#>              Petal.Width Petal.Length Sepal.Length Sepal.Width
#> Petal.Width    43.152277     7.983278     7.365552    5.845194
#> Petal.Length    7.983278    42.290158     7.471649    6.013182
#> Sepal.Length    7.365552     7.471649     8.967825    4.598689
#> Sepal.Width     5.845194     6.013182     4.598689    1.181052
#> attr(,"class")
#> [1] "vivid"  "matrix" "array" 
# }