
shapper is on CRAN, it’s an R wrapper over SHAP explainer for black-box models

March 5, 2019

(This article was first published on English – SmarterPoland.pl, and kindly contributed to R-bloggers)

Written by: Alicja Gosiewska

In applied machine learning there is a common opinion that we need to choose between interpretability and accuracy. However, in the field of Interpretable Machine Learning there are more and more new ideas for explaining black-box models. One of the best known methods for local explanations is SHapley Additive exPlanations (SHAP).

The SHAP method calculates the influence of each variable on a particular prediction. It is based on Shapley values, a technique borrowed from game theory. SHAP was introduced by Scott M. Lundberg and Su-In Lee in the NIPS paper A Unified Approach to Interpreting Model Predictions and was originally implemented in the Python library shap.
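
For intuition, the Shapley value of a feature is its average marginal contribution to the prediction over all subsets of the other features. In standard notation (a sketch, where N is the set of all features and f(S) denotes the model prediction based only on the features in S):

\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left[ f(S \cup \{i\}) - f(S) \right]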

The R package shapper is a port of the Python library shap. In this post we show the functionalities of shapper. The examples are provided on the titanic_train data set for classification.

While shapper is a port of the Python library shap, there are also pure R implementations of the SHAP method, e.g. iml or shapleyR.

Installation

Since shapper wraps a Python library, installation requires a bit more effort than installing an ordinary R package.

Install the R package shapper

First of all we need to install shapper. This may be the stable release from CRAN

install.packages("shapper")

or the development version from GitHub.

devtools::install_github("ModelOriented/shapper")

Install the Python library shap

Before you run shapper, make sure that you have installed Python.

The Python library shap is required to use shapper. It can be installed either from Python or from R. To install it through R, you can use the install_shap() function from the shapper package.

library("shapper")
install_shap()

If you experience any problems related to the installation of Python libraries or the evaluation of Python code, see the reticulate documentation. shapper accesses Python through reticulate, so the solution to the problem is likely to be found there ;-).
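
For example, the following reticulate calls may help to diagnose which Python installation is being used (a sketch; the path in the commented line is only a placeholder):

library("reticulate")
# Show which Python interpreter reticulate has discovered
py_config()
# Optionally point reticulate at a specific Python before loading shapper
# use_python("/usr/bin/python3")
# Check that the shap module can be imported
py_module_available("shap")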

Would you survive the sinking of the RMS Titanic?

The example usage is presented on the titanic_train dataset from the R package titanic. We will predict the Survived status. The other variables used by the model are: Pclass, Sex, Age, SibSp, Parch, Fare and Embarked.

library("titanic")
titanic <- titanic_train[,c("Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked")]
titanic$Survived <- factor(titanic$Survived)
titanic$Sex <- factor(titanic$Sex)
titanic$Embarked <- factor(titanic$Embarked)
titanic <- na.omit(titanic)
head(titanic)
##   Survived Pclass    Sex Age SibSp Parch    Fare Embarked
## 1        0      3   male  22     1     0  7.2500        S
## 2        1      1 female  38     1     0 71.2833        C
## 3        1      3 female  26     0     0  7.9250        S
## 4        1      1 female  35     1     0 53.1000        S
## 5        0      3   male  35     0     0  8.0500        S
## 7        0      1   male  54     0     0 51.8625        S

Let’s build a model

Let’s see what our chances are, as assessed by a random forest model.

library("randomForest")
set.seed(123)
model_rf <- randomForest(Survived ~ . , data = titanic)
model_rf
## 
## Call:
##  randomForest(formula = Survived ~ ., data = titanic) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 18.63%
## Confusion matrix:
##     0   1 class.error
## 0 384  40  0.09433962
## 1  93 197  0.32068966

Prediction to be explained

Let’s assume that we want to explain the prediction for a particular observation (male, 8 years old, traveling in 1st class, embarked at C, without parents or siblings on board).

new_passanger <- data.frame(
            Pclass = 1,
            Sex = factor("male", levels = c("female", "male")),
            Age = 8,
            SibSp = 0,
            Parch = 0,
            Fare = 72,
            Embarked = factor("C", levels = c("","C","Q","S"))
)

The model prediction for this observation is 0.558 for survival.

predict(model_rf, new_passanger, type = "prob")
##       0     1
## 1 0.442 0.558
## attr(,"class")
## [1] "matrix" "votes"

Here shapper starts

To use the shap() function (an alias for individual_variable_effect()) we need four elements:

  • a model,
  • a data set,
  • a function that calculates scores (a predict function),
  • an instance (or instances) to be explained.

The shap() function can be used directly with these four arguments, but for simplicity we use here the DALEX package with pre-implemented predict functions.

library("DALEX")
exp_rf <- explain(model_rf, data = titanic[,-1])

The explainer is an object that wraps up a model and its meta-data. The meta-data consists of, at least, the data set used to fit the model and the observations to explain.
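
If needed, the predict function can also be passed to explain() explicitly. A minimal sketch, where p_fun is an illustrative helper of our own that returns the probability of Survived = 1:

# p_fun is an illustrative helper, not part of the original post
p_fun <- function(model, newdata) predict(model, newdata, type = "prob")[, 2]
exp_rf_custom <- explain(model_rf, data = titanic[, -1],
                         predict_function = p_fun, label = "randomForest")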

Now it is enough to call shap() on this explainer to generate SHAP attributions for the random forest model.

library("shapper")
ive_rf <- shap(exp_rf, new_observation = new_passanger)
ive_rf
##     Pclass  Sex Age SibSp Parch Fare Embarked _id_ _ylevel_ _yhat_
## 1        1 male   8     0     0   72        C    1           0.558
## 1.1      1 male   8     0     0   72        C    1           0.558
## 1.2      1 male   8     0     0   72        C    1           0.558
## 1.3      1 male   8     0     0   72        C    1           0.558
## 1.4      1 male   8     0     0   72        C    1           0.558
## 1.5      1 male   8     0     0   72        C    1           0.558
##     _yhat_mean_ _vname_ _attribution_ _sign_      _label_
## 1     0.3672941  Pclass   0.070047752      + randomForest
## 1.1   0.3672941     Sex  -0.154519708      - randomForest
## 1.2   0.3672941     Age   0.143046212      + randomForest
## 1.3   0.3672941   SibSp   0.003154522      + randomForest
## 1.4   0.3672941   Parch  -0.018111585      - randomForest
## 1.5   0.3672941    Fare   0.086728705      + randomForest
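
The same attributions could also be computed without DALEX by passing the four elements listed above to shap() directly. A sketch, assuming the argument names data, predict_function and new_observation from the shapper documentation:

# A direct call, without the DALEX explainer (argument names assumed)
p_fun <- function(model, newdata) predict(model, newdata, type = "prob")
ive_rf_direct <- shap(model_rf, data = titanic[, -1],
                      predict_function = p_fun,
                      new_observation = new_passanger)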

Plotting results

To generate a plot of Shapley values you can simply pass an object of class individual_variable_effect to the plot() function. Since we are interested in the class Survived = 1, we may add an additional parameter to filter only the selected class.

plot(ive_rf)

Labels on the y-axis show the values of the variables for this particular observation. Black arrows show the predictions of the model, in this case the probabilities of each class. The other arrows show the effect of each variable on this prediction. Effects may be positive or negative and they sum up to the value of the prediction.

On this plot we can see that the model predicts that the passenger will survive. Chances are higher due to the young age and the 1st class; only Sex = male decreases the chances of survival for this observation.

More models

It is useful to contrast the predictions of two models. Here we will show how to use shapper for such contrastive explanations.

We will compare the randomForest model with an SVM implemented in the e1071 package.

library("e1071")
model_svm <- svm(Survived~. , data = titanic, probability = TRUE)
model_svm
## 
## Call:
## svm(formula = Survived ~ ., data = titanic, probability = TRUE)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.1 
## 
## Number of Support Vectors:  338

This model predicts a 32.5% chance of survival.

attr(predict(model_svm, newdata = new_passanger, probability = TRUE), "probabilities")
##           0         1
## 1 0.6748768 0.3251232

Again, we create an explainer that wraps the model, the data and the predict function.

exp_svm <- explain(model_svm, data = titanic[,-1])
ive_svm <- shap(exp_svm, new_passanger)

The Shapley values plot may be modified. To show more than one model, you can pass additional individual_variable_effect objects to plot().

plot(ive_rf, ive_svm)

To see only the attributions, use the option show_predcited = FALSE.

plot(ive_rf, show_predcited = FALSE)

More

Documentation and more examples are available at https://modeloriented.github.io/shapper/. The stable version of the package is on CRAN, and the development version is on GitHub (https://github.com/ModelOriented/shapper). shapper is a part of the DALEX universe.
