Usage

Installation

To use SVEKER, first clone the repository from GitHub (e.g. via the button at the top right of the repository page). Then navigate into the folder and install it locally via pip.

$ pip install -e .

Example Usage

SVEKER is built on top of scikit-learn's implementation of SVR and SVC and can be used as a drop-in replacement. Below are some examples for regression and classification.
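Because the estimators follow the scikit-learn API, they should also combine with scikit-learn utilities such as cross-validation. A minimal sketch, assuming ExplainingSVR implements the usual estimator interface (get_params/set_params):

import SVEKER
import numpy as np
from sklearn.model_selection import cross_val_score

# dummy binary fingerprints and continuous targets, mirroring the regression example below
x = np.random.randint(low=0, high=2, size=(500, 50))
y = np.random.rand(500)

# 5-fold cross-validation, exactly as with sklearn.svm.SVR
scores = cross_val_score(SVEKER.ExplainingSVR(kernel_type='tanimoto'), x, y, cv=5)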

Regression

First, we import the necessary packages and create some dummy data

import SVEKER
import numpy as np
from sklearn.model_selection import train_test_split

n_feat = 50
n_samples = 500

x = np.random.randint(low=0, high=2, size=(n_samples, n_feat))
y = np.random.rand(n_samples)

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, random_state=42)

Now we can build our model and fit it

model = SVEKER.ExplainingSVR(kernel_type='tanimoto')
model.fit(x_train, y_train)
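The kernel is chosen via the kernel_type argument. Other kernels from the referenced papers should be selectable the same way; for instance (assuming 'rbf' is the keyword for the SVERAD radial basis function kernel, see Citations):

# hypothetical keyword; check the package documentation for the exact name
model_rbf = SVEKER.ExplainingSVR(kernel_type='rbf')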

We can obtain the predictions as usual via predict

pred = model.predict(x_test)
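Since predict behaves like its scikit-learn counterpart, the usual metrics apply, e.g. r2_score (the dummy targets are random, so the score itself is meaningless here):

from sklearn.metrics import r2_score

# coefficient of determination on the held-out split
r2_score(y_test, pred)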

To obtain the Shapley values we simply use shapley_values

shapley_values = model.shapley_values(x_test)
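The result holds one Shapley value per sample and feature, so with the 70/30 split above it should be an array of shape (150, 50):

# one value per test sample and feature
shapley_values.shape  # (150, 50)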

To verify that everything works, we sum the Shapley values for each prediction and add the expected value

np.allclose(shapley_values.sum(axis=-1) + model.expected_value, pred)

which returns True.
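The per-feature values can also be inspected directly, for example to find the features contributing most to the first test prediction:

# indices of the three largest absolute contributions for sample 0
top_feat = np.argsort(np.abs(shapley_values[0]))[::-1][:3]
print(top_feat, shapley_values[0][top_feat])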

Classification

Same as above, we import the necessary packages and create some dummy data

import SVEKER
import numpy as np
from sklearn.model_selection import train_test_split

n_feat = 50
n_samples = 500

x = np.random.randint(low=0, high=2, size=(n_samples, n_feat))
y = np.random.randint(low=0, high=2, size=(n_samples,))

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, random_state=42)

Next, we fit the model

model = SVEKER.ExplainingSVC(kernel_type='tanimoto')
model.fit(x_train, y_train)
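Predictions work as for any scikit-learn classifier, e.g. together with accuracy_score (with random dummy labels, expect chance-level accuracy):

from sklearn.metrics import accuracy_score

pred = model.predict(x_test)
accuracy_score(y_test, pred)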

By default, the Shapley values are calculated in the decision function space

df = model.decision_function(x_test)
svs = model.shapley_values(x_test).sum(axis=-1) + model.expected_value
np.allclose(df, svs)

which returns True. Alternatively, you can also specify the logit output space

proba = model.predict_proba(x_test)[:,1]
logits = model.shapley_values(x_test, logit_space=True).sum(axis=-1) + model.expected_value_logit_space
np.allclose(logits, np.log(proba/(1.-proba)))

which returns True.
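Since these logits are log-odds, applying the logistic sigmoid recovers the predicted probabilities:

np.allclose(1. / (1. + np.exp(-logits)), proba)

which also returns True.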

Citations

Our paper comparing different kernel functions:

  • Roth, J. P., & Bajorath, J. (2024). Machine learning models with distinct Shapley value explanations decouple feature attribution and interpretation for chemical compound predictions. Cell Reports Physical Science, 102110. https://doi.org/10.1016/j.xcrp.2024.102110

For the original SVETA (Tanimoto kernel) implementation:

  • Feldmann, C., & Bajorath, J. (2022). Calculation of exact Shapley values for support vector machines with Tanimoto kernel enables model interpretation. iScience, 25(9), 105023. https://doi.org/10.1016/j.isci.2022.105023

For the original SVERAD (RBF kernel) implementation:

  • Mastropietro, A., Feldmann, C., & Bajorath, J. (2023). Calculation of exact Shapley values for explaining support vector machine models using the radial basis function kernel. Scientific Reports, 13(1), 19561. https://doi.org/10.1038/s41598-023-46930-2