Abstract

Machine learning has emerged as a powerful approach for molecular property prediction and drug discovery. However, the black-box nature of many machine learning models limits their interpretability, trustworthiness, and adoption in interdisciplinary research settings. This is especially the case in molecular machine learning, where explanations of model predictions are used for informed decision-making and downstream tasks. Shapley values, originating from cooperative game theory, provide a principled framework for attributing model predictions to individual input features. However, existing Shapley value-based explanations for molecular machine learning often rely on sampling-based approximations or operate at the level of abstract features, which can reduce attribution stability and limit chemical interpretability and actionability. Here, we introduce a fragment-level Shapley value framework that enables the exact computation of feature contributions at the level of chemically meaningful fragments for molecular property predictions without relying on sampling or feature imputation. By decomposing molecules into fragments, the proposed approach yields actionable explanations that can be directly related to established chemical concepts. We apply the method post hoc to random forest and graph convolutional network models using common molecular representations, including extended-connectivity fingerprints and molecular graphs. The approach is evaluated across three representative property prediction tasks: aqueous solubility, mutagenicity, and antiviral potency. Fragment-level Shapley values reproduce well-established chemical trends, identify known toxicophores, and enable guided molecular optimization. In addition, the method provides insights into model learning characteristics and helps delineate the applicability domain, particularly in settings with limited and structurally biased data.
Overall, this work demonstrates that adapting Shapley values to chemically meaningful fragments enables interpretable explanations for molecular machine learning models, supporting molecular optimization and model validation.
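
To make the idea of exact, sampling-free attribution concrete, the following minimal sketch computes Shapley values by enumerating every coalition of a small set of fragments. It is an illustration of the general game-theoretic definition only, not the paper's implementation: the fragment names, the `CONTRIB` table, and the `toy_value` function are hypothetical stand-ins for a trained model evaluated on sub-molecules, and the function name `exact_shapley` is likewise an assumption.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, value):
    """Exact Shapley values by enumerating all coalitions (cost grows as 2^n)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s.
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


# Hypothetical per-fragment effects on a predicted property (illustrative numbers).
CONTRIB = {"hydroxyl": 1.0, "phenyl": -2.0, "amine": 0.5}


def toy_value(fragments):
    """Stand-in for a model prediction on the sub-molecule built from `fragments`."""
    score = sum(CONTRIB[f] for f in fragments)
    # Assumed pairwise interaction, so the game is not purely additive.
    if "hydroxyl" in fragments and "amine" in fragments:
        score += 0.4
    return score


phi = exact_shapley(list(CONTRIB), toy_value)
for fragment, contribution in phi.items():
    print(f"{fragment}: {contribution:+.3f}")
```

In this toy game the interaction term is split equally between the two interacting fragments, and the attributions sum to the difference between the prediction for the full fragment set and the prediction for the empty set. Full enumeration is feasible here because a molecule typically decomposes into far fewer fragments than it has atoms or fingerprint bits.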


Citation

Roth, Jannik P. “Chemically Interpretable Explanations for Molecular Property Prediction via Fragment-Level Shapley Values.” ChemRxiv (2026).

@article{roth2026chemically,
  title={Chemically Interpretable Explanations for Molecular Property Prediction via Fragment-Level Shapley Values},
  author={Roth, Jannik Philipp},
  journal={ChemRxiv},
  year={2026},
  doi={10.26434/chemrxiv.15002302/v12}
}