Reproducibility
This directory contains the Jupyter notebooks used to generate all results in the DILImap publication. For convenience, you can also download a ZIP archive of all notebooks here.
Data Preparation
- 1.1. DataPrep: DILI labels
Assigns consensus DILI labels by integrating clinical annotations with compound metadata.
- 1.2. DataPrep: Cmax values
Aggregates Cmax values from 20+ studies to derive a consensus median Cmax per compound.
- 1.3. DataPrep: Cell viability
Fits dose–response curves to raw viability data to estimate IC₁₀ values.
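
As a rough illustration of steps 1.2 and 1.3, the sketch below takes the median of per-study Cmax values and derives an IC₁₀ from already-fitted curve parameters. It assumes that all Cmax values share one unit and that viability follows a two-parameter Hill model with a top of 1 and a bottom of 0. The function names and example numbers are hypothetical and are not taken from the notebooks, which may use a different curve model.

```python
import math
from statistics import median


def consensus_cmax(values_by_study):
    """Median Cmax across studies (all values assumed to be in the same unit)."""
    return median(values_by_study.values())


def ic_from_hill(ic50, hill_slope, inhibition=0.10):
    """Concentration giving the requested fractional inhibition.

    Assumes viability(c) = 1 / (1 + (c / ic50) ** hill_slope), so
    inhibition x is reached at c = ic50 * (x / (1 - x)) ** (1 / hill_slope).
    """
    if not 0.0 < inhibition < 1.0:
        raise ValueError("inhibition must be strictly between 0 and 1")
    return ic50 * (inhibition / (1.0 - inhibition)) ** (1.0 / hill_slope)


# Hypothetical values for one compound, for illustration only.
cmax = consensus_cmax({"study_a": 1.0, "study_b": 2.0, "study_c": 10.0})
ic10 = ic_from_hill(ic50=9.0, hill_slope=1.0)
```

Using the median rather than the mean keeps a single outlying study (here `study_c`) from dominating the consensus value.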
Model Training
- 2.1. Training: Gene signatures
Generates compound-level gene signatures using DESeq2 after QC filtering.
- 2.2. Training: Pathway signatures
Computes pathway-level signatures via enrichment of DE results using WikiPathways.
- 2.3. Training: ToxPredictor model
Trains and tunes ensemble models; selects the final random forest classifier.
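
To make the pathway-enrichment idea in step 2.2 concrete, here is a minimal sketch of one common approach: a one-sided hypergeometric over-representation test. Using this particular test is an assumption, since the description above does not say which enrichment method the notebook applies. The function name and gene identifiers are hypothetical.

```python
from math import comb


def enrichment_pvalue(de_genes, pathway_genes, background_genes):
    """One-sided hypergeometric p-value for pathway over-representation."""
    background = set(background_genes)
    # Restrict both gene sets to the measured background.
    de = set(de_genes) & background
    pathway = set(pathway_genes) & background
    overlap = len(de & pathway)
    n_bg, n_pw, n_de = len(background), len(pathway), len(de)
    total = comb(n_bg, n_de)
    # P(X >= overlap) when drawing n_de genes without replacement.
    tail = sum(
        comb(n_pw, i) * comb(n_bg - n_pw, n_de - i)
        for i in range(overlap, min(n_pw, n_de) + 1)
    )
    return tail / total


# Toy gene sets with hypothetical identifiers.
background = [f"G{i}" for i in range(10)]
p_enriched = enrichment_pvalue(["G0", "G1", "G2"], ["G0", "G1", "G2"], background)
p_disjoint = enrichment_pvalue(["G3", "G4", "G5"], ["G0", "G1", "G2"], background)
```

In a real analysis the p-values would be computed for every pathway and then corrected for multiple testing.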
Model Validation
- 3.1. Validation: Gene signatures
Generates gene signatures for held-out compounds using DESeq2 after QC filtering.
- 3.2. Validation: Pathway signatures
Computes pathway-level signatures for validation compounds.
- 3.3. Validation: ToxPredictor model
Applies ToxPredictor to unseen validation compounds to compute risk scores and safety margins.
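
The relationship between risk scores and safety margins in step 3.3 can be sketched as follows. This example assumes that a safety margin is the lowest tested concentration whose risk score reaches a decision threshold, divided by the compound's Cmax. Both that definition and the 0.5 threshold are illustrative assumptions and not necessarily the ones used by ToxPredictor. The function name and scores are hypothetical.

```python
def safety_margin(risk_by_concentration, cmax, threshold=0.5):
    """Lowest flagged concentration divided by Cmax, or None if never flagged.

    risk_by_concentration maps a tested concentration to its risk score;
    concentrations and cmax are assumed to share the same unit.
    """
    if cmax <= 0:
        raise ValueError("cmax must be positive")
    flagged = [c for c, score in risk_by_concentration.items() if score >= threshold]
    if not flagged:
        return None
    return min(flagged) / cmax


# Hypothetical risk scores at three test concentrations.
scores = {1.0: 0.1, 10.0: 0.4, 100.0: 0.8}
margin = safety_margin(scores, cmax=2.0)
```

Under this definition a larger margin means the compound is flagged only at concentrations well above its clinical exposure.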
Results & Benchmarking
- 4.1. Results: Main Figures
Reproduces all main figures and tables from the manuscript that are not covered in the training or validation notebooks.
- 4.2. Benchmarking: In-silico Models
Benchmarks ToxPredictor against published in-silico DILI models.
- 4.3. Benchmarking: In-vitro Models
Benchmarks ToxPredictor against published in-vitro DILI models.
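
When comparing binary DILI predictions across models, two metrics that are often reported are sensitivity and specificity. The sketch below shows how they are computed. Which metrics the benchmarking notebooks actually report is not stated above, so treat this as an illustrative assumption. The function name and labels are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = DILI positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels and predictions for five compounds.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Evaluating every model on the same compound set keeps the comparison fair.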