Reproducibility

This directory provides the complete set of Jupyter notebooks used to generate all results presented in the DILImap publication. For convenience, a ZIP archive of all notebooks is also available for download.

Data Preparation

1.1. DataPrep: DILI labels

Assigns consensus DILI labels by integrating clinical annotations with compound metadata.

1.2. DataPrep: Cmax values

Aggregates Cmax values from 20+ studies to derive a consensus median Cmax per compound.
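
The aggregation step can be sketched as follows. This is a minimal illustration, assuming the per-study values have already been converted to a common unit and collected as (compound, Cmax) pairs. The function name `consensus_cmax`, the record layout, and the placeholder values are hypothetical and are not taken from the notebook.

```python
from collections import defaultdict
from statistics import median


def consensus_cmax(records):
    """Return {compound: median Cmax} from (compound, cmax) pairs.

    Assumes all values share one unit. Missing or non-positive
    values are skipped rather than imputed.
    """
    by_compound = defaultdict(list)
    for compound, cmax in records:
        if cmax is not None and cmax > 0:
            by_compound[compound].append(cmax)
    return {name: median(values) for name, values in by_compound.items()}


# Placeholder records for illustration only (not real measurements).
records = [
    ("compound_A", 1.0),
    ("compound_A", 3.0),
    ("compound_A", 10.0),
    ("compound_B", 2.0),
    ("compound_B", 4.0),
    ("compound_B", None),  # a study that did not report a value
]
consensus = consensus_cmax(records)
```

The median is less sensitive than the mean to a single outlying study, which is one plausible reason to prefer it when reported values span orders of magnitude.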

1.3. DataPrep: Cell viability

Fits dose–response curves to raw viability data to estimate IC₁₀ values.
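
As a rough illustration of how an IC₁₀ can be read off a fitted curve, the sketch below assumes a two-parameter Hill model (viability runs from 1 to 0) and fits it by least squares on a logit-linearized form. The model choice, the fitting method, and all function names are assumptions made for this example; the notebook may use a different parameterization or optimizer.

```python
import math


def hill_viability(conc, ic50, hill):
    """Viable fraction under a two-parameter Hill model (top=1, bottom=0)."""
    return 1.0 / (1.0 + (conc / ic50) ** hill)


def fit_hill_linearized(concs, viabilities):
    """Estimate (ic50, hill) via the linearized Hill model.

    log(1/v - 1) = hill * log(conc) - hill * log(ic50)
    Points with v outside (0, 1) or conc <= 0 are skipped because
    the transform is undefined there.
    """
    xs, ys = [], []
    for c, v in zip(concs, viabilities):
        if c > 0 and 0.0 < v < 1.0:
            xs.append(math.log(c))
            ys.append(math.log(1.0 / v - 1.0))
    if len(xs) < 2:
        raise ValueError("need at least two usable points to fit")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    hill = sxy / sxx
    intercept = mean_y - hill * mean_x
    ic50 = math.exp(-intercept / hill)
    return ic50, hill


def inhibitory_concentration(ic50, hill, fraction=0.10):
    """Concentration at which viability drops by `fraction` (0.10 gives IC10)."""
    return ic50 * (fraction / (1.0 - fraction)) ** (1.0 / hill)


# Synthetic, noise-free data generated from known parameters.
concs = [0.1, 1.0, 3.0, 10.0, 30.0, 100.0]
viabilities = [hill_viability(c, ic50=10.0, hill=2.0) for c in concs]
fitted_ic50, fitted_hill = fit_hill_linearized(concs, viabilities)
ic10 = inhibitory_concentration(fitted_ic50, fitted_hill)
```

The linearized fit is convenient for a sketch but weights noisy points near 0 or 1 heavily; a nonlinear least-squares fit on the untransformed data is usually preferable for real measurements.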

Model Training

2.1. Training: Gene signatures

Generates compound-level gene signatures using DESeq2 after QC filtering.

2.2. Training: Pathway signatures

Computes pathway-level signatures via enrichment of DE results using WikiPathways.
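
One common way to score a pathway against a list of differentially expressed (DE) genes is an over-representation test based on the hypergeometric distribution. The sketch below illustrates that general technique only. Whether the notebook uses over-representation analysis or a rank-based method is not stated here, and the function name and placeholder gene identifiers are hypothetical.

```python
from math import comb


def pathway_overrepresentation(background, pathway, de_genes):
    """Upper-tail hypergeometric p-value for pathway/DE overlap.

    background: set of all genes tested
    pathway:    set of genes annotated to the pathway
    de_genes:   set of differentially expressed genes
    Returns P(overlap >= observed) under random draws from background.
    """
    pathway = pathway & background
    de_genes = de_genes & background
    n_background = len(background)
    n_pathway = len(pathway)
    n_de = len(de_genes)
    n_overlap = len(pathway & de_genes)

    total = comb(n_background, n_de)
    upper = min(n_pathway, n_de)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_de - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Placeholder gene identifiers for illustration only.
background = {f"g{i}" for i in range(10)}
pathway = {"g0", "g1", "g2"}
de_genes = {"g0", "g1", "g2"}
p_value = pathway_overrepresentation(background, pathway, de_genes)
```

In practice the p-values for all pathways would be corrected for multiple testing before being used as signature features.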

2.3. Training: ToxPredictor model

Trains and tunes ensemble models; selects the final random forest classifier.

Model Validation

3.1. Validation: Gene signatures

Generates gene signatures for held-out compounds using DESeq2 after QC filtering.

3.2. Validation: Pathway signatures

Computes pathway-level signatures for validation compounds.

3.3. Validation: ToxPredictor model

Applies ToxPredictor to unseen validation compounds to compute risk scores and safety margins.

Results & Benchmarking

4.1. Results: Main Figures

Reproduces all main figures and tables from the manuscript that are not covered by the training or validation notebooks.

4.2. Benchmarking: In-silico Models

Benchmarks ToxPredictor against published in-silico DILI models.

4.3. Benchmarking: In-vitro Models

Benchmarks ToxPredictor against published in-vitro DILI models.