Protein-Ligand Co-Design Benchmark¤

Level: Advanced | Runtime: ~3-5 minutes (CPU) / ~1-2 minutes (GPU) | Format: Python + Jupyter

Prerequisites: Understanding of protein-ligand interactions, drug discovery, and molecular modeling | Target Audience: Computational chemists and drug discovery researchers

Overview¤

This example demonstrates a comprehensive benchmark suite for evaluating protein-ligand co-design models. Learn how to use the CrossDocked2020 dataset, compute binding affinity predictions, assess molecular validity, evaluate drug-likeness, and systematically compare model architectures for computational drug discovery.

What You'll Learn¤

  • Molecular Modality: Domain-specific framework for chemical structure representation
  • CrossDocked2020: Large-scale protein-ligand binding dataset with 22.5M complexes
  • Binding Affinity: Predict and evaluate protein-ligand binding energies (kcal/mol)
  • Molecular Validity: Assess chemical plausibility of generated structures
  • Drug-likeness (QED): Quantify pharmaceutical potential using QED scores
  • Benchmark Suite: Systematically evaluate and compare model architectures

Files¤

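This example is available in two formats: a Python script (protein_ligand_benchmark_demo.py) and a Jupyter notebook (protein_ligand_benchmark_demo.ipynb), both under examples/generative_models/protein/.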

Quick Start¤

Run the Python Script¤

# Activate environment
source activate.sh

# Run the complete demo
python examples/generative_models/protein/protein_ligand_benchmark_demo.py

Run the Jupyter Notebook¤

# Activate environment
source activate.sh

# Launch Jupyter
jupyter lab examples/generative_models/protein/protein_ligand_benchmark_demo.ipynb

Key Concepts¤

1. Protein-Ligand Co-Design¤

Co-design simultaneously optimizes the protein binding site and the ligand molecule for strong, specific binding:

Protein Pocket + Ligand → Protein-Ligand Complex
     ↓               ↓              ↓
  Flexibility    Chemistry    Binding Affinity
  Specificity    Drug-like    Stability

Applications:

  • De novo drug design
  • Lead optimization
  • Binding site engineering
  • Personalized medicine

2. CrossDocked2020 Dataset¤

Large-scale protein-ligand binding dataset:

from flax import nnx

from workshop.benchmarks.datasets.crossdocked import CrossDockedDataset

# Random number generators reused by the snippets in this guide
rngs = nnx.Rngs(42)

dataset = CrossDockedDataset(
    num_samples=50,
    max_protein_atoms=200,
    max_ligand_atoms=30,
    pocket_radius=8.0,  # In angstroms
    rngs=rngs,
)

# Get a sample
sample = dataset[0]
# sample = {
#     "protein_coords": (200, 3),      # Protein atom coordinates
#     "protein_types": (200,),         # Atom types (C, N, O, S, etc.)
#     "ligand_coords": (30, 3),        # Ligand atom coordinates
#     "ligand_types": (30,),           # Ligand atom types
#     "binding_affinity": -8.5,        # In kcal/mol (more negative = stronger binding)
#     "pocket_indices": [12, 45, ...], # Binding pocket atom indices
# }

Dataset Statistics:

  • Total complexes: 22.5 million docked pairs
  • Protein size: ~50-500 atoms (binding pocket)
  • Ligand size: ~10-50 atoms (drug-like)
  • Binding affinity range: -15 to 0 kcal/mol
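
The pocket_radius argument suggests a distance-based pocket definition. As a minimal sketch, assuming a pocket atom is any protein atom within pocket_radius angstroms of at least one ligand atom, the selection could look like the following. The helper `pocket_indices` is hypothetical and not part of the workshop API; the rule that CrossDockedDataset actually applies is not shown in this guide and may differ. Plain Python is used so the sketch runs without extra dependencies.

```python
import math


def pocket_indices(protein_coords, ligand_coords, pocket_radius=8.0):
    """Return indices of protein atoms within pocket_radius (Å) of any ligand atom.

    Hypothetical helper for illustration; not part of the workshop API.
    """
    return [
        i
        for i, atom in enumerate(protein_coords)
        if any(math.dist(atom, lig) <= pocket_radius for lig in ligand_coords)
    ]


# Toy example: three protein atoms on the x-axis, one ligand atom at the origin.
protein = [(1.0, 0.0, 0.0), (7.5, 0.0, 0.0), (12.0, 0.0, 0.0)]
ligand = [(0.0, 0.0, 0.0)]

near = pocket_indices(protein, ligand, pocket_radius=8.0)
wide = pocket_indices(protein, ligand, pocket_radius=12.0)
print(near, wide)
```

A larger radius keeps more protein context per sample, at the cost of more atoms to process.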

3. Molecular Modality Framework¤

Domain-specific functionality for chemical structures:

from workshop.generative_models.modalities.molecular import MolecularModality

modality = MolecularModality(rngs=rngs)

# Chemical constraints
# (ModalityConfiguration must be imported as well; its module path is not shown in this guide)
config = ModalityConfiguration(
    name="molecular_config",
    modality_name="molecular",
    metadata={
        "use_chemical_constraints": True,
        "bond_length_weight": 1.0,        # Enforce realistic bond lengths
        "bond_angle_weight": 0.5,         # Enforce bond angles
        "use_pharmacophore_features": True,
        "pharmacophore_types": [
            "donor",       # H-bond donors
            "acceptor",    # H-bond acceptors
            "hydrophobic"  # Hydrophobic regions
        ],
    }
)

extensions = modality.get_extensions(config, rngs=rngs)
# extensions = {
#     "chemical_constraints": <ConstraintModule>,
#     "pharmacophore_features": <PharmacophoreModule>,
# }
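
To make the role of bond_length_weight concrete, here is a minimal sketch of a weighted bond-length penalty. It assumes a squared deviation from a single target length, averaged over bonds and scaled by the weight. The function `bond_length_penalty`, the target length of 1.5 Å, and the explicit bond list are all illustrative assumptions; the constraint module returned by get_extensions may be implemented quite differently.

```python
import math


def bond_length_penalty(coords, bonds, target_length=1.5, weight=1.0):
    """Weighted mean squared deviation of bond lengths from a target length (Å).

    Hypothetical illustration of a bond-length constraint term.
    """
    deviations = [
        (math.dist(coords[i], coords[j]) - target_length) ** 2 for i, j in bonds
    ]
    return weight * sum(deviations) / len(deviations)


# Three atoms: one bond at the target length, one stretched to 1.7 Å.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.7, 0.0)]
bonds = [(0, 1), (1, 2)]

penalty = bond_length_penalty(coords, bonds, weight=1.0)
half_weight = bond_length_penalty(coords, bonds, weight=0.5)
print(penalty, half_weight)
```

Under this assumption, halving the weight halves the contribution of bond-length errors to the total loss, which is how the 1.0 and 0.5 weights in the configuration would trade off bond lengths against bond angles.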

4. Binding Affinity Metric¤

Evaluates binding affinity prediction accuracy:

\[\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(\Delta G_{\text{pred}}^{(i)} - \Delta G_{\text{true}}^{(i)}\right)^2}\]

where \(N\) is the number of complexes and \(\Delta G^{(i)}\) is the binding affinity of complex \(i\) in kcal/mol.

import jax.numpy as jnp

from workshop.benchmarks.metrics.protein_ligand import BindingAffinityMetric

metric = BindingAffinityMetric(rngs=rngs)

# True binding affinities (in kcal/mol)
true_affinities = jnp.array([-8.2, -6.5, -9.1, -7.8])

# Model predictions
predictions = jnp.array([-8.5, -6.2, -8.9, -7.5])

results = metric.compute(predictions, true_affinities)
# Approximate values for the inputs above:
# results = {
#     "rmse": 0.28,          # Root Mean Square Error (kcal/mol)
#     "pearson_r": 0.97,     # Correlation coefficient
#     "mae": 0.275,          # Mean Absolute Error (kcal/mol)
# }
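
The three reported quantities can be reproduced by hand from their textbook definitions. The sketch below applies them to the four example values in plain Python. `regression_metrics` is a hypothetical helper, not the BindingAffinityMetric implementation, which may round or aggregate differently.

```python
import math


def regression_metrics(predictions, targets):
    """Compute RMSE, MAE and Pearson r for two equal-length sequences."""
    n = len(predictions)
    errors = [p - t for p, t in zip(predictions, targets)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Pearson correlation: covariance divided by the product of spreads.
    mean_p = sum(predictions) / n
    mean_t = sum(targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predictions, targets))
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    pearson_r = cov / math.sqrt(var_p * var_t)
    return {"rmse": rmse, "mae": mae, "pearson_r": pearson_r}


true_affinities = [-8.2, -6.5, -9.1, -7.8]
predictions = [-8.5, -6.2, -8.9, -7.5]
metrics = regression_metrics(predictions, true_affinities)
print({name: round(value, 3) for name, value in metrics.items()})
```

For these inputs the formulas give an RMSE of about 0.28 kcal/mol, an MAE of about 0.275 kcal/mol, and a Pearson correlation of about 0.97.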

Performance Targets:

  • Excellent: RMSE < 0.5 kcal/mol
  • Good: RMSE < 1.0 kcal/mol
  • Acceptable: RMSE < 1.5 kcal/mol
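
The tiers above translate directly into a small lookup. This is a sketch only: `rmse_tier` is a hypothetical helper, and the "below target" label for values of 1.5 kcal/mol or more is an assumption, since the guide does not name that case.

```python
def rmse_tier(rmse):
    """Map an RMSE in kcal/mol to the performance tiers listed in this guide."""
    if rmse < 0.5:
        return "excellent"
    if rmse < 1.0:
        return "good"
    if rmse < 1.5:
        return "acceptable"
    return "below target"  # assumed label; not named in the guide


tiers = [rmse_tier(value) for value in (0.32, 0.75, 1.2, 2.0)]
print(tiers)
```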

5. Molecular Validity Metric¤

Checks chemical plausibility of generated molecules:

from workshop.benchmarks.metrics.protein_ligand import MolecularValidityMetric

metric = MolecularValidityMetric(rngs=rngs)

results = metric.compute(
    coordinates=ligand_coords,  # (batch, num_atoms, 3)
    atom_types=atom_types,      # (batch, num_atoms)
    masks=atom_masks            # (batch, num_atoms)
)
# results = {
#     "validity_rate": 0.96,      # Overall validity (target: >0.95)
#     "bond_validity": 0.98,      # Valid bond lengths
#     "clash_free": 0.94,         # No atomic clashes
#     "connectivity": 0.97,       # Proper atom connectivity
# }

Validity Checks:

  • Bond lengths: 1.2-2.0 Å for most bonds
  • No clashes: Non-bonded atoms are more than 1.0 Å apart
  • Connectivity: All atoms form a single connected graph
  • Valence: Atoms respect valence rules
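
The clash and connectivity checks above can be sketched with nothing but pairwise distances. The version below makes two simplifying assumptions that a real validity metric would likely refine: bonds are inferred purely from distance (any pair at or below 2.0 Å counts as bonded), and any pair closer than 1.0 Å counts as a clash. `check_geometry` is a hypothetical helper, not the MolecularValidityMetric implementation, and it does not cover valence rules.

```python
import math
from collections import deque
from itertools import combinations

CLASH_DISTANCE = 1.0   # Å; closer than this counts as a clash (assumed threshold)
MAX_BOND_LENGTH = 2.0  # Å; pairs at or below this distance are treated as bonded


def check_geometry(coords):
    """Return clash-free and connectivity flags for one molecule."""
    n = len(coords)
    if n == 0:
        return {"clash_free": True, "connected": True}

    neighbors = {i: set() for i in range(n)}
    clash_free = True
    for i, j in combinations(range(n), 2):
        d = math.dist(coords[i], coords[j])
        if d < CLASH_DISTANCE:
            clash_free = False
        elif d <= MAX_BOND_LENGTH:
            neighbors[i].add(j)
            neighbors[j].add(i)

    # Breadth-first search from atom 0 to test that the bond graph is connected.
    seen = {0}
    queue = deque([0])
    while queue:
        for k in neighbors[queue.popleft()] - seen:
            seen.add(k)
            queue.append(k)
    return {"clash_free": clash_free, "connected": len(seen) == n}


chain = check_geometry([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)])
clashing = check_geometry([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0)])
fragmented = check_geometry([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (6.0, 0.0, 0.0)])
print(chain, clashing, fragmented)
```

A rate such as validity_rate could then be read as the fraction of molecules in a batch for which every flag is true, although the guide does not spell out how the metric aggregates its checks.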

6. Drug-likeness Metric (QED)¤

Quantitative Estimate of Drug-likeness:

\[\text{QED} = \exp\left(\frac{1}{8}\sum_{i=1}^{8} \ln p_i\right)\]

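where \(p_i\) are desirability functions, each between 0 and 1, for 8 molecular properties. In the original QED definition these are molecular weight, lipophilicity (logP), H-bond donors, H-bond acceptors, polar surface area, rotatable bonds, aromatic rings, and structural alerts. The formula is the geometric mean of the eight desirabilities.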
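
As a worked instance of the formula, the sketch below evaluates the unweighted QED for one set of desirability values. The values are made up for illustration, and `qed_from_desirabilities` is a hypothetical helper; DrugLikenessMetric derives its own desirabilities from the molecule.

```python
import math


def qed_from_desirabilities(desirabilities):
    """Unweighted QED: exp of the mean log desirability (a geometric mean)."""
    return math.exp(sum(math.log(p) for p in desirabilities) / len(desirabilities))


# Hypothetical desirability values in (0, 1], one per molecular property.
p = [0.9, 0.8, 0.95, 0.85, 0.7, 0.9, 0.6, 1.0]
qed = qed_from_desirabilities(p)
print(round(qed, 3))
```

Because the mean is taken in log space, one poor desirability lowers the score more than it would in an arithmetic average, and a desirability of exactly zero is not allowed.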

from workshop.benchmarks.metrics.protein_ligand import DrugLikenessMetric

metric = DrugLikenessMetric(rngs=rngs)

results = metric.compute(
    coordinates=ligand_coords,
    atom_types=atom_types,
    masks=atom_masks
)
# results = {
#     "qed_score": 0.75,             # Overall drug-likeness (target: >0.7)
#     "lipinski_compliance": 0.85,   # Lipinski's Rule of Five
#     "molecular_weight": 385.4,     # Daltons (target: 180-500)
#     "logp": 2.3,                   # Lipophilicity (target: 0-5)
#     "h_bond_donors": 2,            # Target: ≤5
#     "h_bond_acceptors": 4,         # Target: ≤10
# }

Lipinski's Rule of Five:

  • Molecular weight ≤ 500 Da
  • LogP ≤ 5
  • H-bond donors ≤ 5
  • H-bond acceptors ≤ 10
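
The four criteria can be checked with a few comparisons. The sketch below counts violations for a single molecule; `lipinski_violations` is a hypothetical helper, and how DrugLikenessMetric turns such checks into the lipinski_compliance value is not specified in this guide.

```python
def lipinski_violations(molecular_weight, logp, h_bond_donors, h_bond_acceptors):
    """Count how many of the four Rule of Five criteria a molecule violates."""
    checks = [
        molecular_weight <= 500.0,  # Daltons
        logp <= 5.0,
        h_bond_donors <= 5,
        h_bond_acceptors <= 10,
    ]
    return sum(not ok for ok in checks)


# Property values from the example results in this section.
example = lipinski_violations(385.4, 2.3, 2, 4)
# A heavier, more lipophilic molecule (hypothetical values).
heavy = lipinski_violations(650.0, 6.1, 2, 4)
print(example, heavy)
```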

7. Benchmark Suite¤

Comprehensive evaluation across all metrics:

from workshop.benchmarks.suites.protein_ligand_suite import ProteinLigandBenchmarkSuite

suite = ProteinLigandBenchmarkSuite(
    dataset_config={
        "num_samples": 50,
        "max_protein_atoms": 200,
        "max_ligand_atoms": 30,
    },
    benchmark_config={
        "num_samples": 20,
        "batch_size": 4,
    },
    rngs=rngs
)

# Run evaluation
results = suite.run_all(model)
# results = {
#     "binding_affinity": {
#         "rmse": 0.45,
#         "pearson_r": 0.92,
#     },
#     "molecular_validity": {
#         "validity_rate": 0.97,
#         "bond_validity": 0.98,
#     },
#     "drug_likeness": {
#         "qed_score": 0.78,
#         "lipinski_compliance": 0.89,
#     },
# }
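
Once the suite returns a nested dictionary like the one above, the targets quoted in this guide can be checked in one pass. This sketch assumes the "good" RMSE threshold (below 1.0 kcal/mol), a validity rate above 0.95, and a QED score above 0.7. `meets_targets` is a hypothetical helper, not a method of ProteinLigandBenchmarkSuite.

```python
def meets_targets(results):
    """Compare suite results against the targets quoted in this guide."""
    return {
        "binding_affinity": results["binding_affinity"]["rmse"] < 1.0,
        "molecular_validity": results["molecular_validity"]["validity_rate"] > 0.95,
        "drug_likeness": results["drug_likeness"]["qed_score"] > 0.7,
    }


# Example results copied from the suite output shown in this section.
results = {
    "binding_affinity": {"rmse": 0.45, "pearson_r": 0.92},
    "molecular_validity": {"validity_rate": 0.97, "bond_validity": 0.98},
    "drug_likeness": {"qed_score": 0.78, "lipinski_compliance": 0.89},
}

report = meets_targets(results)
print(report)
```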

Code Structure¤

The example demonstrates six main components:

  1. Molecular Modality Framework - Chemical constraints and pharmacophore features
  2. CrossDocked2020 Dataset - Protein-ligand complex loading and statistics
  3. Binding Affinity Metric - RMSE evaluation for binding predictions
  4. Molecular Validity Metric - Chemical plausibility assessment
  5. Drug-likeness Metric - QED and Lipinski compliance
  6. Benchmark Suite - Comprehensive evaluation and model comparison

Features Demonstrated¤

  • ✅ Molecular modality with chemical constraints
  • ✅ CrossDocked2020 dataset with pocket extraction
  • ✅ Binding affinity prediction (RMSE, correlation)
  • ✅ Molecular validity checks (bonds, clashes, connectivity)
  • ✅ Drug-likeness evaluation (QED, Lipinski)
  • ✅ Complete benchmark suite execution
  • ✅ Model comparison across quality levels
  • ✅ Performance target assessment

Experiments to Try¤

  1. Adjust Model Quality

model = ExampleProteinLigandModel(rngs)
model.model_quality = "excellent"  # Try "poor", "good", or "excellent"
results = suite.run_all(model)

  2. Increase Dataset Size

dataset_config = {
    "num_samples": 100,    # More samples
    "max_protein_atoms": 300,
    "max_ligand_atoms": 40,
}

  3. Custom Pocket Radius

dataset = CrossDockedDataset(
    pocket_radius=10.0,  # Larger binding pocket
    # ...
)

  4. Add Custom Metrics

import jax.numpy as jnp
from flax import nnx

class CustomMetric(nnx.Module):
    def compute(self, predictions, targets):
        # Your custom evaluation logic; mean absolute error is used here as a placeholder
        score = jnp.mean(jnp.abs(predictions - targets))
        return {"custom_score": score}

Troubleshooting¤

ImportError for Molecular Modality¤

Symptom: Cannot import molecular modality classes

Solution: Install molecular extras

uv sync --extra molecular

Dataset Loading Too Slow¤

Symptom: Long wait times for dataset initialization

Solution: Reduce number of samples

dataset_config = {
    "num_samples": 20,  # Smaller dataset for faster loading
}

CUDA Out of Memory¤

Symptom: GPU memory error during evaluation

Solution: Reduce batch size

benchmark_config = {
    "batch_size": 2,  # Smaller batches
}

Low Molecular Validity Rates¤

Symptom: Most generated molecules are invalid

Cause: Incorrect coordinate scaling or atom types

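Solution: Check coordinate normalization and the range of atom types

import jax

# Ensure coordinates are in angstroms
# (coordinate_scale is whatever factor was used to normalize the inputs)
coordinates = coordinates * coordinate_scale

# Use realistic atom types (1-6 for C, N, O, S, P, F); the upper bound 7 is exclusive
key = jax.random.PRNGKey(0)
atom_types = jax.random.randint(key, (batch, num_atoms), 1, 7)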
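
A quick way to spot a scaling problem is to look at nearest-neighbor distances: for coordinates in angstroms, most atoms should have a neighbor roughly one bond length away. The sketch below is a diagnostic under that assumption. `median_nearest_neighbor_distance` is a hypothetical helper, and the scale factor of 10.0 is only an example value.

```python
import math
from statistics import median


def median_nearest_neighbor_distance(coords):
    """Median distance from each atom to its closest other atom (needs >= 2 atoms)."""
    nearest = [
        min(math.dist(a, b) for j, b in enumerate(coords) if j != i)
        for i, a in enumerate(coords)
    ]
    return median(nearest)


# Coordinates normalized to roughly unit scale (hypothetical model output).
normalized = [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0), (0.30, 0.0, 0.0)]
coordinate_scale = 10.0  # assumed factor that maps normalized units back to Å
rescaled = [tuple(c * coordinate_scale for c in atom) for atom in normalized]

before = median_nearest_neighbor_distance(normalized)
after = median_nearest_neighbor_distance(rescaled)
print(before, after)
```

If the median sits far below 1 Å or far above 2 Å, the coordinates are probably not in angstroms, and the bond-length and clash checks will reject most molecules.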
