Protein Extensions with Configuration System¤
Level: Intermediate | Runtime: ~10 seconds (CPU/GPU) | Format: Python + Jupyter
Overview¤
This example demonstrates how to use protein extensions with Workshop's Pydantic-based configuration system. Protein extensions add domain-specific capabilities (backbone constraints, amino acid embeddings) to geometric models through a modular, composable architecture. You'll learn how to load configurations from YAML files, create extensions programmatically, and integrate them with geometric models.
What You'll Learn¤
- Understand protein extensions and their modular architecture
- Load extension configurations from YAML files
- Create protein extensions programmatically with Pydantic models
- Integrate extensions with geometric models (PointCloudModel)
- Use configuration validation and serialization features
- Calculate extension-specific losses (bond length, bond angle)
Files¤
- Python Script: examples/generative_models/protein/protein_extensions_with_config.py
- Jupyter Notebook: examples/generative_models/protein/protein_extensions_with_config.ipynb
Quick Start¤
Run the Python Script¤
# Activate environment
source activate.sh
# Run the example
python examples/generative_models/protein/protein_extensions_with_config.py
Run the Jupyter Notebook¤
# Activate environment
source activate.sh
# Launch Jupyter
jupyter lab examples/generative_models/protein/protein_extensions_with_config.ipynb
Key Concepts¤
Protein Extensions¤
Protein extensions are modular components that add protein-specific functionality to generic geometric models:
Backbone Constraints Extension:
- Enforces realistic bond lengths between atoms
- Enforces realistic bond angles
- Penalizes violations during training
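To make the idea of a backbone constraint concrete, here is a minimal pure-Python sketch of a bond-length and bond-angle penalty. It is not Workshop's implementation: the function names, the mean-squared-error form, and the target values are assumptions chosen for illustration (3.8 Å is the typical distance between consecutive C-alpha atoms; the angle target is left as an explicit parameter).

```python
import math


def bond_length_penalty(coords, target=3.8):
    """Mean squared deviation of consecutive-point distances from `target`."""
    dists = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
    return sum((d - target) ** 2 for d in dists) / len(dists)


def bond_angle_penalty(coords, target_rad):
    """Mean squared deviation of the angle at each interior point from `target_rad`."""
    errors = []
    for a, b, c in zip(coords, coords[1:], coords[2:]):
        u = [ai - bi for ai, bi in zip(a, b)]  # vector from b to a
        v = [ci - bi for ci, bi in zip(c, b)]  # vector from b to c
        cos_angle = sum(ui * vi for ui, vi in zip(u, v)) / (
            math.hypot(*u) * math.hypot(*v)
        )
        # Clamp to [-1, 1] to guard against floating-point overshoot
        angle = math.acos(max(-1.0, min(1.0, cos_angle)))
        errors.append((angle - target_rad) ** 2)
    return sum(errors) / len(errors)


# Three points on a straight line, 3.8 apart: both penalties are (near) zero
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
length_loss = bond_length_penalty(chain)
angle_loss = bond_angle_penalty(chain, target_rad=math.pi)

# Stretching a bond by 1.0 gives a squared error of about 1.0
stretched_loss = bond_length_penalty([(0.0, 0.0, 0.0), (4.8, 0.0, 0.0)])
```

During training, penalties like these would be scaled by weights such as bond_length_weight and bond_angle_weight and added to the model's main loss.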
Protein Mixin Extension:
- Embeds amino acid types (20 standard amino acids)
- Processes sequence information
- Integrates with geometric features
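The amino acid embedding can be pictured as a table lookup: each of the 20 residue types indexes one row of a vector table. The sketch below uses plain Python lists and random values; the names `AA_TO_INDEX`, `make_embedding_table`, and `embed_sequence` are hypothetical, and in a trained model the table entries would be learned parameters rather than random numbers.

```python
import random

# One-letter codes of the 20 standard amino acids
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def make_embedding_table(num_aa_types=20, embedding_dim=16, seed=0):
    """Build a (num_aa_types x embedding_dim) table of random vectors."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
        for _ in range(num_aa_types)
    ]


def embed_sequence(sequence, table):
    """Map each residue letter to its row in the embedding table."""
    return [table[AA_TO_INDEX[aa]] for aa in sequence]


table = make_embedding_table()
embedded = embed_sequence("MKV", table)  # 3 residues -> 3 vectors of length 16
```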
Extensibility:
- Easy to add new protein-specific features
- Composable: mix and match extensions
- Minimal coupling with base models
Configuration System¤
Workshop uses Pydantic models for type-safe, validated configurations:
Type Safety:
from pydantic import BaseModel

class ProteinExtensionConfig(BaseModel):
    name: str  # Required field
    use_backbone_constraints: bool = True  # With default
    bond_length_weight: float = 1.0  # Validated type
YAML Integration:
Benefits:
- Automatic validation at load time
- Self-documenting through schemas
- Easy to serialize/deserialize
- Version control friendly
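The first and third benefits can be illustrated without any dependencies. The sketch below swaps Pydantic for a standard-library dataclass with hand-written checks, so it is only a stand-in for the behaviour described above; `ExtensionConfigSketch` is a hypothetical class, not part of Workshop, and JSON stands in for YAML.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExtensionConfigSketch:
    """Hypothetical stand-in for a validated config; not Workshop's class."""

    name: str
    use_backbone_constraints: bool = True
    bond_length_weight: float = 1.0

    def __post_init__(self):
        # Validate at construction time, mirroring "validation at load time"
        if not isinstance(self.name, str) or not self.name:
            raise ValueError("name must be a non-empty string")
        weight = self.bond_length_weight
        if isinstance(weight, bool) or not isinstance(weight, (int, float)):
            raise ValueError("bond_length_weight must be a number")


config = ExtensionConfigSketch(name="protein_extensions", bond_length_weight=2.0)

# Serialize to text, then rebuild an equal object from that text
serialized = json.dumps(asdict(config))
restored = ExtensionConfigSketch(**json.loads(serialized))

# A wrong type is rejected as soon as the object is created
try:
    ExtensionConfigSketch(name="bad", bond_length_weight="heavy")
    rejected = False
except ValueError:
    rejected = True
```

Pydantic provides this kind of checking from the type annotations alone, which is why the real config class needs no `__post_init__`.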
Code Structure¤
The example is organized into nine sections, run in order:
- Setup: Import libraries and initialize RNGs
- Load Configuration: From YAML or create programmatically
- Convert Config: Map to extension parameters
- Create Extensions: Using factory function
- Configure Model: Set up PointCloudModel
- Create Model: With extensions attached
- Prepare Data: Synthetic protein structures
- Forward Pass: Test model and extensions
- Calculate Losses: Reconstruction + extension losses
Example Code¤
Loading Configuration from YAML¤
from workshop.configs.schema.extensions import ProteinExtensionConfig
from workshop.configs.utils import create_config_from_yaml
# Load from YAML file
config_path = "configs/protein.yaml"
extension_config = create_config_from_yaml(config_path, ProteinExtensionConfig)
# Automatic validation
print(f"Loaded config: {extension_config.name}")
print(f"Backbone constraints: {extension_config.use_backbone_constraints}")
Creating Configuration Programmatically¤
# Fallback: create config in code
extension_config = ProteinExtensionConfig(
    name="my_protein_extensions",
    description="Custom protein extension config",
    use_backbone_constraints=True,
    use_protein_mixin=True,
)
# Pydantic validates all fields automatically
Creating Protein Extensions¤
from workshop.generative_models.extensions.protein import create_protein_extensions
# Convert config to extension parameters
protein_config = {
    "use_backbone_constraints": True,
    "bond_length_weight": 1.0,
    "bond_angle_weight": 0.5,
    "use_protein_mixin": True,
    "aa_embedding_dim": 16,
    "num_aa_types": 20,
}
# Create extensions (rngs is the RNG container initialized in the Setup section)
extensions = create_protein_extensions(protein_config, rngs=rngs)
print(f"Created extensions: {', '.join(extensions.keys())}")
# Output: Created extensions: backbone_constraints, protein_mixin
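One way to picture what a factory like this does is a function that reads the boolean flags and returns a dictionary keyed by extension name. The sketch below is an assumption about the general pattern, not the source of `create_protein_extensions`; the classes and `create_extensions_sketch` are hypothetical placeholders that only store their settings.

```python
from dataclasses import dataclass


@dataclass
class BackboneConstraintsSketch:
    """Placeholder holding the constraint weights (hypothetical)."""

    bond_length_weight: float = 1.0
    bond_angle_weight: float = 1.0


@dataclass
class ProteinMixinSketch:
    """Placeholder holding the embedding settings (hypothetical)."""

    aa_embedding_dim: int = 16
    num_aa_types: int = 20


def create_extensions_sketch(config):
    """Build a name -> extension dict from boolean flags in `config`."""
    extensions = {}
    if config.get("use_backbone_constraints", False):
        extensions["backbone_constraints"] = BackboneConstraintsSketch(
            bond_length_weight=config.get("bond_length_weight", 1.0),
            bond_angle_weight=config.get("bond_angle_weight", 1.0),
        )
    if config.get("use_protein_mixin", False):
        extensions["protein_mixin"] = ProteinMixinSketch(
            aa_embedding_dim=config.get("aa_embedding_dim", 16),
            num_aa_types=config.get("num_aa_types", 20),
        )
    return extensions


sketch_config = {
    "use_backbone_constraints": True,
    "bond_angle_weight": 0.5,
    "use_protein_mixin": True,
}
sketch_extensions = create_extensions_sketch(sketch_config)
print(", ".join(sketch_extensions))  # backbone_constraints, protein_mixin
```

Because each extension is created only when its flag is set, disabling one is a single change to the config dictionary.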
Integrating with Models¤
from workshop.generative_models.models.geometric import PointCloudModel
from workshop.generative_models.core.configuration import ModelConfiguration
# Create model config
num_points = 40  # Points per structure; reused for the dims and the parameter below
model_config = ModelConfiguration(
    name="protein_point_cloud",
    model_class="workshop.generative_models.models.geometric.PointCloudModel",
    input_dim=(num_points, 3),
    output_dim=(num_points, 3),
    parameters={
        "num_points": num_points,
        "embed_dim": 64,
        "num_layers": 2,
    },
)
# Create model with extensions
model = PointCloudModel(model_config, extensions=extensions, rngs=rngs)
Extension Loss Calculation¤
# Forward pass (coords and batch come from the data preparation step)
outputs = model(coords, deterministic=True)
# Calculate extension losses
total_loss = 0.0
for ext_name, extension in model.extensions.items():
    # Not every extension defines a loss, so check first
    if hasattr(extension, "loss_fn"):
        ext_loss = extension.loss_fn(batch, outputs)
        total_loss += ext_loss
        print(f"{ext_name}: {ext_loss:.6f}")
# Output:
# backbone_constraints: 0.452000
# protein_mixin: 0.123000
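The loop above sums only the extension terms. A minimal sketch of combining them with a reconstruction loss follows; the helper `compose_losses`, the per-extension weights, and all numbers are hypothetical and only show the arithmetic.

```python
def compose_losses(reconstruction_loss, extension_losses, weights=None):
    """Return (total, weighted_terms) for a reconstruction loss plus extension losses.

    Extensions without an entry in `weights` get a weight of 1.0.
    """
    weights = weights or {}
    weighted_terms = {
        name: weights.get(name, 1.0) * value
        for name, value in extension_losses.items()
    }
    total = reconstruction_loss + sum(weighted_terms.values())
    return total, weighted_terms


total, terms = compose_losses(
    reconstruction_loss=0.25,
    extension_losses={"backbone_constraints": 0.5, "protein_mixin": 0.125},
    weights={"backbone_constraints": 0.5},
)
# total = 0.25 + 0.5 * 0.5 + 1.0 * 0.125 = 0.625
```

Keeping the weighted terms in a dictionary makes it easy to log each contribution separately while still optimizing a single scalar.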
Features Demonstrated¤
- Modular Extensions: Composable protein-specific capabilities
- Configuration System: Type-safe Pydantic models with validation
- YAML Support: Load/save configurations from files
- Integration: Extensions seamlessly integrate with geometric models
- Loss Composition: Extension losses combine with reconstruction loss
- Type Safety: Automatic validation prevents configuration errors
Experiments to Try¤
- Adjust Constraint Weights: Control strength of geometric constraints
- Disable Extensions: Compare with/without specific extensions
protein_config["use_backbone_constraints"] = False
# Observe effect on loss and generated structures
- Modify Embedding Dimension: Change amino acid representation capacity
- Save Configuration to YAML: Version control your settings
- Create Custom Extension: Add your own protein-specific functionality
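For the last experiment, a custom extension could be as small as a class exposing `loss_fn(batch, outputs)`, the method the loss loop in this example looks for. The sketch below is hypothetical throughout: `ClashPenaltySketch` is not a Workshop class, it assumes `outputs` is a plain list of (x, y, z) tuples, and the 3.0 minimum distance is an arbitrary illustrative threshold. Check the Custom Extensions guide for the base class and signatures Workshop actually expects.

```python
import math


class ClashPenaltySketch:
    """Hypothetical extension penalizing non-adjacent points that sit too close."""

    def __init__(self, min_distance=3.0, weight=1.0):
        self.min_distance = min_distance
        self.weight = weight

    def loss_fn(self, batch, outputs):
        # `batch` is unused here; `outputs` is a list of (x, y, z) tuples
        penalty = 0.0
        for i in range(len(outputs)):
            # Start at i + 2 to skip directly bonded neighbours
            for j in range(i + 2, len(outputs)):
                gap = self.min_distance - math.dist(outputs[i], outputs[j])
                if gap > 0:
                    penalty += gap**2
        return self.weight * penalty


clash = ClashPenaltySketch(min_distance=3.0, weight=0.5)
# Points 0 and 2 are 1.0 apart, so the gap is 2.0 and the penalty 0.5 * 2.0**2
points = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (1.0, 0.0, 0.0)]
clash_loss = clash.loss_fn(batch=None, outputs=points)
```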
Troubleshooting¤
Common Issues¤
Configuration validation error¤
Symptom: Pydantic raises a ValidationError when the configuration is created or loaded.
Cause: Invalid value for a configuration field (wrong type, out of range, etc.)
Solution:
# Check field requirements
print(ProteinExtensionConfig.model_json_schema())
# Fix the config
extension_config = ProteinExtensionConfig(
    name="valid_name",  # Must be string
    bond_length_weight=1.0,  # Must be float
)
YAML file not found¤
Symptom: Loading the configuration raises FileNotFoundError.
Cause: Config file doesn't exist at specified path.
Solution:
# Use absolute path
import os
config_path = os.path.join(os.getcwd(), "configs/protein.yaml")
# Or create programmatically as fallback
try:
    config = create_config_from_yaml(config_path, ProteinExtensionConfig)
except FileNotFoundError:
    config = ProteinExtensionConfig(...)  # Fallback
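Another option is to check the path before attempting to load it, which makes it easy to report where the file was expected. The helper below is a hypothetical sketch using only `pathlib`; `resolve_config_path` is not a Workshop function, and the temporary directory exists only to make the demo self-contained.

```python
import tempfile
from pathlib import Path


def resolve_config_path(relative_path, base_dir=None):
    """Return the absolute path if the file exists under `base_dir`, else None.

    `base_dir` defaults to the current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    candidate = (base / relative_path).resolve()
    return candidate if candidate.is_file() else None


# Demo in a throwaway directory containing one config file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "protein.yaml").write_text("name: demo\n")
    found = resolve_config_path("protein.yaml", base_dir=tmp)
    missing = resolve_config_path("missing.yaml", base_dir=tmp)
    found_name = found.name if found is not None else None
```

When the helper returns None, the caller can fall back to a programmatically created configuration, as in the try/except pattern shown in this section.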
Extension has no loss_fn¤
Symptom: Calling extension.loss_fn raises AttributeError.
Cause: Not all extensions implement loss functions.
Solution:
# Check before calling
if hasattr(extension, "loss_fn"):
    loss = extension.loss_fn(batch, outputs)
else:
    print(f"{ext_name} has no loss function")
Summary¤
In this example, you learned:
- ✅ How protein extensions add modular, domain-specific capabilities to models
- ✅ How to use Pydantic-based configurations for type safety and validation
- ✅ How to load configurations from YAML files for version control
- ✅ How extensions integrate with geometric models and contribute to losses
- ✅ How the configuration system provides serialization and documentation
Key Takeaways:
- Modularity: Extensions are composable and loosely coupled
- Type Safety: Pydantic validates configs automatically
- YAML Integration: Version control friendly configuration files
- Loss Composition: Extensions contribute domain-specific losses
Next Steps¤
- Protein Point Cloud: Deep dive into protein point cloud modeling
- Protein with Modality: Learn about the modality architecture
- Configuration Guide: Complete guide to Workshop's config system
- Custom Extensions: Create your own domain-specific extensions
Additional Resources¤
- Workshop Documentation: Configuration System
- Workshop Documentation: Protein Extensions
- Pydantic Documentation: Models and Validation
- API Reference: ProteinExtensionConfig
- API Reference: create_protein_extensions
Related Examples¤
- Protein Point Cloud Example - Detailed protein geometric modeling
- Protein Model with Modality - Modality architecture integration
- Geometric Benchmark Demo - Evaluating geometric models