Comprehensive Geometric Benchmark Demo
Level: Advanced | Runtime: ~10-15 minutes (CPU) / ~3-5 minutes (GPU) | Format: Python + Jupyter
Prerequisites: Understanding of 3D geometry, point clouds, and transformer architectures | Target Audience: Users training 3D generative models
Overview
This example demonstrates a complete end-to-end pipeline for training and evaluating point cloud generation models with Workshop. Learn how to load ShapeNet datasets, train transformer-based geometric models, use Chamfer distance loss, and evaluate with comprehensive 3D metrics.
What You'll Learn
- ShapeNet Dataset: PyTorch3D-style data loading with automatic fallbacks to synthetic data
- Point Cloud Models: Transformer-based architecture for generating 3D point clouds
- Chamfer Distance: Primary loss function for measuring point cloud similarity
- Training Pipeline: Complete training with Adam optimizer, cosine scheduler, and checkpointing
- Evaluation Metrics: Diversity, coverage, quality, and geometric fidelity scores
- Benchmark Suite: Compare results against standard geometric benchmarks
Files
This example is available in two formats:
- Python Script: geometric_benchmark_demo.py
- Jupyter Notebook: geometric_benchmark_demo.ipynb
Quick Start
Run the Python Script
# Activate environment
source activate.sh
# Run the complete demo (trains for 50 epochs)
python examples/generative_models/geometric/geometric_benchmark_demo.py
Run the Jupyter Notebook
# Activate environment
source activate.sh
# Launch Jupyter
jupyter lab examples/generative_models/geometric/geometric_benchmark_demo.ipynb
Key Concepts
1. Point Cloud Representation
Point clouds are sets of 3D coordinates representing object surfaces:
# Point cloud shape: (batch_size, num_points, 3)
point_cloud = jnp.array([
    [[x1, y1, z1],
     [x2, y2, z2],
     ...
     [xN, yN, zN]]
])  # Shape: (1, 1024, 3)
Key Properties:
- Unordered: No canonical ordering of points
- Variable size: Different objects may have different numbers of points
- Surface representation: Points typically lie on the object's surface
- Normalized: Usually scaled to fit inside the unit sphere or a unit box
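The normalization and fixed-size properties above can be sketched in a few lines. This is an illustration in plain NumPy rather than Workshop's own preprocessing, which is not shown in this example; the helper names `normalize_to_unit_sphere` and `resample` are assumptions made for the sketch.

```python
import numpy as np


def normalize_to_unit_sphere(points: np.ndarray) -> np.ndarray:
    """Center an (N, 3) point cloud and scale it to fit inside the unit sphere."""
    centered = points - points.mean(axis=0, keepdims=True)
    radius = np.linalg.norm(centered, axis=1).max()
    # Guard against a degenerate cloud whose points all coincide.
    return centered / max(radius, 1e-12)


def resample(points: np.ndarray, num_points: int, rng: np.random.Generator) -> np.ndarray:
    """Return exactly num_points rows, sampling with replacement only when needed."""
    replace = points.shape[0] < num_points
    idx = rng.choice(points.shape[0], size=num_points, replace=replace)
    return points[idx]


rng = np.random.default_rng(0)
raw = rng.normal(loc=5.0, scale=2.0, size=(1500, 3))  # stand-in for a scanned object
cloud = normalize_to_unit_sphere(resample(raw, 1024, rng))
print(cloud.shape)
```

Resampling before normalizing keeps the centroid of the final cloud at the origin and its farthest point exactly on the unit sphere.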
2. ShapeNet Dataset
ShapeNetCore, the widely used subset of ShapeNet, is a large-scale 3D object dataset with about 51,300 models across 55 categories:
from workshop.benchmarks.datasets.geometric import ShapeNetDataset
dataset = ShapeNetDataset(
    data_path="./data/shapenet",
    config=data_config,
    rngs=rngs
)
# Get batch
batch = dataset.get_batch(batch_size=8, split="train")
# batch = {
# "point_clouds": (8, 1024, 3), # 8 samples, 1024 points each
# "labels": (8,), # Category labels
# "synsets": ["02691156", ...], # Category IDs
# }
Synset Categories (examples):
- 02691156: Airplane
- 02958343: Car
- 03001627: Chair
- 04379243: Table
- More: See ShapeNet documentation
Automatic Fallbacks (tried in order):
- Try downloading ShapeNet data
- Fall back to ModelNet if available
- Generate synthetic data if needed
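The fallback order can be pictured as a chain of loaders tried one after another. The sketch below is not the `ShapeNetDataset` implementation; `load_with_fallbacks`, `load_shapenet`, `load_modelnet`, and `make_synthetic` are hypothetical names, and the two real-data loaders are stubbed to fail so that the synthetic path runs.

```python
import numpy as np


def load_with_fallbacks(loaders):
    """Try each (name, loader) pair in order and return the first that succeeds."""
    errors = []
    for name, loader in loaders:
        try:
            return name, loader()
        except Exception as exc:  # broad on purpose: any failure triggers the next source
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all data sources failed: " + "; ".join(errors))


def load_shapenet():
    # Hypothetical stub: pretend the download or local copy is unavailable.
    raise FileNotFoundError("./data/shapenet not found")


def load_modelnet():
    # Hypothetical stub: pretend ModelNet is not installed either.
    raise FileNotFoundError("ModelNet not available")


def make_synthetic(num_samples=8, num_points=1024, seed=0):
    """Sample points uniformly on the unit sphere as placeholder shapes."""
    rng = np.random.default_rng(seed)
    pts = rng.normal(size=(num_samples, num_points, 3))
    return pts / np.linalg.norm(pts, axis=-1, keepdims=True)


source, data = load_with_fallbacks([
    ("shapenet", load_shapenet),
    ("modelnet", load_modelnet),
    ("synthetic", make_synthetic),
])
print(source, data.shape)
```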
3. Chamfer Distance Loss
Primary loss function for point clouds, measuring bidirectional nearest-neighbor distances:
from workshop.generative_models.core.losses.geometric import chamfer_distance
# Compute Chamfer distance
loss = chamfer_distance(pred_points, target_points)
# pred_points: (batch, num_points, 3)
# target_points: (batch, num_points, 3)
# loss: scalar value (lower is better)
Interpretation:
- First term: Average distance from predicted to closest real point
- Second term: Average distance from real to closest predicted point
- Symmetric: Penalizes both missing points and spurious points
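To make the two terms concrete, here is a minimal NumPy sketch of a symmetric Chamfer distance. It uses squared Euclidean distances and averages both directions, which is one common convention; Workshop's `chamfer_distance` may differ in details such as squaring or reduction, so treat `chamfer_distance_np` as an illustrative stand-in.

```python
import numpy as np


def chamfer_distance_np(pred: np.ndarray, target: np.ndarray) -> float:
    """Symmetric Chamfer distance for batches of shape (B, N, 3) and (B, M, 3)."""
    # Pairwise squared distances, shape (B, N, M). Memory grows as B * N * M.
    diff = pred[:, :, None, :] - target[:, None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # First term: each predicted point to its closest target point.
    pred_to_target = sq_dist.min(axis=2).mean(axis=1)
    # Second term: each target point to its closest predicted point.
    target_to_pred = sq_dist.min(axis=1).mean(axis=1)
    return float((pred_to_target + target_to_pred).mean())


# The prediction collapses onto one target point and misses the other.
target = np.array([[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]])
pred = np.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
print(chamfer_distance_np(pred, target))  # 2.0
```

In this example the first term is zero, because every predicted point sits on a target point, while the second term is (0 + 4) / 2 = 2. That is how the symmetric form catches a missing region that a one-directional distance would overlook.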
4. Point Cloud Model Architecture
Transformer-based model for generating point clouds:
from workshop.generative_models.models.geometric.point_cloud import PointCloudModel
model_config = ModelConfiguration(
    name="point_cloud_model",
    model_class="workshop.generative_models.models.geometric.point_cloud.PointCloudModel",
    input_dim=(1024, 3),
    hidden_dims=[128],
    dropout_rate=0.1,
    metadata={
        "geometric_params": {
            "num_points": 1024,
            "num_layers": 4,
            "num_heads": 8,
        }
    },
)
model = PointCloudModel(config=model_config, rngs=rngs)
Architecture:
- Encoder: Point cloud → latent embedding (via self-attention)
- Transformer layers: Multi-head self-attention with residual connections
- Decoder: Latent embedding → reconstructed point cloud
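- Order independence: Attention treats the input as a set; self-attention without positional encodings is permutation-equivariant, so reordering the input points only reorders the per-point outputs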
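The order-independence property can be checked numerically. The sketch below implements a single-head self-attention layer in NumPy with random weights and no positional encoding; it is not the `PointCloudModel` code, and the weight shapes are arbitrary choices for illustration.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
points = rng.normal(size=(16, 3))
w_q, w_k, w_v = (rng.normal(size=(3, 8)) for _ in range(3))
perm = rng.permutation(16)

out = self_attention(points, w_q, w_k, w_v)
out_perm = self_attention(points[perm], w_q, w_k, w_v)

# Shuffling the input shuffles the output rows in the same way (equivariance).
equivariant = np.allclose(out[perm], out_perm)
# A symmetric pooling step such as max removes the ordering entirely (invariance).
invariant_after_pooling = np.allclose(out.max(axis=0), out_perm.max(axis=0))
print(equivariant, invariant_after_pooling)
```

Attention alone gives equivariance; a global embedding that is fully invariant to point order additionally needs a symmetric reduction such as max or mean pooling.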
5. Training Configuration
Complete training setup with optimizer and scheduler:
# Optimizer
optimizer_config = OptimizerConfiguration(
    optimizer_type="adam",
    learning_rate=1e-4,
    weight_decay=1e-5,
    beta1=0.9,
    beta2=0.999,
)
# Learning rate schedule
scheduler_config = SchedulerConfiguration(
    scheduler_type="cosine",
    warmup_steps=100,
    min_lr_ratio=0.01,
)
# Training
training_config = TrainingConfiguration(
    batch_size=8,
    num_epochs=50,
    optimizer=optimizer_config,
    scheduler=scheduler_config,
)
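The `warmup_steps` and `min_lr_ratio` fields suggest a linear warmup followed by cosine decay down to a floor. How Workshop implements the schedule internally is not shown in this example; the sketch below is one common formulation, and the function name `cosine_lr` and its `total_steps` argument are assumptions for illustration.

```python
import math


def cosine_lr(step, total_steps, base_lr=1e-4, warmup_steps=100, min_lr_ratio=0.01):
    """Linear warmup to base_lr, then cosine decay to base_lr * min_lr_ratio."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    min_lr = base_lr * min_lr_ratio
    return min_lr + (base_lr - min_lr) * cosine


total = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, total))
```

With these defaults the rate reaches 1e-4 at the end of warmup and settles at 1e-6 once `step` reaches `total_steps`.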
6. Evaluation Metrics
Comprehensive metrics for point cloud generation:
from workshop.benchmarks.metrics.geometric import PointCloudMetrics
metrics = PointCloudMetrics(rngs=rngs, config=eval_config)
results = metrics.compute(
    real_data=real_point_clouds,
    generated_data=generated_point_clouds
)
# results = {
# "1nn_accuracy": 0.85, # 1-NN classification accuracy
# "coverage": 0.72, # Coverage of real distribution
# "geometric_fidelity": 0.68, # Geometric quality score
# "chamfer_distance": 0.012, # Average Chamfer distance
# }
Metric Definitions:
- 1-NN Accuracy: Classification accuracy using 1-nearest neighbor
  - Tests if generated samples are realistic
  - Higher is better (target: >0.8)
- Coverage: Fraction of real samples covered by generated samples
  - Tests distribution diversity
  - Higher is better (target: >0.6)
- Geometric Fidelity: Quality of geometric structure
  - Measures surface smoothness and completeness
  - Higher is better (target: >0.7)
- Chamfer Distance: Average point-to-point distance
  - Direct reconstruction quality
  - Lower is better (target: <0.02)
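As an illustration of the coverage idea, the sketch below matches every generated cloud to its nearest real cloud under Chamfer distance and reports the fraction of real clouds that were matched at least once. This follows the usual definition from the point cloud generation literature; whether `PointCloudMetrics` computes coverage exactly this way is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def chamfer(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric squared Chamfer distance between clouds of shape (N, 3) and (M, 3)."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return float(sq.min(axis=1).mean() + sq.min(axis=0).mean())


def coverage(real, generated) -> float:
    """Fraction of real clouds that are the nearest neighbor of some generated cloud."""
    matched = set()
    for g in generated:
        dists = [chamfer(g, r) for r in real]
        matched.add(int(np.argmin(dists)))
    return len(matched) / len(real)


rng = np.random.default_rng(0)
base = 0.1 * rng.normal(size=(64, 3))
offsets = np.array([[0, 0, 0], [5, 0, 0], [0, 5, 0], [0, 0, 5]], dtype=float)
real = [base + o for o in offsets]  # four well-separated "shapes"

# The generator only reproduces the first two shapes (mode collapse).
generated = [
    real[0] + 0.01 * rng.normal(size=(64, 3)),
    real[1] + 0.01 * rng.normal(size=(64, 3)),
    real[1] + 0.01 * rng.normal(size=(64, 3)),
]
score = coverage(real, generated)
print(score)  # 0.5
```

Two of the four real shapes are never matched, so coverage is 0.5 even though each generated sample is individually close to a real one. This is why coverage is read as a diversity measure rather than a quality measure.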
7. Training Pipeline
Complete training loop with logging and checkpointing:
class GeometricDemoTrainer:
    def train(self):
        for epoch in range(num_epochs):
            # Training phase
            train_metrics = self._train_epoch(trainer, epoch)
            # Validation phase
            val_metrics = self._validate_epoch(trainer, epoch)
            # Update learning rate
            current_lr = self._update_learning_rate(trainer, epoch)
            # Log metrics
            self._log_epoch_metrics(epoch, train_metrics, val_metrics, current_lr)
            # Save checkpoint
            if (epoch + 1) % save_freq == 0:
                self._save_checkpoint(trainer, epoch)
            # Visualize progress
            if (epoch + 1) % 25 == 0:
                self._visualize_progress(trainer, epoch)
        # Final evaluation
        final_metrics = self._final_evaluation(trainer)
        return trainer, final_metrics
Code Structure
The example consists of three main components:
- GeometricDemoTrainer - Complete trainer orchestrating:
  - Dataset setup (ShapeNet with fallbacks)
  - Model initialization (transformer architecture)
  - Training loop (optimizer, scheduler, logging)
  - Evaluation (comprehensive metrics)
  - Visualization (training curves, samples)
- Training Pipeline - Real optimization:
  - Forward pass through model
  - Chamfer distance loss computation
  - Gradient computation and parameter updates
  - Learning rate scheduling
- Evaluation Suite - Comprehensive metrics:
  - Diversity score (sample variation)
  - Coverage score (distribution coverage)
  - Quality score (geometric properties)
  - Comparison with benchmarks
Features Demonstrated
- ✅ PyTorch3D-style ShapeNet dataset loading
- ✅ Automatic fallback to synthetic data
- ✅ Transformer-based point cloud model
- ✅ Chamfer distance loss function
- ✅ Adam optimizer with cosine decay schedule
- ✅ Complete training loop with real optimization
- ✅ Training/validation split with proper evaluation
- ✅ Checkpointing and model saving
- ✅ Training visualization (loss curves, samples)
- ✅ Comprehensive evaluation metrics
- ✅ Benchmark comparison
- ✅ Production-ready logging and reporting
Experiments to Try
- Use Real ShapeNet Data
demo_config = {
    "dataset": {
        "data_path": "./data/shapenet",
        "data_source": "auto",  # Try real data download
        # ...
    }
}
- Add More Categories
demo_config = {
    "dataset": {
        "synsets": [
            "02691156",  # Airplane
            "02958343",  # Car
            "03001627",  # Chair
        ],
        # ...
    }
}
- Increase Model Capacity
demo_config = {
    "model": {
        "embed_dim": 256,  # More expressive
        "num_layers": 8,   # Deeper network
        "num_heads": 16,   # More attention heads
    }
}
- Longer Training
demo_config = {
    "training": {
        "num_epochs": 200,  # More training
        "batch_size": 16,   # Larger batches (if GPU allows)
    }
}
- Different Optimizers
demo_config = {
    "training": {
        "optimizer": {
            "optimizer_type": "adamw",
            "weight_decay": 1e-4,  # More regularization
        }
    }
}
Next Steps
- Advanced Architectures: Try PointNet++, DGCNN, or diffusion models
- Conditional Generation: Generate point clouds conditioned on category
- Mesh Generation: Extend to surface reconstruction and meshing
- Loss Functions: Explore geometric loss functions
Troubleshooting
Dataset Download Fails
Symptom: Error downloading ShapeNet data
Solution: The example automatically falls back to synthetic data
# Synthetic data is generated automatically
# To try real data:
demo_config["dataset"]["data_source"] = "auto"
Training Too Slow
Symptom: Training takes >20 minutes
Solution: Reduce the number of epochs; if the machine is also short on memory, reduce the batch size
demo_config["training"]["num_epochs"] = 25 # Faster
demo_config["training"]["batch_size"] = 4 # Less memory
CUDA Out of Memory
Symptom: CUDA out of memory error during training
Solution: Reduce batch size or model size
demo_config["training"]["batch_size"] = 4
demo_config["model"]["embed_dim"] = 64
demo_config["dataset"]["num_points"] = 512 # Fewer points
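If you reduce `num_points` by subsampling existing clouds yourself, farthest point sampling (the strategy used in PointNet++) tends to preserve shape outline better than uniform random selection. How Workshop's dataset reduces the point count is not shown in this example, so the function below is a standalone NumPy sketch with an assumed name.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, num_samples: int, start_index: int = 0) -> np.ndarray:
    """Greedily pick num_samples rows of an (N, 3) cloud that are far apart."""
    n = points.shape[0]
    selected = np.empty(num_samples, dtype=np.int64)
    # Squared distance from every point to its nearest already-selected point.
    min_sq_dist = np.full(n, np.inf)
    current = start_index
    for i in range(num_samples):
        selected[i] = current
        sq = np.sum((points - points[current]) ** 2, axis=1)
        min_sq_dist = np.minimum(min_sq_dist, sq)
        # The next pick is the point farthest from everything chosen so far.
        current = int(np.argmax(min_sq_dist))
    return points[selected]


rng = np.random.default_rng(0)
dense = rng.normal(size=(1024, 3))
sparse = farthest_point_sampling(dense, 512)
print(sparse.shape)
```

The loop costs O(N) per selected point, so reducing 1024 points to 512 is cheap enough to run once during preprocessing.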
Poor Generation Quality
Symptom: Generated point clouds look random
Cause: Insufficient training or model capacity
Solution: Train longer or increase model size
demo_config["training"]["num_epochs"] = 100
demo_config["model"]["embed_dim"] = 256
demo_config["model"]["num_layers"] = 8
Loss Not Decreasing
Symptom: Training loss plateaus or increases
Cause: Learning rate too high or optimizer issue
Solution: Reduce learning rate or adjust optimizer
demo_config["training"]["optimizer"]["learning_rate"] = 5e-5 # Lower LR
demo_config["training"]["optimizer"]["weight_decay"] = 1e-6 # Less regularization
Additional Resources
Documentation
- Geometric Benchmark Suite - Complete benchmarking guide
- Point Cloud Models API - Model architecture details
- Chamfer Distance - Loss function documentation
- ShapeNet Dataset - Dataset documentation
Related Examples
- Loss Examples - Geometric loss functions
- Framework Features Demo - Configuration system
Papers and Resources
- PointNet: "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation" (Qi et al., 2017)
- PointNet++: "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space" (Qi et al., 2017)
- ShapeNet: "ShapeNet: An Information-Rich 3D Model Repository" (Chang et al., 2015)
- Point Cloud Transformers: "PCT: Point Cloud Transformer" (Guo et al., 2021)
- Chamfer Distance: "Learning Representations and Generative Models for 3D Point Clouds" (Achlioptas et al., 2018)