Comprehensive Geometric Benchmark Demo¤

Level: Advanced | Runtime: ~10-15 minutes (CPU) / ~3-5 minutes (GPU) | Format: Python + Jupyter

Prerequisites: Understanding of 3D geometry, point clouds, and transformer architectures | Target Audience: Users training 3D generative models

Overview¤

This example demonstrates a complete end-to-end pipeline for training and evaluating point cloud generation models with Workshop. Learn how to load ShapeNet datasets, train transformer-based geometric models, use Chamfer distance loss, and evaluate with comprehensive 3D metrics.

What You'll Learn¤

  • ShapeNet Dataset: PyTorch3D-style data loading with automatic fallbacks to synthetic data
  • Point Cloud Models: Transformer-based architecture for generating 3D point clouds
  • Chamfer Distance: Primary loss function for measuring point cloud similarity
  • Training Pipeline: Complete training with Adam optimizer, cosine scheduler, and checkpointing
  • Evaluation Metrics: Diversity, coverage, quality, and geometric fidelity scores
  • Benchmark Suite: Compare results against standard geometric benchmarks

Files¤

This example is available in two formats: a standalone Python script (`geometric_benchmark_demo.py`) and a Jupyter notebook (`geometric_benchmark_demo.ipynb`), both under `examples/generative_models/geometric/`.

Quick Start¤

Run the Python Script¤

# Activate environment
source activate.sh

# Run the complete demo (trains for 50 epochs)
python examples/generative_models/geometric/geometric_benchmark_demo.py

Run the Jupyter Notebook¤

# Activate environment
source activate.sh

# Launch Jupyter
jupyter lab examples/generative_models/geometric/geometric_benchmark_demo.ipynb

Key Concepts¤

1. Point Cloud Representation¤

Point clouds are sets of 3D coordinates representing object surfaces:

# Point cloud shape: (batch_size, num_points, 3)
# Schematic: each row is one (x, y, z) coordinate
point_cloud = jnp.array([
    [[x1, y1, z1],
     [x2, y2, z2],
     # ...,
     [xN, yN, zN]]
])  # Shape: (1, N, 3), e.g. (1, 1024, 3) for N = 1024

Key Properties:

  • Unordered: No canonical ordering of points
  • Variable size: Different objects may have different numbers of points
  • Surface representation: Points typically lie on object surface
  • Normalized: Usually normalized to unit sphere or box
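
The normalization property above is easy to implement directly. The sketch below is illustrative (the function name is not a Workshop API): center the cloud at its centroid, then scale so the farthest point lies on the unit sphere.

```python
import jax.numpy as jnp

def normalize_to_unit_sphere(points):
    """Center a point cloud and scale it to fit inside the unit sphere.

    points: (num_points, 3)
    """
    centered = points - points.mean(axis=0)           # Move centroid to the origin
    radius = jnp.linalg.norm(centered, axis=1).max()  # Distance of the farthest point
    return centered / radius                          # Farthest point now has norm 1.0

cloud = jnp.array([[2.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 2.0],
                   [2.0, 2.0, 2.0]])
unit_cloud = normalize_to_unit_sphere(cloud)
```

After normalization the centroid is at the origin and every point has norm at most 1, which keeps losses and learning rates comparable across objects of different physical sizes.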

2. ShapeNet Dataset¤

Large-scale 3D object dataset with 51,300 models across 55 categories:

from workshop.benchmarks.datasets.geometric import ShapeNetDataset

dataset = ShapeNetDataset(
    data_path="./data/shapenet",
    config=data_config,
    rngs=rngs
)

# Get batch
batch = dataset.get_batch(batch_size=8, split="train")
# batch = {
#     "point_clouds": (8, 1024, 3),  # 8 samples, 1024 points each
#     "labels": (8,),                 # Category labels
#     "synsets": ["02691156", ...],   # Category IDs
# }

Synset Categories (examples):

  • 02691156 - Airplane
  • 02958343 - Car
  • 03001627 - Chair

Automatic Fallbacks:

  1. Try downloading ShapeNet data
  2. Fall back to ModelNet if available
  3. Generate synthetic data if needed
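
The synthetic fallback can be approximated by sampling points from simple parametric surfaces. The sphere sampler below is a stand-in for illustration, not Workshop's actual generator; it produces batches with the same layout as the ShapeNet batch above.

```python
import jax
import jax.numpy as jnp

def synthetic_sphere_cloud(key, num_points=1024, radius=1.0):
    """Sample a point cloud uniformly from a sphere surface (synthetic stand-in)."""
    directions = jax.random.normal(key, (num_points, 3))
    directions /= jnp.linalg.norm(directions, axis=1, keepdims=True)  # Project onto unit sphere
    return radius * directions

key = jax.random.PRNGKey(0)
batch = jnp.stack([synthetic_sphere_cloud(k) for k in jax.random.split(key, 8)])
# batch has shape (8, 1024, 3), mirroring the "point_clouds" entry of a real batch
```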

3. Chamfer Distance Loss¤

Primary loss function for point clouds, measuring bidirectional nearest-neighbor distances:

\[L_{\text{Chamfer}}(X, Y) = \frac{1}{|X|}\sum_{x \in X} \min_{y \in Y} \|x - y\|^2 + \frac{1}{|Y|}\sum_{y \in Y} \min_{x \in X} \|x - y\|^2\]

from workshop.generative_models.core.losses.geometric import chamfer_distance

# Compute Chamfer distance
loss = chamfer_distance(pred_points, target_points)

# pred_points: (batch, num_points, 3)
# target_points: (batch, num_points, 3)
# loss: scalar value (lower is better)

Interpretation:

  • First term: Average distance from predicted to closest real point
  • Second term: Average distance from real to closest predicted point
  • Symmetric: Penalizes both missing points and spurious points
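
For reference, the formula above fits in a few lines of JAX. This is a hedged re-implementation for illustration, not Workshop's `chamfer_distance`:

```python
import jax.numpy as jnp

def chamfer_distance(pred, target):
    """Bidirectional mean squared nearest-neighbor distance, averaged over the batch.

    pred:   (batch, n_pred, 3)
    target: (batch, n_target, 3)
    """
    # Pairwise squared distances: (batch, n_pred, n_target)
    d2 = jnp.sum((pred[:, :, None, :] - target[:, None, :, :]) ** 2, axis=-1)
    pred_to_target = jnp.min(d2, axis=2).mean(axis=1)  # First term of the formula
    target_to_pred = jnp.min(d2, axis=1).mean(axis=1)  # Second term
    return jnp.mean(pred_to_target + target_to_pred)   # Average over the batch
```

Identical clouds give exactly 0; a single predicted point at the origin against a single target at (1, 0, 0) gives 1 + 1 = 2, one unit from each direction of the sum.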

4. Point Cloud Model Architecture¤

Transformer-based model for generating point clouds:

from workshop.generative_models.models.geometric.point_cloud import PointCloudModel

model_config = ModelConfiguration(
    name="point_cloud_model",
    model_class="workshop.generative_models.models.geometric.point_cloud.PointCloudModel",
    input_dim=(1024, 3),
    hidden_dims=[128],
    dropout_rate=0.1,
    metadata={
        "geometric_params": {
            "num_points": 1024,
            "num_layers": 4,
            "num_heads": 8,
        }
    },
)

model = PointCloudModel(config=model_config, rngs=rngs)

Architecture:

  • Encoder: Point cloud → latent embedding (via self-attention)
  • Transformer layers: Multi-head self-attention with residual connections
  • Decoder: Latent embedding → reconstructed point cloud
  • Permutation invariance: Order-independent processing via attention
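
The permutation-invariance claim can be verified with a minimal sketch: one self-attention head followed by mean pooling (toy weights, not the `PointCloudModel` internals). Shuffling the input points leaves the pooled embedding unchanged up to float error.

```python
import jax
import jax.numpy as jnp

def self_attention_pool(points, w_q, w_k, w_v):
    """One self-attention head over a point set, then mean-pooling to one embedding."""
    q, k, v = points @ w_q, points @ w_k, points @ w_v
    scores = q @ k.T / jnp.sqrt(k.shape[-1])
    attn = jax.nn.softmax(scores, axis=-1)  # Each point attends to all others
    return (attn @ v).mean(axis=0)          # Mean pooling: order-independent output

key = jax.random.PRNGKey(0)
k_pts, k_q, k_k, k_v = jax.random.split(key, 4)
pts = jax.random.normal(k_pts, (16, 3))     # A 16-point cloud
w_q = jax.random.normal(k_q, (3, 8))
w_k = jax.random.normal(k_k, (3, 8))
w_v = jax.random.normal(k_v, (3, 8))

emb = self_attention_pool(pts, w_q, w_k, w_v)
perm = jax.random.permutation(k_pts, 16)    # Shuffle the point order
emb_shuffled = self_attention_pool(pts[perm], w_q, w_k, w_v)
# emb and emb_shuffled agree: attention + pooling ignores point ordering
```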

5. Training Configuration¤

Complete training setup with optimizer and scheduler:

# Optimizer
optimizer_config = OptimizerConfiguration(
    optimizer_type="adam",
    learning_rate=1e-4,
    weight_decay=1e-5,
    beta1=0.9,
    beta2=0.999,
)

# Learning rate schedule
scheduler_config = SchedulerConfiguration(
    scheduler_type="cosine",
    warmup_steps=100,
    min_lr_ratio=0.01,
)

# Training
training_config = TrainingConfiguration(
    batch_size=8,
    num_epochs=50,
    optimizer=optimizer_config,
    scheduler=scheduler_config,
)
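
The `SchedulerConfiguration` above describes a warmup-plus-cosine curve. Here is a standalone sketch of that shape; the function name and `total_steps` value are illustrative assumptions, not Workshop internals.

```python
import jax.numpy as jnp

def cosine_with_warmup(step, peak_lr=1e-4, warmup_steps=100,
                       total_steps=10_000, min_lr_ratio=0.01):
    """Linear warmup to peak_lr, then cosine decay to min_lr_ratio * peak_lr."""
    warmup_lr = peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    floor = min_lr_ratio * peak_lr
    cosine_lr = floor + 0.5 * (peak_lr - floor) * (1.0 + jnp.cos(jnp.pi * progress))
    return jnp.where(step < warmup_steps, warmup_lr, cosine_lr)
```

The learning rate climbs linearly to `peak_lr` over the first 100 steps, then decays along a half cosine to 1% of the peak, matching `min_lr_ratio=0.01`.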

6. Evaluation Metrics¤

Comprehensive metrics for point cloud generation:

from workshop.benchmarks.metrics.geometric import PointCloudMetrics

metrics = PointCloudMetrics(rngs=rngs, config=eval_config)

results = metrics.compute(
    real_data=real_point_clouds,
    generated_data=generated_point_clouds
)

# results = {
#     "1nn_accuracy": 0.85,          # 1-NN classification accuracy
#     "coverage": 0.72,              # Coverage of real distribution
#     "geometric_fidelity": 0.68,    # Geometric quality score
#     "chamfer_distance": 0.012,     # Average Chamfer distance
# }

Metric Definitions:

  • 1-NN Accuracy: Classification accuracy using 1-nearest neighbor
      • Tests if generated samples are realistic
      • Higher is better (target: >0.8)

  • Coverage: Fraction of real samples covered by generated samples
      • Tests distribution diversity
      • Higher is better (target: >0.6)

  • Geometric Fidelity: Quality of geometric structure
      • Measures surface smoothness and completeness
      • Higher is better (target: >0.7)

  • Chamfer Distance: Average point-to-point distance
      • Direct reconstruction quality
      • Lower is better (target: <0.02)
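
Coverage can be made concrete with a small sketch: compute pairwise Chamfer distances between generated and real clouds, then count which real clouds are the nearest neighbor of at least one generated sample. This is an illustrative re-implementation, not `PointCloudMetrics`:

```python
import jax
import jax.numpy as jnp

def pairwise_chamfer(a, b):
    """Chamfer distance between every pair of clouds in a (n, p, 3) and b (m, p, 3)."""
    d2 = jnp.sum((a[:, None, :, None, :] - b[None, :, None, :, :]) ** 2, axis=-1)
    return jnp.min(d2, axis=3).mean(axis=2) + jnp.min(d2, axis=2).mean(axis=2)

def coverage(real, generated):
    """Fraction of real clouds that are the nearest neighbor of some generated cloud."""
    d = pairwise_chamfer(generated, real)        # (n_gen, n_real)
    matched = jnp.unique(jnp.argmin(d, axis=1))  # Real indices hit by at least one sample
    return matched.size / real.shape[0]

real = jax.random.normal(jax.random.PRNGKey(0), (4, 32, 3))
```

If the generated set equals the real set, every real cloud is matched and coverage is 1.0; if all generated samples collapse onto one real cloud (mode collapse), coverage drops toward 1/n.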

7. Training Pipeline¤

Complete training loop with logging and checkpointing:

class GeometricDemoTrainer:
    def train(self):
        # `trainer` and the hyperparameters below are created during setup (abridged here)
        for epoch in range(self.num_epochs):
            # Training phase
            train_metrics = self._train_epoch(trainer, epoch)

            # Validation phase
            val_metrics = self._validate_epoch(trainer, epoch)

            # Update learning rate
            current_lr = self._update_learning_rate(trainer, epoch)

            # Log metrics
            self._log_epoch_metrics(epoch, train_metrics, val_metrics, current_lr)

            # Save checkpoint
            if (epoch + 1) % self.save_freq == 0:
                self._save_checkpoint(trainer, epoch)

            # Visualize progress
            if (epoch + 1) % 25 == 0:
                self._visualize_progress(trainer, epoch)

        # Final evaluation
        final_metrics = self._final_evaluation(trainer)

        return trainer, final_metrics

Code Structure¤

The example consists of three main components:

  1. GeometricDemoTrainer - Complete trainer orchestrating:
      • Dataset setup (ShapeNet with fallbacks)
      • Model initialization (transformer architecture)
      • Training loop (optimizer, scheduler, logging)
      • Evaluation (comprehensive metrics)
      • Visualization (training curves, samples)

  2. Training Pipeline - Real optimization:
      • Forward pass through model
      • Chamfer distance loss computation
      • Gradient computation and parameter updates
      • Learning rate scheduling

  3. Evaluation Suite - Comprehensive metrics:
      • Diversity score (sample variation)
      • Coverage score (distribution coverage)
      • Quality score (geometric properties)
      • Comparison with benchmarks
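
The training-pipeline steps (forward pass, Chamfer loss, gradient computation, parameter update) reduce to a few lines of JAX. In this sketch a toy linear generator stands in for `PointCloudModel`, and plain SGD stands in for the Adam optimizer:

```python
import jax
import jax.numpy as jnp

def chamfer(pred, target):
    # Same bidirectional nearest-neighbor loss used throughout this demo
    d2 = jnp.sum((pred[:, :, None, :] - target[:, None, :, :]) ** 2, axis=-1)
    return jnp.mean(jnp.min(d2, axis=2)) + jnp.mean(jnp.min(d2, axis=1))

def generate(params, noise):
    # Toy stand-in for PointCloudModel: one linear layer mapping noise to xyz points
    return noise @ params["w"] + params["b"]

@jax.jit
def train_step(params, noise, target, lr):
    # Forward pass + loss, then gradients and a plain SGD parameter update
    loss, grads = jax.value_and_grad(lambda p: chamfer(generate(p, noise), target))(params)
    new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return new_params, loss

key = jax.random.PRNGKey(0)
noise = jax.random.normal(key, (2, 64, 8))
target = jax.random.normal(jax.random.PRNGKey(1), (2, 64, 3))
params = {"w": 0.1 * jax.random.normal(jax.random.PRNGKey(2), (8, 3)),
          "b": jnp.zeros(3)}
```

Iterating `train_step` drives the Chamfer loss down; the real trainer adds Adam, the cosine schedule, validation, and checkpointing around this core.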

Features Demonstrated¤

  • ✅ PyTorch3D-style ShapeNet dataset loading
  • ✅ Automatic fallback to synthetic data
  • ✅ Transformer-based point cloud model
  • ✅ Chamfer distance loss function
  • ✅ Adam optimizer with cosine decay schedule
  • ✅ Complete training loop with real optimization
  • ✅ Training/validation split with proper evaluation
  • ✅ Checkpointing and model saving
  • ✅ Training visualization (loss curves, samples)
  • ✅ Comprehensive evaluation metrics
  • ✅ Benchmark comparison
  • ✅ Production-ready logging and reporting

Experiments to Try¤

  1. Use Real ShapeNet Data
demo_config = {
    "dataset": {
        "data_path": "./data/shapenet",
        "data_source": "auto",  # Try real data download
        # ...
    }
}
  2. Add More Categories
demo_config = {
    "dataset": {
        "synsets": [
            "02691156",  # Airplane
            "02958343",  # Car
            "03001627",  # Chair
        ],
        # ...
    }
}
  3. Increase Model Capacity
demo_config = {
    "model": {
        "embed_dim": 256,     # More expressive
        "num_layers": 8,      # Deeper network
        "num_heads": 16,      # More attention
    }
}
  4. Longer Training
demo_config = {
    "training": {
        "num_epochs": 200,    # More training
        "batch_size": 16,     # Larger batches (if GPU allows)
    }
}
  5. Different Optimizers
demo_config = {
    "training": {
        "optimizer": {
            "optimizer_type": "adamw",
            "weight_decay": 1e-4,  # More regularization
        }
    }
}

Next Steps¤

  • Advanced Architectures: Try PointNet++, DGCNN, or diffusion models (Advanced 3D Models)
  • Conditional Generation: Generate point clouds conditioned on category (Conditional 3D)
  • Mesh Generation: Extend to surface reconstruction and meshing (Mesh Models)
  • Loss Functions: Explore geometric loss functions (Loss Examples)

Troubleshooting¤

Dataset Download Fails¤

Symptom: Error downloading ShapeNet data

Solution: The example automatically falls back to synthetic data

# Synthetic data is generated automatically
# To try real data:
demo_config["dataset"]["data_source"] = "auto"

Training Too Slow¤

Symptom: Training takes >20 minutes

Solution: Reduce epochs or batch size

demo_config["training"]["num_epochs"] = 25  # Faster
demo_config["training"]["batch_size"] = 4   # Less memory

CUDA Out of Memory¤

Symptom: CUDA out of memory error during training

Solution: Reduce batch size or model size

demo_config["training"]["batch_size"] = 4
demo_config["model"]["embed_dim"] = 64
demo_config["dataset"]["num_points"] = 512  # Fewer points

Poor Generation Quality¤

Symptom: Generated point clouds look random

Cause: Insufficient training or model capacity

Solution: Train longer or increase model size

demo_config["training"]["num_epochs"] = 100
demo_config["model"]["embed_dim"] = 256
demo_config["model"]["num_layers"] = 8

Loss Not Decreasing¤

Symptom: Training loss plateaus or increases

Cause: Learning rate too high or optimizer issue

Solution: Reduce learning rate or adjust optimizer

demo_config["training"]["optimizer"]["learning_rate"] = 5e-5  # Lower LR
demo_config["training"]["optimizer"]["weight_decay"] = 1e-6   # Less regularization

Additional Resources¤

Documentation¤

Papers and Resources¤

External Tools¤