Cross-Modal Retrieval

Coming Soon

This example is planned for a future release. Check back for updates on cross-modal retrieval implementations.

Overview

This example will demonstrate:

  • Text-to-image retrieval
  • Image-to-text retrieval
  • Joint embedding spaces
  • Similarity-based ranking
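
As a rough illustration of similarity-based ranking in a joint embedding space, the sketch below ranks candidate image embeddings against a text query by cosine similarity. It is not the planned example's implementation: the function names (`l2_normalize`, `cosine_similarity`, `rank_by_similarity`) and the toy vectors are hypothetical, and real embeddings would come from trained text and image encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_by_similarity(query, candidates, top_k=None):
    """Return (candidate_index, score) pairs, highest score first."""
    scored = [(idx, cosine_similarity(query, cand))
              for idx, cand in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Toy data (assumed): one text query and three image embeddings
# that already live in the same 3-dimensional space.
text_query = [1.0, 0.0, 0.0]
image_embeddings = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # close to the query
    [-1.0, 0.0, 0.0],  # opposite direction
]

ranking = rank_by_similarity(text_query, image_embeddings, top_k=2)
for idx, score in ranking:
    print(f"image {idx}: similarity {score:.3f}")
```

Image-to-text retrieval uses the same routine with the roles swapped: an image embedding is the query and caption embeddings are the candidates.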

Planned Features

  • Dual encoder architectures
  • Contrastive learning objectives
  • Hard negative mining
  • Efficient retrieval with approximate nearest neighbors
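
To make the contrastive objective and hard negative mining concrete, here is a minimal sketch of a max-of-hinges ranking loss in the style of the VSE++ paper listed under References. It assumes a square similarity matrix in which entry `sim[i][j]` scores image `i` against caption `j` and the diagonal holds the matching pairs. The function name `max_hinge_loss`, the margin value, and the toy matrix are assumptions for illustration, not part of the planned example.

```python
def max_hinge_loss(sim, margin=0.2):
    """Sum of hinge losses using only the hardest negative per positive pair.

    sim[i][j] is the similarity of image i and caption j; sim[i][i] is the
    positive pair. For each i, the hardest negative caption is the highest
    off-diagonal score in row i, and the hardest negative image is the
    highest off-diagonal score in column i.
    """
    n = len(sim)
    if n < 2:
        raise ValueError("need at least two pairs to have a negative")
    total = 0.0
    for i in range(n):
        positive = sim[i][i]
        hardest_caption = max(sim[i][j] for j in range(n) if j != i)
        hardest_image = max(sim[j][i] for j in range(n) if j != i)
        # Image-to-text term: the wrong caption must score at least
        # `margin` below the matching caption.
        total += max(0.0, margin + hardest_caption - positive)
        # Text-to-image term: the wrong image must score at least
        # `margin` below the matching image.
        total += max(0.0, margin + hardest_image - positive)
    return total


# Toy similarity matrix (assumed values) for a batch of three pairs.
toy_sim = [
    [0.9, 0.8, 0.1],
    [0.2, 0.7, 0.3],
    [0.0, 0.4, 0.6],
]
loss = max_hinge_loss(toy_sim, margin=0.2)
print(f"loss = {loss:.3f}")
```

Only the single hardest negative in each row and column contributes, which is what distinguishes this objective from a sum-of-hinges loss that averages over all negatives in the batch.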

References

  • Faghri et al., "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives" (2018)
  • Lee et al., "Stacked Cross Attention for Image-Text Matching" (2018)