Cross-Modal Retrieval
Coming Soon
This example is planned for a future release. Check back for updates on cross-modal retrieval implementations.
Overview
This example will demonstrate:
- Text-to-image retrieval
- Image-to-text retrieval
- Joint embedding spaces
- Similarity-based ranking
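Until the full example is available, the ranking step can be illustrated with a minimal sketch. It assumes that text and image encoders have already mapped their inputs into the same joint embedding space, and it uses cosine similarity as the score, which is one common choice rather than a confirmed design decision for this example. The function names (`l2_normalize`, `cosine_similarity`, `rank_by_similarity`) and the toy vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal dimension."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_by_similarity(query, candidates, top_k=None):
    """Return (candidate_index, score) pairs sorted from most to least similar."""
    scored = [(idx, cosine_similarity(query, c)) for idx, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Toy data (assumed): one text query embedding and three image embeddings.
text_query = [0.9, 0.1, 0.0]
image_embeddings = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
]

# Text-to-image retrieval: rank images for the text query.
ranking = rank_by_similarity(text_query, image_embeddings, top_k=2)
for idx, score in ranking:
    print(f"image {idx}: similarity {score:.3f}")
```

Image-to-text retrieval follows the same pattern with the roles swapped: an image embedding becomes the query and the caption embeddings become the candidates.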
Planned Features
- Dual encoder architectures
- Contrastive learning objectives
- Hard negative mining
- Efficient retrieval with approximate nearest neighbors
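As a rough illustration of how a contrastive objective and hard negative mining fit together, the sketch below computes a max-of-hinges ranking loss in the style of the VSE++ paper listed in the references. It keeps only the hardest in-batch negative in each retrieval direction. It is a sketch under stated assumptions, not the planned implementation: the function name `max_hinge_loss`, the margin value, and the layout of the similarity matrix (rows are images, columns are captions, matching pairs on the diagonal) are all assumptions made for the example.

```python
def max_hinge_loss(sim, margin=0.2):
    """Sum of hinge losses using only the hardest in-batch negative.

    sim[i][j] is the similarity between image i and caption j (assumed layout);
    sim[i][i] is the score of the matching pair.
    """
    n = len(sim)
    if n < 2:
        raise ValueError("need at least two pairs to have a negative")
    if any(len(row) != n for row in sim):
        raise ValueError("similarity matrix must be square")

    total = 0.0
    for i in range(n):
        positive = sim[i][i]
        # Hardest negative caption for image i (image-to-text direction).
        hardest_caption = max(sim[i][j] for j in range(n) if j != i)
        # Hardest negative image for caption i (text-to-image direction).
        hardest_image = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin + hardest_caption - positive)
        total += max(0.0, margin + hardest_image - positive)
    return total


# Toy batch of two image-caption pairs (assumed similarity scores).
similarities = [
    [0.9, 0.3],
    [0.8, 0.5],
]
loss = max_hinge_loss(similarities, margin=0.2)
print(f"loss: {loss:.3f}")
```

Using only the hardest negative concentrates the penalty on the most confusable pair. Summing the hinge over all negatives is the usual alternative and spreads the penalty across the whole batch.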
Related Documentation
References
- Faghri et al., "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives" (2018)
- Lee et al., "Stacked Cross Attention for Image-Text Matching" (2018)