Israfel Salazar
Hi there!
I am an ELLIS PhD fellow at the University of Copenhagen, advised by Desmond Elliott. My current research focuses on vision-language understanding and representation. I have broad interests in machine learning, including motion and spatial reasoning, and robotics.
Previously, I completed the M.Sc. in Applied Mathematics (MVA) at ENS Paris-Saclay and the M.Sc. in Electrical Engineering at Université Paris-Saclay. I’ve worked on generative models for image restoration at DxO, Bayesian generative modeling at Inria, and multimodal representation learning at Hugging Face. I worked as a robotics engineer after studying mechanical engineering and applied physics at the University of Chile.
News
- [2025-11] Presented SPECS at EMNLP 2025! 🇨🇳
- [2025-08] SPECS accepted to EMNLP 2025. 🥳
- [2025-08] CaMMT accepted to Findings of EMNLP 2025. 🥳
Publications
Preprints
Long Story Short: Disentangling Compositionality and Long-Caption Understanding in VLMs
Investigates the bidirectional relationship between compositional training and long-caption understanding in vision-language models, revealing that these capabilities can be jointly learned through training on dense, grounded descriptions.
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation
A comprehensive exam benchmark covering 18 languages and 14 subjects with 20,911 multiple-choice questions for massively multilingual vision-language model evaluation.
Conference Papers
SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation
EMNLP, 2025
A reference-free metric for evaluating long image captions that emphasizes specificity by rewarding correct details and penalizing incorrect ones.