Pause
Read
CEA vacancy search engine

Fine-grained and spatio-temporally grounded large multimodal models


Thesis topic details

General information

Organisation

The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas :
• defence and security,
• nuclear energy (fission and fusion),
• technological research for industry,
• fundamental research in the physical sciences and life sciences.

Drawing on its widely acknowledged expertise, and thanks to its 16000 technicians, engineers, researchers and staff, the CEA actively participates in collaborative projects with a large number of academic and industrial partners.

The CEA is established in ten centers spread throughout France
  

Reference

SL-DRT-25-0820  

Direction

DRT

Thesis topic details

Category

Technological challenges

Thesis topics

Fine-grained and spatio-temporally grounded large multimodal models

Contract

Thèse

Job description

This PhD project focuses on enhancing Large Multimodal Models (LMMs) through the integration of fine-grained and spatio-temporal information into training datasets. While current LMMs such as CLIP and Flamingo show strong performance, they rely on noisy and coarse-grained image-text pairs and often lack spatial or temporal grounding. The thesis aims to develop automatic pipelines to enrich image datasets with geographic and temporal metadata, refine captions using fine-grained semantic descriptors, and balance dataset diversity and compactness by controlling class-wise sample sizes.

Training strategies will incorporate hierarchical class structures and adapt protocols to improve alignment between caption elements and image regions. The work will also explore joint training regimes that integrate fine-grained, spatial, and temporal dimensions, and propose set-based inference to improve the diversity of generated outputs. The enriched datasets and models will be evaluated using existing or newly developed benchmarks targeting contextual relevance and output diversity. The project also addresses challenges in metadata accuracy, efficient model adaptation, and benchmarking methodologies for multi-dimensional model evaluation.

Applications include improved synthetic data generation for autonomous driving, enhanced annotation of media archives through contextual captioning, and better visual reasoning in industrial simulation scenarios.

University / doctoral school

Sciences et Technologies de l’Information et de la Communication (STIC)
Paris-Saclay

Thesis topic location

Site

Saclay

Requester

Position start date

01/10/2025

Person to be contacted by the applicant

KARA Sandra
CEA
DRT/DIASI//LASTI
CEA SACLAY - NANO INNOV
BAT. 861
Point courier 173
91191 GIF SUR YVETTE

Tutor / Responsible thesis director

POPESCU Adrian adrian.popescu@cea.fr
CEA
DRT/DIASI//LASTI
CEA SACLAY - NANO INNOV
BAT. 861
Point courier 173
91191 GIF SUR YVETTE

0169080154

En savoir plus


https://kalisteo.cea.fr/index.php/