
Attention-based Binarized Visual Encoder for LLM-driven Visual Question Answering


Thesis topic details

General information

Organisation

The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas:
• defence and security,
• nuclear energy (fission and fusion),
• technological research for industry,
• fundamental research in the physical sciences and life sciences.

Drawing on its widely acknowledged expertise, and thanks to its 16,000 technicians, engineers, researchers and staff, the CEA actively participates in collaborative projects with a large number of academic and industrial partners.

The CEA is established in ten centers spread throughout France.

Reference

SL-DRT-25-0593  

Direction

DRT

Thesis topic details

Category

Technological challenges

Thesis topics

Attention-based Binarized Visual Encoder for LLM-driven Visual Question Answering

Contract

Thesis

Job description

In the context of smart image sensors, there is growing demand to go beyond simple inference tasks such as classification or object detection and to support more complex applications that enable semantic understanding of the scene. Among these applications, Visual Question Answering (VQA) enables AI systems to answer questions by analyzing images. This project aims to develop an efficient VQA system that combines a visual encoder based on Binary Neural Networks (BNN) with a compact language model (tiny LLM). Although LLMs are still far from a complete hardware implementation, this project represents a significant step in that direction by using a BNN to analyze the context of the scene and the relationships between its objects. This encoder processes images with low resource consumption, allowing real-time deployment on edge devices. Attention mechanisms may be considered to extract the semantic information necessary for scene understanding. The language model can be stored locally and tuned jointly with the BNN to generate precise and contextually relevant answers.
This project offers an opportunity for candidates interested in Tiny Deep Learning and LLMs. It opens a broad field of research, with scope for significant contributions and results of interest for concrete applications. The work will consist of (1) developing a robust BNN topology for semantic scene analysis under hardware constraints on memory and computation, and (2) integrating and jointly optimizing the BNN encoder with the LLM, while ensuring a coherent and high-performing VQA system across different types of questions.

University / doctoral school

Electronique, Electrotechnique, Automatique, Traitement du Signal (EEATS)
Université Grenoble Alpes

Thesis topic location

Site

Grenoble

Requester

Position start date

01/10/2025

Person to be contacted by the applicant

NGUYEN Thien vanthien.nguyen@cea.fr
CEA
DRT/DOPT//L3I
CEA leti/DOPT
Minatec Campus
17, rue des Martyrs
38054 Grenoble Cedex
04 38 78 09 80

Tutor / Responsible thesis director

GUICQUERO William william.guicquero@cea.fr
CEA
DRT/DOPT//L3I
CEA leti/DOPT
Minatec Campus
17, rue des Martyrs
38054 Grenoble Cedex
04 38 78 09 57
