CEA vacancy search engine

Few-shot event and complex relation extraction from text applied to scientific literature


Thesis topic details

General information

Organisation

The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas:
• defence and security,
• nuclear energy (fission and fusion),
• technological research for industry,
• fundamental research in the physical sciences and life sciences.

Drawing on its widely acknowledged expertise and its 16,000 technicians, engineers, researchers and staff, the CEA actively participates in collaborative projects with a large number of academic and industrial partners.

The CEA is established in ten centers spread throughout France.

Reference

SL-DRT-26-0685  

Direction

DRT

Thesis topic details

Category

Technological challenges

Thesis topics

Few-shot event and complex relation extraction from text applied to scientific literature

Contract

PhD thesis

Job description

Information extraction from text, a subfield of Natural Language Processing, has been the subject of research for many years. These efforts have primarily focused on Named Entity Recognition, relation extraction between entities, and, in its most complex form, event extraction, a task typically formulated as filling predefined templates from unstructured text. Within this framework, the objective of this thesis is to design, develop, and evaluate event extraction models operating on scientific articles. In this context, an 'event' may correspond to a set of entities and relations characterizing, for instance, a chemical reaction or an experiment. Furthermore, it must be possible to build these models from a very small set of annotated data, so that they can be adapted rapidly to new scientific domains.
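To make the template-filling formulation concrete, the following is a minimal sketch of what an event template and a filled event could look like. It is only an illustration under stated assumptions: the class names, the slot names, the entity labels and the example sentence are all hypothetical and are not taken from the thesis project.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # hypothetical entity type, e.g. "CHEMICAL"


@dataclass
class EventTemplate:
    event_type: str
    slots: dict[str, str]  # slot name -> entity label the slot accepts


@dataclass
class Event:
    event_type: str
    fillers: dict[str, list[Entity]] = field(default_factory=dict)


def fill_template(template, role_assignments):
    """Build an Event from (slot, entity) pairs, keeping only the pairs whose
    slot exists in the template and whose entity label matches that slot."""
    event = Event(template.event_type, {slot: [] for slot in template.slots})
    for slot, entity in role_assignments:
        if template.slots.get(slot) == entity.label:
            event.fillers[slot].append(entity)
    return event


# Hypothetical template for a chemical reaction event.
REACTION = EventTemplate(
    "ChemicalReaction",
    {"reactant": "CHEMICAL", "product": "CHEMICAL", "temperature": "TEMPERATURE"},
)

# Made-up sentence: "Compound A was heated with compound B at 80 °C to give compound C."
# In a real system these pairs would be predicted by an extraction model.
assignments = [
    ("reactant", Entity("compound A", "CHEMICAL")),
    ("reactant", Entity("compound B", "CHEMICAL")),
    ("temperature", Entity("80 °C", "TEMPERATURE")),
    ("product", Entity("compound C", "CHEMICAL")),
    ("catalyst", Entity("compound D", "CHEMICAL")),  # slot not in template: dropped
]
event = fill_template(REACTION, assignments)
```

In this sketch, adapting to a new scientific domain amounts to declaring a new `EventTemplate`; the few-shot difficulty lies in predicting the slot assignments from only a handful of annotated examples.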

From a methodological standpoint, the proposed thesis seeks to move beyond the current, almost reflexive tendency to rely exclusively on Large Language Models (LLMs). Instead, it advocates a synergy between LLMs and smaller encoder-based models in a few-shot setting. In this synergy, LLMs are used to generate synthetic data and annotations, which provide the resources needed to build the encoder-based models through pre-training. This thesis will be conducted within the framework of the AIKO project of the Digital Programs Agency, which focuses on knowledge extraction from scientific publications.

University / doctoral school

Sciences et Technologies de l’Information et de la Communication (STIC)
Paris-Saclay

Thesis topic location

Site

Saclay

Requester

Position start date

01/10/2026

Person to be contacted by the applicant

FERRET Olivier olivier.ferret@cea.fr
CEA
DRT/DIASI/SIALV/LASTI
CEA Saclay Nano-INNOV
Institut CARNOT CEA LIST
Laboratoire Analyse Sémantique Texte et Image (LASTI)
Point courrier n°184
91191 Gif sur Yvette CEDEX

01 69 08 01 47

Tutor / Responsible thesis director

FERRET Olivier olivier.ferret@cea.fr
CEA
DRT/DIASI/SIALV/LASTI
CEA Saclay Nano-INNOV
Institut CARNOT CEA LIST
Laboratoire Analyse Sémantique Texte et Image (LASTI)
Point courrier n°184
91191 Gif sur Yvette CEDEX

01 69 08 01 47

More information

http://oferret.free.fr
https://kalisteo.cea.fr/