General information
Organisation
The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas:
• defence and security,
• nuclear energy (fission and fusion),
• technological research for industry,
• fundamental research in the physical sciences and life sciences.
Drawing on its widely acknowledged expertise, and thanks to its 16,000 technicians, engineers, researchers and staff, the CEA actively participates in collaborative projects with a large number of academic and industrial partners.
The CEA is established in ten centers spread throughout France.
Reference
SL-DRT-26-0685
Direction
DRT
Thesis topic details
Category
Technological challenges
Thesis topics
Few-shot event and complex relation extraction from text applied to scientific literature
Contract
PhD thesis
Job description
Information extraction from text, a subfield of Natural Language Processing, has been the subject of research for many years. These efforts have primarily focused on Named Entity Recognition, relation extraction between entities, and, in its most complex form, event extraction, a task typically formulated as filling predefined templates from unstructured text. Within this framework, the objective of this thesis is to design, develop, and evaluate event extraction models operating on scientific articles. In this context, an 'event' may correspond to a set of entities and relations characterizing, for instance, a chemical reaction or an experiment. Furthermore, it must be possible to build these models from a very small set of annotated data, so that they can be adapted rapidly to new scientific domains.
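To make the template-filling formulation concrete, the sketch below represents an event as a typed template whose slots (roles) are filled with entities found in a sentence. It is only an illustration: the class names, the role list and the example sentence are hypothetical and are not taken from the thesis or from the AIKO project.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    text: str   # surface form as it appears in the article
    type: str   # entity type, e.g. "CHEMICAL" (hypothetical label set)


@dataclass
class EventTemplate:
    event_type: str
    # Maps a role name to the entities filling that role.
    slots: dict[str, list[Entity]] = field(default_factory=dict)

    def fill(self, role: str, entity: Entity) -> None:
        """Attach an entity to a role of the template."""
        self.slots.setdefault(role, []).append(entity)

    def missing_roles(self, required: tuple[str, ...]) -> list[str]:
        """Return the roles of the schema that are still unfilled."""
        return [role for role in required if not self.slots.get(role)]


# Hypothetical schema for a "chemical reaction" event.
REACTION_ROLES = ("reactant", "product", "reagent", "temperature", "solvent")

# Invented example sentence, annotated by hand for illustration.
sentence = "Benzaldehyde was reduced to benzyl alcohol with NaBH4 at 25 °C."

event = EventTemplate("chemical_reaction")
event.fill("reactant", Entity("Benzaldehyde", "CHEMICAL"))
event.fill("product", Entity("benzyl alcohol", "CHEMICAL"))
event.fill("reagent", Entity("NaBH4", "CHEMICAL"))
event.fill("temperature", Entity("25 °C", "CONDITION"))

# The sentence does not mention a solvent, so that slot stays empty.
print(event.missing_roles(REACTION_ROLES))
```

In a few-shot setting, only a handful of such filled templates would be available per event type, which is why the schema is kept separate from the extraction model here.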
From a methodological standpoint, the proposed thesis seeks to move beyond the current, almost reflexive tendency to rely exclusively on Large Language Models (LLMs). Instead, it advocates for a potential synergy between LLMs and smaller encoder-based models within a few-shot context. In this synergy, the former are leveraged, through the generation of synthetic data and annotations, to build the resources necessary to implement the latter via pre-training mechanisms. This thesis will be conducted within the framework of the AIKO project of the Digital Programs Agency, which focuses on knowledge extraction from scientific publications.
University / doctoral school
Sciences et Technologies de l’Information et de la Communication (STIC)
Paris-Saclay
Thesis topic location
Site
Saclay
Requester
Position start date
01/10/2026
Person to be contacted by the applicant
FERRET Olivier
olivier.ferret@cea.fr
CEA
DRT/DIASI/SIALV/LASTI
CEA Saclay Nano-INNOV
Institut CARNOT CEA LIST
Laboratoire Analyse Sémantique Texte et Image (LASTI)
Point courrier n°184
91191 Gif sur Yvette CEDEX
01 69 08 01 47
Tutor / Responsible thesis director
FERRET Olivier
olivier.ferret@cea.fr
CEA
DRT/DIASI/SIALV/LASTI
CEA Saclay Nano-INNOV
Institut CARNOT CEA LIST
Laboratoire Analyse Sémantique Texte et Image (LASTI)
Point courrier n°184
91191 Gif sur Yvette CEDEX
01 69 08 01 47
Find out more
http://oferret.free.fr
https://kalisteo.cea.fr/