Developing an ontological framework for facilitating the exploitation and re-use of phenomics data based on a formalisation of numerical relationships

Luis-Felipe Vargas-Rojas (LEPSE, INRAE)


Date
29 January 2021

In recent years, plant phenomics has produced massive datasets from experiments performed in the field and under controlled conditions, involving hundreds of genotypes at different scales of organisation. Taken together, these datasets are unprecedented resources for identifying and testing novel mechanisms and models (Tardieu et al., 2017). Assembling and organising such datasets is not straightforward because of the heterogeneous, multi-scale and multi-source nature of the data. To address these issues in part, the phenomics community has proposed an ontology-driven Information System (PHIS, www.phis.inra.fr, Neveu et al., 2019) based on the FAIR principles (Wilkinson et al., 2016). However, the exploitation and re-use of these datasets have not reached their full potential because (1) metadata is often merely informative, (2) relationships between numerical attributes are poorly formalised, and (3) ontological reasoning is more efficient for categorical data than for numerical data. For instance, relationships such as unit conversions are not effectively exploited, even when the data is well annotated and the information needed to perform the computation is provided by unit ontologies (OM, QUDT). The goal of the thesis is to create an ontological framework for representing and computing different kinds of numerical relationships between plant phenomics attributes. It will focus on equations representing the most common variables and data-manipulation processes in plant phenomics (e.g. unit conversions, thermal time and phyllochron). For each use case, the metadata, context dependencies, links between domain-specific ontologies and the formalisation of the equation structure will be presented. Finally, the concrete machinery to perform these context-aware computations and to support effective information retrieval, reducing users' time and effort and the complexity of query definition, will be proposed.
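To make the unit-conversion and thermal-time examples more concrete, here is a minimal Python sketch. It is an illustration only, not the framework proposed in the thesis. It assumes that each unit carries a multiplier and an offset to a base unit, loosely mirroring the conversionMultiplier and conversionOffset properties that QUDT attaches to units, with the base value computed as (value + offset) × multiplier. The Unit class, the convert and thermal_time functions and the base temperature parameter are all hypothetical. Thermal time follows one common simple formulation: the sum over days of max(0, T_mean − T_base), expressed in degree-days.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Unit:
    """Hypothetical unit record, loosely modelled on QUDT's
    conversionMultiplier / conversionOffset properties (assumption)."""
    symbol: str
    multiplier: float  # factor from this unit to the base unit
    offset: float      # added to the value before multiplying


# Illustrative temperature units; the base unit is the kelvin.
KELVIN = Unit("K", 1.0, 0.0)
CELSIUS = Unit("degC", 1.0, 273.15)
FAHRENHEIT = Unit("degF", 5.0 / 9.0, 459.67)


def convert(value: float, source: Unit, target: Unit) -> float:
    """Convert via the base unit: base = (value + offset) * multiplier,
    then invert the same relation for the target unit."""
    base = (value + source.offset) * source.multiplier
    return base / target.multiplier - target.offset


def thermal_time(daily_mean_temps: Iterable[float], unit: Unit,
                 t_base_c: float = 0.0) -> float:
    """Accumulated thermal time in degree-days (degC.d).

    Each daily mean is first converted to degC using the unit metadata,
    then max(0, T - T_base) is summed. t_base_c is a hypothetical,
    species-specific parameter supplied by the caller.
    """
    total = 0.0
    for t in daily_mean_temps:
        t_c = convert(t, unit, CELSIUS)
        total += max(0.0, t_c - t_base_c)
    return total
```

For example, thermal_time([50.0, 68.0, 41.0], FAHRENHEIT, t_base_c=6.0) converts the daily means to 10, 20 and 5 °C and returns approximately 18 degree-days (4 + 14 + 0). Keeping the conversion factors on the unit records, rather than hard-coding them inside thermal_time, is one way to let annotated metadata drive the computation.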