IFPEN’s biotechnology and data processing research teams are developing original scientific methodologies and bioinformatic analysis tools designed to improve our understanding of the fungi that produce the enzymes used to make future biofuels. This work greatly facilitates the analysis of massive amounts of “omic” data, making it possible to identify the most efficient enzymatic processes more rapidly.
More detailed information can be found in publications [1,2,3,4], and in the following video: https://www.youtube.com/watch?v=ZUQj9YMPdVU
The rise of biocatalysts
The emergence of industrial bioprocesses represents a major challenge in the context of the “Energy transition” and “New Industrial France solutions”. The associated research activities include production processes for 2nd-generation biofuels, which make it possible to recycle plant waste by converting lignocellulose (a non-food component of plant cell walls) into sugars that are ethanol precursors.
In production processes based on lignocellulosic biomass, one of the crucial stages, and above all one of the most expensive, is the production of biocatalysts (enzymes) capable of making this conversion competitive. To improve this stage, we need a clearer understanding of enzyme-producing microorganisms such as Trichoderma reesei, a filamentous fungus (figure 1).
Large volumes of data to be processed
Research protocols focusing on understanding living organisms have been significantly boosted by the emergence of what are known as “omic” technologies*. These technologies provide unprecedented access, at different scales, to fundamental biological mechanisms, yielding an abundance of complex information about how cells work. Analyses of the genome (DNA sequences), the transcriptome (gene expression) and the metabolome (molecules produced by metabolism) are a few examples (https://en.wikipedia.org/wiki/List_of_omics_topics_in_biology). Such analyses generate high volumes of data that offer a wealth of potential information, but their effective integration and interpretation demand cross-disciplinary skills at the intersection of biotechnology and the development of algorithmic analysis methods. Indeed, although all the cells of a given organism possess an identical genome, gene expression varies significantly from one cell to another, depending on the stage of the life cycle (growth, reproduction, etc.) and on the conditions in which the organism is placed.
Already, the combination of genomic data (the “notes”) and transcriptomic data (the “music”) in conditions representative of industrial processes has allowed IFPEN and its research partners to identify some essential genes (“soloists”) involved in enzyme production [1,2,3]. This new knowledge has led to several patents being filed, aimed at the development of more powerful microorganisms, with a view to more efficient processes.
However, a more exhaustive analysis of this mass of heterogeneous data remains a major challenge: the sheer number of genes involved (several thousand) and the combinatorially large number of possible interactions between them often lead analysts to focus on the most directly accessible information. In practice, this means selecting a small number of highly visible, extremely “loud” genes, at the risk of ignoring “quieter” ones. Yet these quieter genes may play a major role in the biological interpretation of the genomic musical score.
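To give an idea of the scale involved, the short calculation below counts the candidate pairwise interactions for a given number of genes (n(n-1)/2 unordered pairs) and the size of the resulting space of possible networks: each pair is either linked or not, so there are 2 raised to that count of candidate networks. The gene counts used are illustrative values in the “several thousand” range, not figures from the article, and the function names are hypothetical.

```python
from math import comb, log10


def candidate_interactions(n_genes: int) -> int:
    """Number of unordered gene pairs, i.e. candidate pairwise interactions."""
    return comb(n_genes, 2)


def network_count_digits(n_genes: int) -> int:
    """Decimal digits of 2**candidate_interactions(n_genes), the number of
    possible undirected networks (each gene pair is either linked or not)."""
    return int(candidate_interactions(n_genes) * log10(2)) + 1


# Illustrative gene counts (hypothetical values, not taken from the article).
for n in (100, 1000, 5000):
    print(n, candidate_interactions(n), network_count_digits(n))
```

For 5,000 genes this already gives 12,497,500 candidate pairs, and a number of possible networks with more than 3.7 million decimal digits, which is why exhaustive exploration is out of reach and shortcuts such as keeping only the “loudest” genes are tempting.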
BRANE Cut bioinformatics analysis software
Collaborative research carried out by IFPEN’s Biotechnology and Control, Signal and System teams has led to the development of BRANE Cut (Biologically-Related A priori Network Enhancement with Graph cuts for Gene Regulatory Network Inference), a new bioinformatics analysis tool. This tool represents measured interactions between genes as a graph and models the biological coupling expected between different groups of genes within a regulatory network (figure 2).
The methodology makes it possible, for example, to determine relationships between “regulatory” genes and enzyme-producing genes. The originality of this tool lies in how it formulates the optimization problem associated with the modeled coupling, which is solved using a very fast graph-cut algorithm.
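The article does not say which interaction measure feeds the graph, so the sketch below assumes one of the simplest: the absolute Pearson correlation between the expression profiles of two genes across experimental conditions. It builds a complete, weighted, undirected gene graph from a genes × conditions expression matrix; the function names and the choice of correlation are hypothetical stand-ins, not BRANE Cut’s actual input.

```python
import numpy as np


def interaction_weights(expression: np.ndarray) -> np.ndarray:
    """Absolute Pearson correlation between every pair of genes.

    expression: array of shape (n_genes, n_conditions), one row per gene.
    Returns a symmetric (n_genes, n_genes) matrix with a zero diagonal,
    usable as the edge weights of a complete, undirected gene graph.
    Assumes no gene has a constant profile (its correlation would be NaN).
    """
    weights = np.abs(np.corrcoef(expression))
    np.fill_diagonal(weights, 0.0)  # ignore self-interactions
    return weights


def to_edge_list(weights: np.ndarray, genes: list[str]) -> list[tuple[str, str, float]]:
    """List (gene_a, gene_b, weight) for each unordered pair, strongest first."""
    rows, cols = np.triu_indices(len(genes), k=1)  # upper triangle, no diagonal
    edges = [(genes[i], genes[j], float(weights[i, j])) for i, j in zip(rows, cols)]
    return sorted(edges, key=lambda e: e[2], reverse=True)
```

Usage: for four genes measured in four conditions, `to_edge_list(interaction_weights(expr), ["g1", "g2", "g3", "g4"])` returns the six candidate edges ranked by weight. Keeping only the top of that list is exactly the “loud genes first” shortcut; a coupling model, as BRANE Cut proposes, is meant to look beyond it.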
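The article does not give BRANE Cut’s cost function, so the following is only a generic sketch of the family of technique it names: edge selection by graph cut. Each candidate edge e gets a binary label (1 = keep, 0 = discard); keeping it costs a threshold lam, discarding it costs its weight w_e, and each pair of coupled edges that receive different labels costs eta. Here, as a hypothetical example of “biological coupling”, edges that share a regulator are coupled. Because this pairwise term is submodular, the minimum of E(x) = sum over e of [lam if x_e = 1 else w_e] + eta × (number of coupled pairs with different labels) is found exactly by a single s-t minimum cut, computed here with `networkx.minimum_cut`. The function name, parameters and gene labels are assumptions; integer capacities are used because the networkx documentation warns that floating-point capacities can cause round-off problems.

```python
import networkx as nx


def graph_cut_edge_selection(weights, couples, lam, eta):
    """Select candidate edges by exactly minimizing a binary energy with a min cut.

    weights: dict mapping a candidate edge (gene_a, gene_b) to an integer weight >= 0
    couples: pairs of candidate edges (keys of `weights`) that should share a label
    lam:     integer cost of keeping an edge (acts as a selection threshold)
    eta:     integer penalty paid when two coupled edges get different labels
    Returns (set of kept edges, minimum energy).
    """
    G = nx.DiGraph()
    s, t = "SOURCE", "SINK"
    for e, w in weights.items():
        # Arc s -> e is cut when e lands on the sink side (discarded): pay w.
        G.add_edge(s, e, capacity=w)
        # Arc e -> t is cut when e lands on the source side (kept): pay lam.
        G.add_edge(e, t, capacity=lam)
    for e, f in couples:
        # Symmetric Potts term: exactly one of these arcs is cut when labels differ.
        G.add_edge(e, f, capacity=eta)
        G.add_edge(f, e, capacity=eta)
    energy, (source_side, _) = nx.minimum_cut(G, s, t)
    kept = {e for e in weights if e in source_side}
    return kept, energy
```

Usage: with weights {(TF1, g1): 90, (TF1, g2): 40, (TF2, g3): 30}, lam = 50 and the two TF1 edges coupled, eta = 30 keeps both TF1 edges (energy 130), even though (TF1, g2) falls below the threshold on its own; with eta = 0 only (TF1, g1) survives (energy 120). This is the kind of mechanism by which a coupling term can rescue a “quieter” interaction that simple thresholding would discard.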
The relevance of the results supplied by this tool has been validated on model microorganisms, for which the biological mechanisms, especially genetic interactions, are relatively well known. This validation was performed on both experimental and simulated data from the DREAM4 and DREAM5 challenges (“Dialogue for Reverse Engineering Assessments and Methods”) [5]. These reference data sets are used to benchmark biological network inference methods; on them, BRANE Cut achieves higher precision than state-of-the-art methods for standard classification measures (figure 3).
This type of validation gives confidence that the tool can be used to analyze less well-known organisms. Applied to strains of Trichoderma reesei, it confirms and consolidates knowledge previously acquired through biological expertise.
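The “standard classification measures” used in such benchmarks compare a ranked list of predicted edges with a gold-standard network. The sketch below computes precision (the share of predicted edges that are real) and recall (the share of real edges that are found) for the k highest-ranked predictions. It assumes edges are compared as unordered gene pairs; the official DREAM scoring, which reports areas under precision-recall and ROC curves, is not reproduced here, and the function name is hypothetical.

```python
def precision_recall_at_k(ranked_edges, gold_edges, k):
    """Precision and recall of the k highest-ranked predicted edges.

    Edges are compared as unordered gene pairs; repeated predictions of the
    same pair (e.g. (a, b) then (b, a)) are counted once.
    """
    gold = {frozenset(e) for e in gold_edges}
    top, seen = [], set()
    for e in ranked_edges:
        if len(top) >= k:
            break
        pair = frozenset(e)
        if pair not in seen:  # skip duplicate predictions of the same pair
            seen.add(pair)
            top.append(pair)
    true_positives = sum(1 for pair in top if pair in gold)
    precision = true_positives / len(top) if top else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Sweeping k from 1 to the length of the ranked list traces the precision-recall curve on which inference methods are usually compared.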
On this promising basis, BRANE Cut is currently being tested for the predictive analysis of new “omic” data sets concerning strains of Trichoderma reesei with different genetic backgrounds and cellulase production potentials. The aim of this research is to define more accurately the groups of genes that interact (“in unison”) in the production of biocatalysts, whether they are strongly or weakly expressed.
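One simple way to read gene groups off an inferred network is to take its connected components: genes linked, directly or indirectly, by selected edges end up in the same group. This is a hypothetical, generic proxy for the “interacting gene groups” mentioned above, not the grouping method used in this research; the function name and gene labels are illustrative.

```python
import networkx as nx


def gene_groups(selected_edges):
    """Group genes linked, directly or indirectly, by the selected edges.

    Returns a list of groups (each a sorted list of gene names),
    largest groups first, ties broken alphabetically.
    """
    g = nx.Graph()
    g.add_edges_from(selected_edges)
    groups = [sorted(component) for component in nx.connected_components(g)]
    return sorted(groups, key=lambda c: (-len(c), c))
```

For instance, the edges (TF1, g1), (TF1, g2) and (TF2, g3) yield two groups: [TF1, g1, g2] and [TF2, g3]. Finer decompositions (community detection, regulator-centered modules) are possible when a single connected component covers most of the network.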
[1] Le Crom, S. et al., Tracking the roots of cellulase hyperproduction by the fungus Trichoderma reesei using massively parallel DNA sequencing, Proceedings of the National Academy of Sciences of the United States of America, 2009 [DOI: 10.1073/pnas.0905848106]
[2] Marie-Nelly, H. et al., High-quality genome (re)assembly using chromosomal contact data, Nature Communications, 2014 [DOI: 10.1038/ncomms6695]
[3] Poggi-Parodi, D. et al., Kinetic transcriptome analysis reveals an essentially intact induction system in a cellulase hyper-producer Trichoderma reesei strain, Biotechnology for Biofuels, 2014 [DOI: 10.1186/s13068-014-0173-z]
[4] Pirayre, A. et al., BRANE Cut: biologically-related a priori network enhancement with graph cuts for gene regulatory network inference, BMC Bioinformatics, 2015 [DOI: 10.1186/s12859-015-0754-2]
[5] Marbach, D. et al., Wisdom of crowds for robust gene network inference, Nature Methods, 2012 [DOI: 10.1038/nmeth.2016]
* “Omic” sciences enable massive quantities of data to be generated at multiple biological levels. From gene sequencing to the expression of proteins and metabolites, these data can cover all the mechanisms involved in the variations that occur within cell networks and that influence how organisms function as a whole.