R toolbox for deep neural networks optimized for genomic datasets

The goal of the package is to speed up the development of bioinformatics tools for sequence classification, homology detection and other bioinformatics tasks. It is developed for biologists and advanced AI researchers. deepG is a collaborative effort of the McHardy Lab at the Helmholtz Centre for Infection Research, the Bischl Lab at the University of Munich and the Huttenhower Lab at the Harvard T.H. Chan School of Public Health.

Overview

The package offers functions to process data and to create, train and evaluate neural networks.

  • Data processing
    • Different options to encode FASTA/FASTQ files (one-hot encoding, coverage or quality score encoding).
    • Different options to handle ambiguous nucleotides.
    • Create data generators to handle large collections of files.
  • Deep learning architectures
    • Create network architectures with a single function call.
    • Custom loss and metric functions available.
  • Model training
    • Automatically create model/data pipeline.
  • Visualizing training progress
    • Visualize training progress and metrics in TensorBoard.
  • Model evaluation
    • Evaluate trained models.
  • Model interpretability
    • Use Integrated Gradients to visualize how a model’s predictions relate to its input.
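
To make the encoding step listed under data processing concrete, here is a minimal base R sketch of one-hot encoding a nucleotide sequence. It does not use deepG; the function name `one_hot_encode`, its arguments, and the two strategies for ambiguous nucleotides ("zero" and "equal") are assumptions chosen for illustration and may differ from the options deepG actually provides.

```r
# Illustrative helper, not part of deepG.
# One-hot encode a nucleotide sequence into a matrix with one row per position
# and one column per vocabulary symbol. Characters outside the vocabulary
# (e.g. "N") are treated as ambiguous:
#   "zero"  -> all-zero row
#   "equal" -> equal weight 1 / length(vocabulary) in every column
one_hot_encode <- function(sequence,
                           vocabulary = c("A", "C", "G", "T"),
                           ambiguous = c("zero", "equal")) {
  ambiguous <- match.arg(ambiguous)
  chars <- strsplit(toupper(sequence), "")[[1]]
  idx <- match(chars, vocabulary)  # NA marks ambiguous characters

  encoded <- matrix(0,
                    nrow = length(chars),
                    ncol = length(vocabulary),
                    dimnames = list(NULL, vocabulary))

  # Set a single 1 per row for every known nucleotide
  known <- which(!is.na(idx))
  encoded[cbind(known, idx[known])] <- 1

  # Optionally spread ambiguous positions evenly over the vocabulary
  if (ambiguous == "equal") {
    encoded[is.na(idx), ] <- 1 / length(vocabulary)
  }
  encoded
}

# Encode a short sequence whose last position is ambiguous
one_hot_encode("ACGTN", ambiguous = "equal")
```

The result is a 5 x 4 matrix: the first four rows each contain a single 1, and the row for "N" holds 0.25 in every column. With `ambiguous = "zero"` that row would be all zeros instead.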

Installation

Install the TensorFlow Python package

install.packages("tensorflow")
tensorflow::install_tensorflow()

and afterwards install the latest version of deepG from GitHub

# install.packages("devtools")
devtools::install_github("GenomeNet/deepG")
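
After both steps, it can be useful to confirm that R can find the packages. The following is a small sketch using only base R; the helper name `report_package` is hypothetical and not part of deepG or tensorflow.

```r
# Hypothetical helper: report whether a package is installed and, if so,
# which version. Returns TRUE/FALSE invisibly.
report_package <- function(pkg) {
  installed <- requireNamespace(pkg, quietly = TRUE)
  if (installed) {
    message(pkg, " ", as.character(utils::packageVersion(pkg)), " is installed")
  } else {
    message(pkg, " is not installed")
  }
  invisible(installed)
}

# Check the two packages installed above
for (pkg in c("tensorflow", "deepG")) {
  report_package(pkg)
}
```

This only checks that the R packages are present; it does not verify that the underlying Python TensorFlow installation works.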

Usage

See the package website at https://deepg.de for documentation and example code.