GATB Core Documentation

What is GATB?

GATB stands for "Genome Analysis Toolbox with de-Bruijn graph".

The GATB-CORE project provides a set of highly efficient algorithms to analyse NGS data sets. These methods enable the analysis of data sets of any size on multi-core desktop computers, including very large volumes of read data from any kind of organism, such as bacteria, plants and animals, and even complex samples (e.g. metagenomes).

GATB is made of two main parts:

  • the GATB-CORE library: intended for development, GATB-CORE enables the creation of new software tools
  • the GATB-Tools: ready-to-use software relying on GATB-CORE

The GATB project was published in Bioinformatics in 2014. Several publications about GATB use cases and tools are also available here.

Purpose of the GATB core library

gatb::core is a high-performance C++ library with a low memory footprint.

It supports the following operations natively:

  • FASTA/FASTQ parsing and writing, for both plain-text and gzipped files
  • K-mer counting
  • Minimizer computation of k-mers, partitioning of datasets by minimizers
  • de Bruijn graph construction
  • de Bruijn graph traversal operations (contigs, unitigs)
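
To make the k-mer counting and minimizer operations listed above more concrete, here is a minimal, self-contained sketch in plain C++. It does not use the gatb::core API: the function names (reverse_complement, canonical, count_kmers, minimizer) are hypothetical, and the lexicographic minimizer ordering is an assumption chosen for simplicity, not necessarily the ordering used by the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Reverse complement of a DNA string over the alphabet {A, C, G, T}.
std::string reverse_complement(const std::string& seq) {
    std::string rc(seq.rbegin(), seq.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;  // other symbols (e.g. N) are left unchanged
        }
    }
    return rc;
}

// Canonical form: the lexicographically smaller of a k-mer and its
// reverse complement, so that both strands map to the same key.
std::string canonical(const std::string& kmer) {
    return std::min(kmer, reverse_complement(kmer));
}

// Count the canonical k-mers of a single read (illustrative only:
// an in-memory std::map does not scale to real NGS data sets).
std::map<std::string, std::size_t> count_kmers(const std::string& read,
                                               std::size_t k) {
    assert(k > 0);
    std::map<std::string, std::size_t> counts;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        ++counts[canonical(read.substr(i, k))];
    }
    return counts;
}

// Minimizer: the smallest m-mer contained in a k-mer (m <= k).
// Assumption: plain lexicographic order on the forward strand.
std::string minimizer(const std::string& kmer, std::size_t m) {
    assert(m > 0 && m <= kmer.size());
    std::string best = kmer.substr(0, m);
    for (std::size_t i = 1; i + m <= kmer.size(); ++i) {
        best = std::min(best, kmer.substr(i, m));
    }
    return best;
}
```

In this sketch, k-mers that share the same minimizer could be routed to the same partition, which is the general idea behind partitioning a data set by minimizers.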

One structure is central to the GATB project: the de Bruijn graph. This kind of data structure is now widely used in NGS software, such as assemblers.

In short, the GATB-CORE library provides the means to build and use de Bruijn graphs with a low memory footprint. This low-memory implementation originally comes from the Minia assembly tool.
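
As an illustration of what building and traversing a de Bruijn graph involves, the following sketch stores the k-mers of a set of reads as nodes and finds edges implicitly by testing the four possible one-base extensions. It is a simplified, single-stranded model under stated assumptions: it uses a std::set rather than a memory-efficient structure, ignores reverse complements, and all names (build_nodes, successors, predecessors, unitig_from) are hypothetical rather than part of gatb::core.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Node set of a single-stranded de Bruijn graph: all k-mers seen in the reads.
std::set<std::string> build_nodes(const std::vector<std::string>& reads,
                                  std::size_t k) {
    assert(k > 1);
    std::set<std::string> nodes;
    for (const std::string& read : reads) {
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            nodes.insert(read.substr(i, k));
        }
    }
    return nodes;
}

// Successors: nodes whose first k-1 bases equal the last k-1 bases of kmer.
std::vector<std::string> successors(const std::set<std::string>& nodes,
                                    const std::string& kmer) {
    const std::string bases = "ACGT";
    std::vector<std::string> out;
    for (char base : bases) {
        const std::string next = kmer.substr(1) + base;
        if (nodes.count(next) > 0) out.push_back(next);
    }
    return out;
}

// Predecessors: nodes whose last k-1 bases equal the first k-1 bases of kmer.
std::vector<std::string> predecessors(const std::set<std::string>& nodes,
                                      const std::string& kmer) {
    const std::string bases = "ACGT";
    std::vector<std::string> out;
    for (char base : bases) {
        const std::string prev = base + kmer.substr(0, kmer.size() - 1);
        if (nodes.count(prev) > 0) out.push_back(prev);
    }
    return out;
}

// Extend a path to the right from `start` while it stays non-branching,
// and return the spelled sequence (a right-extended unitig).
std::string unitig_from(const std::set<std::string>& nodes,
                        const std::string& start) {
    std::string sequence = start;
    std::string current = start;
    std::set<std::string> visited{start};
    while (true) {
        const std::vector<std::string> next = successors(nodes, current);
        if (next.size() != 1) break;                            // dead end or branch
        if (predecessors(nodes, next[0]).size() != 1) break;    // merge point
        if (!visited.insert(next[0]).second) break;             // cycle
        sequence += next[0].back();
        current = next[0];
    }
    return sequence;
}
```

Finding neighbours through membership queries, instead of storing edges explicitly, is what makes it possible to replace the std::set used here with a more compact structure.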

In addition to the de Bruijn graph, GATB-CORE provides several other data structures that can be of interest for general-purpose development:

  • Open-Addressing Hash Table
  • Linked-List Hash Table
  • Bloom Filters, in several flavors: basic, cache-optimized, and optimized for k-mer neighbours; all accessible through BloomFactory
  • Minimal Perfect Hash Function (BBHash)
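
For readers unfamiliar with Bloom filters, the sketch below shows the basic idea in plain C++: a bit array plus several hash positions per item, giving membership queries that may yield false positives but never false negatives. This is a generic textbook variant, not the gatb::core implementation or its BloomFactory interface. The class name BloomFilterSketch and the double-hashing scheme built on std::hash are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, minimal Bloom filter over strings.
class BloomFilterSketch {
public:
    BloomFilterSketch(std::size_t num_bits, std::size_t num_hashes)
        : bits_(num_bits, false), num_hashes_(num_hashes) {
        assert(num_bits > 0 && num_hashes > 0);
    }

    // Set the bit at each of the item's hash positions.
    void insert(const std::string& item) {
        for (std::size_t i = 0; i < num_hashes_; ++i) {
            bits_[position(item, i)] = true;
        }
    }

    // May return true for an item that was never inserted (false positive),
    // but never returns false for an item that was inserted.
    bool contains(const std::string& item) const {
        for (std::size_t i = 0; i < num_hashes_; ++i) {
            if (!bits_[position(item, i)]) return false;
        }
        return true;
    }

private:
    // Double hashing: position_i = (h1 + i * h2) mod num_bits.
    std::size_t position(const std::string& item, std::size_t i) const {
        const std::size_t h1 = std::hash<std::string>{}(item);
        // Second hash derived from a salted copy of the item, forced odd.
        const std::size_t h2 = std::hash<std::string>{}(item + "#") | 1;
        return (h1 + i * h2) % bits_.size();
    }

    std::vector<bool> bits_;
    std::size_t num_hashes_;
};
```

A typical use would be to insert every k-mer of a data set and then query candidate neighbours. The false positive rate depends on the ratio between the number of bits and the number of inserted items.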

This is the official documentation of the gatb::core reference API. It is therefore aimed at developers interested in creating bioinformatics software.

Services provided by the GATB core library

From the client's point of view, the gatb::core package provides:

  • libraries that offer low-level genomic operations, up to de Bruijn graph creation
  • tests of the libraries
  • snippets showing how to use the library
  • specific binaries that rely on the libraries
  • wrappers of the library services for several languages (Java, Python, ...)

Here you will find the code documentation for the namespaces, classes and methods of the components that make up the gatb::core design.

How can I create new software using the GATB core library?

As a starting point, it is strongly recommended to read "How to use the library?". There you will find information about the compilation process and how to create a new project based on gatb::core.

You will also find many snippets showing gatb::core in action.


You can get support on the BioStars forum here.

You can also find general information about the GATB project. Here you will find high-level tutorials about GATB.

Other material

You can also read the related pages: