MPHFAlgorithm< span, Abundance_t, NodeState_t > Class Template Reference

Algorithm that builds a hash table whose keys are kmers and values are kmer abundances.

#include <MPHFAlgorithm.hpp>


Public Types

typedef kmer::impl::Kmer< span >::Type Type
typedef tools::collections::impl::MapMPHF< Type, Abundance_t > AbundanceMap
typedef tools::collections::impl::MapMPHF< Type, NodeState_t > NodeStateMap
typedef u_int8_t Adjacency_t

Public Member Functions

 MPHFAlgorithm (tools::storage::impl::Group &group, const std::string &name, tools::collections::Iterable< Count > *solidCounts, tools::collections::Iterable< Type > *solidKmers, unsigned int nbCores, bool buildOrLoad, tools::misc::IProperties *options=0)
 ~MPHFAlgorithm ()
void execute ()
float getNbBitsPerKmer () const
AbundanceMap * getAbundanceMap () const
- Public Member Functions inherited from Algorithm
 Algorithm (const std::string &name, int nbCores=-1, gatb::core::tools::misc::IProperties *input=0)
virtual ~Algorithm ()
std::string getName () const
void run ()
virtual IProperties * getInput ()
virtual IProperties * getOutput ()
virtual IProperties * getInfo ()
virtual dp::IDispatcher * getDispatcher ()
virtual TimeInfo & getTimeInfo ()
virtual IProperties * getSystemInfo ()
template<typename Item >
dp::Iterator< Item > * createIterator (dp::Iterator< Item > *iter, size_t nbIterations=0, const char *message=0, dp::IteratorListener *listener=0)
virtual dp::IteratorListener * createIteratorListener (size_t nbIterations, const char *message)
- Public Member Functions inherited from SmartPointer
void use ()
void forget ()
- Public Member Functions inherited from ISmartPointer
virtual ~ISmartPointer ()

Static Public Attributes

static const Abundance_t MAX_ABUNDANCE = std::numeric_limits<Abundance_t>::max()

Additional Inherited Members

- Static Public Member Functions inherited from Algorithm
template<template< size_t > class Functor>
static int mainloop (tools::misc::IOptionsParser *parser, int argc, char *argv[])
- Protected Member Functions inherited from Algorithm
std::string getUriByKey (const std::string &key)
std::string getUri (const std::string &str)
void setInput (IProperties *input)
- Protected Member Functions inherited from SmartPointer
 SmartPointer ()
virtual ~SmartPointer ()

Detailed Description

template<size_t span = KMER_DEFAULT_SPAN, typename Abundance_t = u_int8_t, typename NodeState_t = u_int8_t>
class gatb::core::kmer::impl::MPHFAlgorithm< span, Abundance_t, NodeState_t >

Algorithm that builds a hash table whose keys are kmers and values are kmer abundances.

This class implements a [kmer,abundance] mapping on top of a minimal perfect hash function (MPHF). For N kmers (i.e. the keys), the hash function gives each kmer a unique integer value between 0 and N-1.

It uses three template parameters:

  • span: gives the max usable size for kmers
  • Abundance_t: type of the abundance values (1 byte by default)
  • NodeState_t: type of the node state values (half a byte by default, grouped by two per byte)
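
The "half a byte, grouped by two per byte" layout mentioned for node states can be pictured with a small sketch. NibbleVector is a hypothetical name used only for illustration; it is not a GATB class, and the actual storage layout used by the library may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not part of GATB): N 4-bit values in ceil(N/2) bytes.
class NibbleVector {
public:
    explicit NibbleVector(std::size_t n) : size_(n), bytes_((n + 1) / 2, 0) {}

    std::size_t size() const { return size_; }
    std::size_t byteCount() const { return bytes_.size(); }

    // Even indices use the low half of the byte, odd indices the high half.
    std::uint8_t get(std::size_t i) const {
        assert(i < size_);
        const std::uint8_t byte = bytes_[i / 2];
        return static_cast<std::uint8_t>((i % 2 == 0) ? (byte & 0x0F) : (byte >> 4));
    }

    void set(std::size_t i, std::uint8_t value) {
        assert(i < size_ && value <= 0x0F);
        std::uint8_t& byte = bytes_[i / 2];
        if (i % 2 == 0) {
            byte = static_cast<std::uint8_t>((byte & 0xF0) | value);
        } else {
            byte = static_cast<std::uint8_t>((byte & 0x0F) | (value << 4));
        }
    }

private:
    std::size_t size_;
    std::vector<std::uint8_t> bytes_;
};
```

With this layout, writing one value leaves its neighbour in the same byte untouched, and five values fit in three bytes.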

Storing the values (i.e. the abundances) is done by creating a vector of size N. Querying the abundance of a kmer consists of:

  • getting the hash code H of the kmer
  • getting the object at index H in the vector of values.
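
The two lookup steps above can be sketched as follows. ToyPerfectHash and ToyAbundanceMap are hypothetical names, and the hash is a stand-in (the rank of a key in a sorted key set), not the MPHF construction used by GATB. It shares the property that matters here: N distinct keys map to distinct integers in 0..N-1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an MPHF: the rank of a key in a sorted key set.
// Unlike a real MPHF, it keeps the keys in memory.
class ToyPerfectHash {
public:
    explicit ToyPerfectHash(std::vector<std::uint64_t> keys) : keys_(std::move(keys)) {
        std::sort(keys_.begin(), keys_.end());
        keys_.erase(std::unique(keys_.begin(), keys_.end()), keys_.end());
    }

    std::size_t size() const { return keys_.size(); }

    // Hash code H of a key that is known to belong to the key set.
    std::size_t operator()(std::uint64_t key) const {
        auto it = std::lower_bound(keys_.begin(), keys_.end(), key);
        assert(it != keys_.end() && *it == key);
        return static_cast<std::size_t>(it - keys_.begin());
    }

private:
    std::vector<std::uint64_t> keys_;
};

// Values live in a plain vector of size N, indexed by the hash code.
class ToyAbundanceMap {
public:
    explicit ToyAbundanceMap(const std::vector<std::uint64_t>& kmers)
        : hash_(kmers), values_(hash_.size(), 0) {}

    void set(std::uint64_t kmer, std::uint8_t abundance) { values_[hash_(kmer)] = abundance; }
    std::uint8_t get(std::uint64_t kmer) const { return values_[hash_(kmer)]; }

private:
    ToyPerfectHash hash_;
    std::vector<std::uint8_t> values_;
};
```

Kmers are represented here as plain 64-bit integers for brevity; in the class documented on this page the key type is Kmer<span>::Type.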

The MPHF is built from a list of kmer values of type Kmer<span>::Type. Since building the MPHF may take a while, it is saved in a Storage object; more precisely, it is saved in a collection identified by a couple [group,name]. Such a couple is likely to use the group of the SortingCount algorithm, with the name being "mphf" by convention.

Once the MPHF is built, it is populated with the kmer abundance values, which means that we set the value of each key of the hash table. The abundances are clipped to a maximum value in order not to exceed the capacity of the Abundance_t type (provided as a template parameter of the MPHFAlgorithm class). The maximum value is computed through the std::numeric_limits traits.
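
As an illustration of the clipping step, here is a minimal sketch. The function name clipAbundance and the use of a 64-bit raw count are assumptions made for the example, and it assumes Abundance_t is an unsigned integer type; the library's own code may be organised differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: saturate a raw count to the capacity of Abundance_t.
// Assumes Abundance_t is an unsigned integer type no wider than 64 bits.
template <typename Abundance_t>
Abundance_t clipAbundance(std::uint64_t rawCount) {
    const std::uint64_t maxValue = std::numeric_limits<Abundance_t>::max();
    return static_cast<Abundance_t>(rawCount > maxValue ? maxValue : rawCount);
}
```

With the default one-byte Abundance_t, a kmer seen 300 times is stored with abundance 255, while a kmer seen 12 times keeps its exact count.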

Once the abundance map is built and populated, it is available through the 'getAbundanceMap' method. It may be used for instance by the Graph class in order to get the abundance of any node (i.e. kmer) of the de Bruijn graph.

Note: the keys of the hash table are of type Kmer<span>::Type, but the abundance information is only available through the Kmer<span>::Count type. That is why two Iterable instances are needed, one of type Kmer<span>::Count and one of type Kmer<span>::Type.

Some statistics about the MPHF building are gathered and put into the Properties 'info'.

Member Typedef Documentation

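typedef tools::collections::impl::MapMPHF< Type, Abundance_t > AbundanceMap

We define the type of the hash table of couples [kmer/abundance].

typedef u_int8_t Adjacency_t

We define the type of the hash table of couples [kmer/graph adjacency information].

typedef tools::collections::impl::MapMPHF< Type, NodeState_t > NodeStateMap

We define the type of the hash table of couples [kmer/node state].

typedef kmer::impl::Kmer<span>::Type Type
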

Constructor & Destructor Documentation

MPHFAlgorithm ( tools::storage::impl::Group &  group,
const std::string &  name,
tools::collections::Iterable< Count > *  solidCounts,
tools::collections::Iterable< Type > *  solidKmers,
unsigned int  nbCores,
bool  buildOrLoad,
tools::misc::IProperties *  options = 0 
)


Parameters
[in] group: storage group where to save the MPHF once built
[in] name: name of the collection in the group where the MPHF will be saved
[in] solidCounts: iterable on couples [kmer/abundance]
[in] solidKmers: iterable on kmers
[in] buildOrLoad: true to build and save the MPHF, false to only load it
[in] options: extra options for configuration (may be empty)


Member Function Documentation

void execute ( )

Implementation of the Algorithm::execute method.

Implements Algorithm.

AbundanceMap* getAbundanceMap ( ) const

Accessor to the map. Note: if clients get this map and use it (as a SmartPointer), the map instance stays alive (i.e. is not deleted) even if the MPHFAlgorithm instance that built it is deleted first.

Returns: the map instance.

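The lifetime rule above can be illustrated with a toy reference-counting sketch built around use() and forget() calls. ToyRefCounted, ToyMap and ToyBuilder are hypothetical classes written for this example; they only mimic the behaviour described and are not the GATB SmartPointer implementation.

```cpp
#include <cassert>

// Hypothetical reference-counted base: the object deletes itself when the
// last holder calls forget().
class ToyRefCounted {
public:
    ToyRefCounted() : refs_(0) {}
    void use() { ++refs_; }
    void forget() {
        assert(refs_ > 0);
        if (--refs_ == 0) { delete this; }
    }
    int refs() const { return refs_; }

protected:
    virtual ~ToyRefCounted() {}

private:
    int refs_;
};

// Stand-in for the abundance map; the flag records whether it is still alive.
class ToyMap : public ToyRefCounted {
public:
    explicit ToyMap(int* aliveFlag) : alive_(aliveFlag) { *alive_ = 1; }

protected:
    ~ToyMap() override { *alive_ = 0; }

private:
    int* alive_;
};

// Stand-in for the algorithm: it holds one reference on the map it built.
class ToyBuilder {
public:
    explicit ToyBuilder(int* aliveFlag) : map_(new ToyMap(aliveFlag)) { map_->use(); }
    ~ToyBuilder() { map_->forget(); }
    ToyMap* getMap() const { return map_; }

private:
    ToyMap* map_;
};
```

A client that calls use() on the map returned by getMap() keeps it alive after the builder is destroyed, and releases it later with its own forget() call.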
float getNbBitsPerKmer ( ) const

Get the number of bits of a value.

Returns: the number of bits per kmer.

Member Data Documentation

const Abundance_t MAX_ABUNDANCE = std::numeric_limits<Abundance_t>::max()

We define the maximum abundance according to the provided type (value set in the cpp file).

We first tried to set the constant in the hpp file but got the following error: "error: a function call cannot appear in a constant-expression". This was solved by defining the value in the cpp file.
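
The workaround described above corresponds to a common C++ pattern: declare the static constant inside the class without an initializer, and define it outside the class body where a function call is allowed. The sketch below uses a hypothetical ToyAlgorithm template; for a class template, the out-of-class definition usually has to be visible wherever the template is instantiated, so where it lives in the real code base is not something this example tries to reproduce.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Declaration: no initializer inside the class body.
template <typename Abundance_t>
struct ToyAlgorithm {
    static const Abundance_t MAX_ABUNDANCE;
};

// Definition outside the class body: calling a function here is valid even
// with compilers where std::numeric_limits<T>::max() is not a constant expression.
template <typename Abundance_t>
const Abundance_t ToyAlgorithm<Abundance_t>::MAX_ABUNDANCE =
    std::numeric_limits<Abundance_t>::max();
```

Since C++11, std::numeric_limits<T>::max() is constexpr, so a static constexpr member initialized in the class body is another option when the compiler supports it.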

The documentation for this class was generated from the following files: