Sequence Struct Reference

Structure holding genomic information.

#include <Sequence.hpp>

Public Member Functions

 Sequence (tools::misc::Data::Encoding_e encoding=tools::misc::Data::ASCII)
 Sequence (char *seq)
virtual ~Sequence ()
virtual const std::string & getComment () const
virtual const std::string getCommentShort () const
virtual const std::string & getQuality () const
virtual tools::misc::Data & getData ()
virtual char * getDataBuffer () const
virtual size_t getDataSize () const
virtual tools::misc::Data::Encoding_e getDataEncoding () const
virtual size_t getIndex () const
void setDataRef (tools::misc::Data *ref, int offset, int length)
void setIndex (size_t index)
std::string toString () const
void setComment (const std::string &cmt)
void setQuality (const std::string &qual)
std::string getRevcomp () const

Public Attributes

std::string _comment
std::string _quality

Detailed Description

Structure holding genomic information.

A sequence holds several pieces of data:

  • comment (as a text)
  • genomic data
  • quality information (for fastq format, empty in other cases).

The genomic data is held in a tools::misc::Data attribute and is expected to contain nucleotides.

The inner format may be one of several kinds (ASCII, INTEGER, BINARY), depending on the type of bank that provides the Sequence objects. For instance:

  • a FASTA bank will provide Sequence instances whose data is in ASCII
  • a BINARY bank will provide Sequence instances whose data is in BINARY

The buffer holding the nucleotides lives in the tools::misc::Data attribute; see that class for details on where the buffer can be allocated. Note only that the buffer may be stored in the Data object itself, or may be a reference to a buffer allocated elsewhere.

The class Sequence is closely related to the IBank interface.

Note that this class should not be instantiated directly by end users; they are more likely to receive such objects by iterating over a bank.

Example of use:

// We create an iterator on the bank
Iterator<Sequence>* it = bank->iterator();
// We iterate the sequences of the bank
for (it->first(); !it->isDone(); it->next())
{
    // We get a shortcut on the current sequence and its data
    Sequence& seq = it->item();
    Data& data = seq.getData();
    // We dump some information about the sequence.
    std::cout << "comment " << seq.getComment() << std::endl;
    // We dump each nucleotide. NOTE: the output depends on the data encoding
    for (size_t i=0; i<data.size(); i++) { std::cout << data[i]; }
    std::cout << std::endl;
}

Constructor & Destructor Documentation


Sequence ( tools::misc::Data::Encoding_e  encoding = tools::misc::Data::ASCII)

[in]encoding: encoding scheme of the genomic data of the sequence
Sequence ( char *  seq)

Constructor, mainly for testing: sets the genomic data from an ASCII representation. For instance, one can pass "ACTTACGCAGAT" as the argument of this constructor.

[in]seq: the genomic data as an ascii string
virtual ~Sequence ( )


Member Function Documentation

virtual const std::string& getComment ( ) const
description of the sequence
virtual const std::string getCommentShort ( ) const
description of the sequence until first space
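The "until first space" rule can be illustrated with a minimal sketch. The function name commentUntilFirstSpace is hypothetical and is not part of the library; the sketch only assumes that a comment without any space is returned whole.

```cpp
#include <string>

// Hypothetical helper (not part of the library): returns the part of a
// comment that precedes the first space, or the whole comment when it
// contains no space.
std::string commentUntilFirstSpace(const std::string& comment)
{
    // find() returns std::string::npos when no space exists, and
    // substr(0, npos) then copies the entire string.
    return comment.substr(0, comment.find(' '));
}
```

For a FASTA header such as "seq1 length=100", this sketch yields "seq1".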
virtual tools::misc::Data& getData ( )
the data as a Data structure.
virtual char* getDataBuffer ( ) const

Return the raw buffer holding the genomic data. IMPORTANT: to decode data obtained this way, the caller must know the underlying encoding scheme (ASCII, INTEGER or BINARY).

the genomic data as a raw buffer.
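Why the encoding matters when reading the raw buffer can be sketched as follows. Everything here is illustrative: the Encoding enum is a stand-in for tools::misc::Data::Encoding_e, decodeBuffer is a hypothetical helper, and the INTEGER mapping (one code per nucleotide, 0=A, 1=C, 2=T, 3=G) is an assumption that this page does not state. The packed BINARY layout is deliberately left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Stand-in for tools::misc::Data::Encoding_e, defined here only so that the
// sketch compiles on its own.
enum class Encoding { ASCII, INTEGER, BINARY };

// Hypothetical helper: turn a raw buffer into readable nucleotides.
// ASSUMPTION: INTEGER stores one code per nucleotide, with
// 0=A, 1=C, 2=T, 3=G. Check the Data documentation before relying on it.
std::string decodeBuffer(const char* buffer, std::size_t size, Encoding encoding)
{
    switch (encoding) {
    case Encoding::ASCII:
        // The bytes already are the letters.
        return std::string(buffer, size);
    case Encoding::INTEGER: {
        static const char letters[] = {'A', 'C', 'T', 'G'};
        std::string out;
        out.reserve(size);
        for (std::size_t i = 0; i < size; ++i) {
            // Keep only the two low bits so the lookup stays in range.
            out.push_back(letters[static_cast<unsigned char>(buffer[i]) & 3]);
        }
        return out;
    }
    case Encoding::BINARY:
        break;  // packed layout not covered by this sketch
    }
    throw std::invalid_argument("decodeBuffer: BINARY layout not handled in this sketch");
}
```

The same four bytes therefore print differently depending on the encoding, which is why the caller has to query getDataEncoding() before interpreting the buffer.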
virtual tools::misc::Data::Encoding_e getDataEncoding ( ) const
encoding scheme of the data.
virtual size_t getDataSize ( ) const
number of nucleotides in the sequence.
virtual size_t getIndex ( ) const

Return the index of the sequence. It may be the index of the sequence in the database that holds the sequence.

index of the sequence.
virtual const std::string& getQuality ( ) const
quality of the sequence (set if the underlying bank is a fastq file).
std::string getRevcomp ( ) const

Return a string that is the reverse complement of the sequence. The Sequence object must be in ASCII format.
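As an illustration of what a reverse complement is, here is a minimal sketch on a plain ASCII string. The free function reverseComplement is hypothetical and is not the library's implementation; leaving characters other than A, C, G, T unchanged is an assumption made for this sketch.

```cpp
#include <string>

// Hypothetical free function (not the library's implementation): reads the
// ASCII sequence backwards and swaps A<->T and C<->G.
std::string reverseComplement(const std::string& ascii)
{
    std::string out;
    out.reserve(ascii.size());
    for (auto it = ascii.rbegin(); it != ascii.rend(); ++it) {
        switch (*it) {
        case 'A': out.push_back('T'); break;
        case 'C': out.push_back('G'); break;
        case 'G': out.push_back('C'); break;
        case 'T': out.push_back('A'); break;
        // ASSUMPTION: any other character (for example 'N') is kept as is.
        default:  out.push_back(*it); break;
        }
    }
    return out;
}
```

With this sketch, "ACTTACGCAGAT" becomes "ATCTGCGTAAGT".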

void setComment ( const std::string &  cmt)

Set the comment of the sequence (likely to be called by an IBank iterator).

[in]cmt: comment of the sequence
void setDataRef ( tools::misc::Data *  ref,
int  offset,
int  length )

Set the genomic data as a reference to a Data object (more precisely, to a range within it). Use this method when the genomic data of the sequence should point to an existing buffer of nucleotides: the sequence then allocates no memory for its genomic data and relies on data stored elsewhere. This is mainly a shortcut to the gatb::core::tools::misc::Data::setRef method.

[in]ref: the referred Data instance holding the genomic data
[in]offset: starting index in the referred data
[in]length: length of the genomic data of the current sequence.
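The idea of referring to a range of an existing buffer instead of copying it can be sketched with stand-in types. SharedBuffer and SequenceView are hypothetical and only mimic the (ref, offset, length) triple described above; they are not the library's Data or Sequence classes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Data object that owns a block of nucleotides.
struct SharedBuffer {
    std::vector<char> bytes;
};

// Hypothetical stand-in for a sequence that refers to a range of a shared
// buffer and allocates nothing for the nucleotides itself.
struct SequenceView {
    const SharedBuffer* ref = nullptr;
    std::size_t offset = 0;
    std::size_t length = 0;

    // Mirrors the (ref, offset, length) parameters documented above.
    void setDataRef(const SharedBuffer* r, std::size_t off, std::size_t len)
    {
        ref = r;
        offset = off;
        length = len;
    }

    // Pointer into the shared block; no copy is made.
    const char* buffer() const { return ref->bytes.data() + offset; }

    // Copies the referenced range out, only for display purposes.
    std::string toString() const { return std::string(buffer(), length); }
};
```

Several views can point into the same block, for example one on the range [0, 4) and another on [4, 8). In this sketch the referenced buffer must outlive every view that points into it.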
void setIndex ( size_t  index)

Set the index of the sequence. Typically called by an IBank iterator, which knows the index of the sequence currently being iterated.

[in]index: index of the sequence
void setQuality ( const std::string &  qual)

Set the quality string of the sequence (likely to be called by a fastq iterator).

[in]qual: quality string of the sequence.
std::string toString ( ) const

Get an ASCII representation of the sequence. IMPORTANT: this implementation assumes that the format of the Data attribute is ASCII. No conversion is done for other formats.

the ascii representation of the sequence.

Member Data Documentation

std::string _comment

Comment attribute (note: should be private with a setter and getter).

std::string _quality

Quality attribute (note: should be private with a setter and getter).
