Alphabets#

CharAlphabet and MolType#

MolType instances have a CharAlphabet.

CharAlphabet instances reference the MolType that created them.

Creating tuple alphabets#

You can create a tuple alphabet of, for example, dinucleotides or trinucleotides.
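As a rough illustration of what a tuple alphabet of k-mers contains, the sketch below builds dinucleotide and trinucleotide tuples with the standard library only. It does not use the library's own classes. The character order `T, C, A, G` and the helper name `make_kmer_tuple` are assumptions made for this example.

```python
from itertools import product

# Assumed character order for illustration; the real alphabet's order may differ.
bases = ("T", "C", "A", "G")


def make_kmer_tuple(chars, k):
    """Hypothetical helper: all length-k strings over chars, in product order."""
    return tuple("".join(p) for p in product(chars, repeat=k))


dinucs = make_kmer_tuple(bases, 2)   # 4**2 = 16 members
trinucs = make_kmer_tuple(bases, 3)  # 4**3 = 64 members

print(len(dinucs), dinucs[:4])
print(len(trinucs), trinucs[:4])
```

The number of members grows as the character count raised to the power k, which is why larger k quickly needs wider integer types for the indices.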

Convert a sequence into integers#
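Conceptually, each character is replaced by its position in the alphabet. The following is a minimal standard-library sketch of that mapping, not the library's implementation. The character order and the function name `encode` are assumptions for illustration.

```python
# Assumed character order for illustration; the real alphabet's order may differ.
bases = ("T", "C", "A", "G")

# Map each character to its position in the alphabet.
char_to_index = {char: i for i, char in enumerate(bases)}


def encode(seq):
    """Hypothetical helper: return the alphabet index of each character."""
    return [char_to_index[char] for char in seq]


indices = encode("TAGT")
print(indices)  # [0, 2, 3, 0]
```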

Convert integers to a sequence#
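The reverse direction looks up each integer in the alphabet and joins the characters. This is a sketch using plain Python lists rather than numpy arrays, with an assumed character order and a hypothetical function name `decode`.

```python
# Assumed character order for illustration; the real alphabet's order may differ.
bases = ("T", "C", "A", "G")


def decode(indices):
    """Hypothetical helper: turn alphabet indices back into a string."""
    return "".join(bases[i] for i in indices)


seq = decode([0, 1, 1, 0, 3])
print(seq)  # TCCTG
```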

Converting a sequence into k-mer indices#

You can use a KmerAlphabet to convert a standard sequence into a numpy array of integers. In this case, each integer is the index of a dinucleotide within the alphabet. Because CharAlphabet and KmerAlphabet both inherit from tuple, they have the built-in .index() method.
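To make the idea of a k-mer index concrete, the sketch below uses a plain tuple of dinucleotides and the built-in `.index()` method, then shows that the same number can be computed arithmetically from the per-character indices. The character order and the name `kmer_index` are assumptions for illustration, and the arithmetic only matches when the k-mers are ordered as a product of the characters.

```python
from itertools import product

# Assumed character order for illustration; the real alphabet's order may differ.
bases = ("T", "C", "A", "G")
dinucs = tuple("".join(p) for p in product(bases, repeat=2))

# Built-in tuple lookup: a linear scan for the matching member.
lookup = dinucs.index("TG")


def kmer_index(kmer):
    """Hypothetical helper: treat the k-mer as a number in base len(bases)."""
    index = 0
    for char in kmer:
        index = index * len(bases) + bases.index(char)
    return index


print(lookup, kmer_index("TG"))  # 3 3
```

The arithmetic form avoids scanning the whole tuple, which matters more as k grows.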

The to_indices() method is faster and provides more flexibility. It can be used on a single dinucleotide, and on a longer sequence where we want the independent (non-overlapping) k-mers.

We can also convert the sequence into all possible (overlapping) k-mers.
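The difference between independent and all possible k-mers is the step between window starts: k for independent k-mers, 1 for every overlapping k-mer. The sketch below shows both with the standard library. It is not the library's implementation: the character order and the function names are assumptions, and dropping a trailing incomplete k-mer is a choice made for this example.

```python
# Assumed character order for illustration; the real alphabet's order may differ.
bases = ("T", "C", "A", "G")


def split_kmers(seq, k, independent):
    """Hypothetical helper: slice seq into k-mers, dropping any short tail."""
    step = k if independent else 1
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def kmer_index(kmer):
    """Hypothetical helper: treat the k-mer as a number in base len(bases)."""
    index = 0
    for char in kmer:
        index = index * len(bases) + bases.index(char)
    return index


seq = "ACGGTA"
independent = [kmer_index(k) for k in split_kmers(seq, 2, independent=True)]
overlapping = [kmer_index(k) for k in split_kmers(seq, 2, independent=False)]

print(independent)  # [9, 15, 2]          from AC, GG, TA
print(overlapping)  # [9, 7, 15, 12, 2]   from AC, CG, GG, GT, TA
```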

Quality score converters#

make_qual_converter builds a callable that maps a fastq quality string (as bytes) into a numpy.uint8 array of Phred scores. Both Phred+33 and Phred+64 encodings are supported.
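The arithmetic behind the two encodings is a subtraction of the offset from each byte value. The sketch below shows that decoding with plain Python lists instead of numpy arrays. It is not the make_qual_converter implementation, and the function name `decode_quality` is hypothetical.

```python
def decode_quality(qual, offset=33):
    """Hypothetical helper: convert a quality string (bytes) into Phred scores.

    offset is 33 for Phred+33 and 64 for Phred+64.
    """
    # Iterating over bytes yields integer byte values.
    return [byte - offset for byte in qual]


phred33 = decode_quality(b"II5+", offset=33)
phred64 = decode_quality(b"hhTJ", offset=64)

print(phred33)  # [40, 40, 20, 10]
print(phred64)  # [40, 40, 20, 10]
```

The two example strings encode the same scores, which shows why the offset must be known before a quality string can be interpreted.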

See "Directly use the genbank format parser to load a sequence and annotations" for an example combining this with iter_fastq_records.