FASTA Format: What Research Scientists Should Know

Written by Research Solutions | Marketing Team | Jul 18, 2019 7:43:00 PM

In bioinformatics and biochemistry, where collecting and analyzing complex biological data is a central focus, nucleotide and protein sequences are often stored as long character strings in a format called FASTA.

In this post, we’ll provide a quick overview of the format and its uses.

5 Quick Facts about FASTA format

It is a text-based format used for representing nucleotide or protein/amino acid sequences.

A single FASTA file can store multiple sequence records.

It allows for sequence names and comments to precede the sequences.

Each record in FASTA format begins with a single-line description (also called the 'header line' or 'definition line'), which starts with the ">" symbol, followed by the sequence ID. The sequence data begins on the next line of the record and may be wrapped across several lines.

Nucleotides and amino acids are represented using single-letter codes as shown below:

Nucleic Acid Codes:
A = adenosine
C = cytidine
G = guanosine
T = thymidine
N = A/G/C/T (any)
U = uridine
R = G/A (purine)
Y = T/C (pyrimidine)
K = G/T (keto)
M = A/C (amino)
S = G/C (strong)
W = A/T (weak)
B = G/T/C
D = G/A/T
H = A/C/T
V = G/C/A

Accepted Amino Acid Codes:
A = alanine
B = aspartate or asparagine
C = cysteine
D = aspartate
E = glutamate
F = phenylalanine
G = glycine
H = histidine
I = isoleucine
K = lysine
L = leucine
M = methionine
N = asparagine
P = proline
Q = glutamine
R = arginine
S = serine
T = threonine
U = selenocysteine
V = valine
W = tryptophan
Y = tyrosine
Z = glutamate or glutamine
X = any

A single dash or hyphen (-) can be used to represent a gap of indeterminate length, and an asterisk (*) can be used to represent a translation stop.
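
As a rough illustration of how these code tables can be put to work, the Python sketch below checks a sequence against the letters listed above and tests whether an ambiguity code covers a given base. The names NUCLEOTIDE_CODES, AMINO_ACID_CODES, AMBIGUITY, invalid_symbols, and code_covers are our own for this example and are not part of any standard library; treating "-" and "*" as optional extras is likewise an assumption about how strict you want the check to be.

```python
# Alphabets copied from the code tables above.
NUCLEOTIDE_CODES = set("ACGTUNRYKMSWBDHV")
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX")

# Ambiguity codes from the nucleic acid table, mapped to the bases they cover.
AMBIGUITY = {
    "N": "AGCT", "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC",
    "W": "AT", "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA",
}


def invalid_symbols(sequence, alphabet, extras="-"):
    """Return a sorted list of characters in `sequence` outside `alphabet`.

    `extras` holds additional allowed symbols, such as "-" for a gap
    or "*" for a translation stop.
    """
    allowed = alphabet | set(extras)
    return sorted(set(sequence.upper()) - allowed)


def code_covers(code, base):
    """Return True if the nucleotide `code` can stand for the concrete `base`."""
    code, base = code.upper(), base.upper()
    # Unambiguous letters (A, C, G, T, U) only cover themselves.
    return base in AMBIGUITY.get(code, code)


print(invalid_symbols("ACGTNRY-", NUCLEOTIDE_CODES))                # nothing invalid
print(invalid_symbols("ADQLTEEQ*", AMINO_ACID_CODES, extras="-*"))  # stop symbol allowed
print(invalid_symbols("ACGTJ", NUCLEOTIDE_CODES))                   # "J" is not in the table
print(code_covers("R", "G"), code_covers("R", "T"))
```

A check like this is a quick way to catch stray characters before passing a sequence to other tools, which may be stricter or more lenient about the symbols they accept.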

Here are three examples of how FASTA format looks:

>seq1
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLMELKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM

>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK*

>U06486.1 Human Wilms' tumor (WT1) gene, 5' region, partial cds
CAGTGTCTTGTAGAATCTTCAGTGTCTTGATAATAATTTTAAAAGCTTCTGAGTGGAGACGACGCAAAGTCAAGCAGCAAAGGTGGCCTGGGAGGCAAGCGGAGGGCTCAAGTGCCGCATCTTTACCCTCAGGGTCTCCTGCGCCTACGGGATGCGCATTCCCAAGAAGTGCGCCCTTCGAGTAA
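
To make the record structure concrete, here is a minimal Python sketch of how FASTA text could be read into (header, sequence) pairs. It is an illustration under simple assumptions, not a full-featured parser: the function name parse_fasta is our own, the first record is a shortened, line-wrapped piece of the seq1 example, and the second record is made up for the demonstration.

```python
def parse_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence data may be wrapped over several lines; join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Illustrative input: a shortened, wrapped seq1 plus a hypothetical second record.
example = """\
>seq1
KYRTWEEFTRAAEKLYQ
ADPMKVRVVLKYRHCDG

>seq2 hypothetical second record
ACGTNNRY
"""

records = list(parse_fasta(example.splitlines()))
for header, sequence in records:
    seq_id = header.split(None, 1)[0]  # the ID is the first word of the header
    print(seq_id, len(sequence))
```

Because the function accepts any iterable of lines, the same sketch should also work on an open file handle. For production work, an established library is usually the safer choice.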

Putting FASTA Format to Use

One of the oldest recognized formats in bioinformatics, FASTA format is still widely used for sequence storage and retrieval due to its simplicity and flexibility. Indeed, the format is considered an almost universal standard in bioinformatics.

In our ongoing effort to help make researchers' lives easier, the Gadgeteers here at Research Solutions have incorporated FASTA format into a number of our lab analysis and productivity apps, or Gadgets, including:

Nucleotide Sequence Editor – Upload a FASTA file to easily manipulate one or more nucleotide sequences.

Plasmid Visualizer – Paste or upload your DNA sequence data to view plasmid sequence annotation and plasmid visualization for circular sequences.

Peptide/Protein Calculator – Upload peptide or protein FASTA-format sequence data to quickly calculate amino acid content, aromaticity, flexibility, instability index, isoelectric point, and more.

Sequence Annotator – Upload nucleotide FASTA format files to quickly visualize and annotate linear DNA or RNA sequences.

Biomolecular Sequence Manager – Access and manage all your FASTA format files in one place.

With lab productivity Gadgets, you can save up to 50% of your research time by automating routine tasks – including those that involve FASTA format.

If you haven't done so already, we invite you to sign up for a free account and put these Gadgets to the test! We think you'll be pleased. And of course, we welcome any feedback you may have.
