FASTA Format: What Research Scientists Should Know
In bioinformatics and biochemistry, where collecting and analyzing complex biological data is a central focus, nucleotide and protein sequences are commonly stored in a plain-text format called FASTA.
In this post, we’ll provide a quick overview of the format and its uses.
5 Quick Facts about FASTA format
- It is a text-based format used for representing nucleotide or protein/amino acid sequences.
- A single FASTA file can store one or many sequence records.
- It allows for sequence names and comments to precede the sequences.
- Each record in FASTA format begins with a single-line description (also called the 'header line' or 'definition line'), which starts with the ">" symbol, followed by the sequence ID. The sequence data begins on the next line and may continue over several lines.
- Nucleotides and amino acids are represented using single-letter codes, as shown below:
Nucleic Acid Codes:
A = adenosine
C = cytidine
G = guanosine
T = thymidine
N = A/G/C/T (any)
U = uridine
R = G/A (purine)
Y = T/C (pyrimidine)
K = G/T (keto)
M = A/C (amino)
S = G/C (strong)
W = A/T (weak)
B = G/T/C
D = G/A/T
H = A/C/T
V = G/C/A
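The ambiguity codes above can be read as sets of allowed bases. The following is a minimal Python sketch of that idea, not a reference implementation; the names `IUPAC_NUCLEOTIDES` and `matches` are illustrative and do not come from any particular library.

```python
# Each nucleic acid code from the table above, mapped to the bases it can stand for.
# The dictionary and function names are illustrative assumptions.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA",    # purine
    "Y": "TC",    # pyrimidine
    "K": "GT",    # keto
    "M": "AC",    # amino
    "S": "GC",    # strong
    "W": "AT",    # weak
    "B": "GTC",
    "D": "GAT",
    "H": "ACT",
    "V": "GCA",
    "N": "AGCT",  # any
}


def matches(code: str, base: str) -> bool:
    """Return True if the concrete base is one of those the code allows."""
    return base.upper() in IUPAC_NUCLEOTIDES[code.upper()]


print(matches("R", "G"))  # True: R covers G and A
print(matches("Y", "A"))  # False: Y covers only T and C
```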
Accepted Amino Acid Codes:
A = alanine
B = aspartate or asparagine
C = cysteine
D = aspartate
E = glutamate
F = phenylalanine
G = glycine
H = histidine
I = isoleucine
K = lysine
L = leucine
M = methionine
N = asparagine
P = proline
Q = glutamine
R = arginine
S = serine
T = threonine
U = selenocysteine
V = valine
W = tryptophan
Y = tyrosine
Z = glutamate or glutamine
X = any
A single dash or hyphen (-) can be used to represent a gap of indeterminate length and an asterisk (*) can be used to represent a translation stop.
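As a rough illustration, the code tables above can be turned into a simple check for unexpected characters in a sequence. This is a sketch under stated assumptions: the helper name `invalid_characters` is hypothetical, lowercase input is treated as uppercase, and the gap and stop symbols are accepted for both alphabets for simplicity, even though a translation stop only makes sense in a protein sequence.

```python
# Alphabets taken from the code tables above.
NUCLEOTIDE_CODES = set("ACGTUNRYKMSWBDHV")
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX")
EXTRA_SYMBOLS = set("-*")  # gap and translation stop


def invalid_characters(sequence: str, alphabet: set[str]) -> set[str]:
    """Return the characters in `sequence` that are not in the given alphabet."""
    allowed = alphabet | EXTRA_SYMBOLS
    return {ch for ch in sequence.upper() if ch not in allowed}


print(invalid_characters("ACGTNNRY-", NUCLEOTIDE_CODES))        # set()
print(sorted(invalid_characters("MKV*LJO", AMINO_ACID_CODES)))  # ['J', 'O']
```

An empty result means every character appears in the chosen table; anything returned is worth a second look before the sequence is passed to another tool.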
Here are two examples of FASTA header lines, the second followed by its sequence data:
>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
>U06486.1 Human Wilms' tumor (WT1) gene, 5' region, partial cds
CAGTGTCTTGTAGAATCTTCAGTGTCTTGATAATAATTTTAAAAGCTTCTGAGTGGAGACGACGCAAAGTCAAGCAGCAAAGGTGGCCTGGGAGGCAAGCGGAGGGCTCAAGTGCCGCATCTTTACCCTCAGGGTCTCCTGCGCCTACGGGATGCGCATTCCCAAGAAGTGCGCCCTTCGAGTAA
Putting FASTA Format to Use
One of the oldest recognized formats in bioinformatics, FASTA format is still widely used in sequence retrieval due to its simplicity and flexibility. Indeed, the format is considered an almost universal standard in the bioinformatics field of research.
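To illustrate that simplicity, a basic FASTA reader fits in a few lines of Python. The sketch below is a minimal example, not a full parser: the function names `read_fasta` and `_build_record` and the sample records are made up for illustration, and it assumes the sequence ID is everything between ">" and the first space in the header.

```python
from typing import Iterable, Iterator, List, Tuple


def _build_record(header: str, chunks: List[str]) -> Tuple[str, str, str]:
    """Split a header into ID and description, and join the sequence lines."""
    seq_id, _, description = header.partition(" ")
    return seq_id, description.strip(), "".join(chunks)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (sequence_id, description, sequence) for each FASTA record."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield _build_record(header, chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield _build_record(header, chunks)


# Hypothetical sample data, used only to exercise the reader.
example = """\
>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2
NNNN
"""

for seq_id, description, sequence in read_fasta(example.splitlines()):
    print(seq_id, description, len(sequence))
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly. For production work, an established library such as Biopython (`Bio.SeqIO.parse(handle, "fasta")`) is usually the safer choice.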
In our ongoing effort to make researchers' lives easier, the Gadgeteers here at Research Solutions have incorporated FASTA format into a number of our lab analysis and productivity apps, or Gadgets, including:
- Nucleotide Sequence Editor – Upload a FASTA file to easily manipulate one or more nucleotide sequences.
- Plasmid Visualizer – Paste or upload your DNA sequence data to view plasmid sequence annotation and plasmid visualization for circular sequences.
- Peptide/Protein Calculator – Upload peptide or protein FASTA-format sequence data to quickly calculate amino acid content, aromaticity, flexibility, instability index, isoelectric point, and more.
- Sequence Annotator – Upload nucleotide FASTA format files to quickly visualize and annotate linear DNA or RNA sequences.
- Biomolecular Sequence Manager – Access and manage all your FASTA format files in one place.
With lab productivity Gadgets, you can save up to 50% of your research time by automating routine tasks – including those that involve FASTA format.
If you haven't done so already, we invite you to sign up for a free account and put these Gadgets to the test! We think you'll be pleased. And of course, we welcome any feedback you may have.