Dinoflagellates are unicellular algae (or heterotrophic protists, for those lacking a chloroplast) that, together with apicomplexans and ciliates, form the monophyletic group Alveolata. They are the second largest group of primary producers in the ocean (after diatoms), are indispensable for coral reef building, and are major contributors to harmful algal blooms. They possess two flagella and are either enclosed in a cellulosic theca or naked (Fig. 1). The dinoflagellate life cycle consists of a vegetative stage that reproduces by binary division and cyst stages that result from sexual fusion in response to unfavorable environmental conditions. Dinoflagellates can be free-living in fresh or salt water, as plankton or sand dwellers, and can be symbiotic or parasitic. About half of the 2000 extant dinoflagellates are heterotrophic, ingesting other algae or dissolved organic matter (Schnepf and Elbrachter 1999), and some of these can enslave ingested algal chloroplasts and perform ephemeral photosynthesis (e.g. Lewitus et al. 1999, Feinstein et al. 2002). These heterotrophic taxa are potentially important micrograzers in the microbial food web (Nakamura 1999). Some 60 dinoflagellate taxa are known to form red tides, over 20 of which produce toxins that have profound impacts on the fisheries industry, the recreational value of coastal zones, and public health (Anderson 1994, 1996).
Over their evolutionary history, dinoflagellate genomes have not only undergone vertical evolution but have also been shaped by rampant horizontal gene transfer. Among photosynthetic organisms, dinoflagellates have the most reduced plastid genome. In the typical, peridinin-containing lineages, the plastid genome is broken into single-gene minicircles that encode only 16 of the typical 130-200 plastid proteins (Koumandou et al. 2004). Transfer of the remaining plastid genes to the nucleus has dramatically reconfigured the dinoflagellate nuclear genome (Yoon et al. 2005). In addition, Rubisco has been replaced by a form II enzyme of proteobacterial origin, likely through lateral gene transfer (Morse et al. 1995, Palmer 1996, Delwiche and Palmer 1996). Dinoflagellates are thought to lack histone proteins, with few exceptions (e.g. Amoebophrya) (Rizzo 2003), but a histone H3-like protein (Okamoto and Hastings 2003) and a histone H2A.X (Hackett et al. 2005) were recently reported, in addition to findings of basic and acidic nuclear proteins (Hackett et al. 2005 and references therein, Zhang and Lin unpubl. data). Moreover, typical dinoflagellate cells divide by closed mitosis with extranuclear spindles, and their chromosomes are permanently condensed. Other features recognized so far as unusual for a eukaryote include the rarity of mRNA splicing and deviation from the universal GT/AG rule (Palmer 1996), extensive and novel mRNA editing in mitochondrial genes (Lin et al. 2002, 2008), and widespread spliced leader RNA trans-splicing (Zhang et al. 2007).
One of the most profound attributes of dinoflagellates is their huge genomes (Fig. 2). Over the past half century, the genome sizes of more than 30 dinoflagellates have been measured using various methods, giving a range of 3-278 pg DNA per genome or cell (e.g. Holm-Hansen 1969, Rizzo 1987, Veldhuis et al. 1997, LaJeunesse et al. 2005), which is about 1-80 times the size of the human haploid genome. Although smaller genomes may occur in some as yet unrecognized dinoflagellates (Lin 2006), dinoflagellate genomes are still huge for relatively small-celled, “simple” organisms, another case of the C-value enigma (Gregory 2001). Equally striking is the wide range of genome sizes, which cannot be explained by any conceivable difference in apparent function or cell size. These peculiarities raise the question of what portion of the dinoflagellate genome is protein coding and what function the remainder has. Early biochemical work showed that a large fraction of the nuclear DNA in the dinoflagellates Crypthecodinium cohnii and Prorocentrum cassubicum is composed of repeated and interspersed DNA sequences (Allen et al. 1975, Rizzo 1987, Hinnebusch et al. 1980, and Steele 1980). Hence, it has been suggested that a large fraction of the dinoflagellate genome is nonfunctional (Anderson et al. 1992). Recently, we detected potential microRNA binding targets in dinoflagellate mRNA (Lin et al. unpublished), suggesting the existence of functional noncoding elements in dinoflagellate genomes.
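To make the scale of these genome sizes concrete, the short sketch below converts the reported extremes (3 and 278 pg) into base pairs and into multiples of the human haploid genome. It relies on two assumptions that are not stated in the text: the widely used approximation that 1 pg of double-stranded DNA corresponds to about 0.978 x 10^9 bp, and a human haploid genome mass of about 3.5 pg, a value chosen here because it reproduces the "1-80 times" range quoted above. Because some of the published measurements are per cell rather than per haploid genome, the results are order-of-magnitude illustrations only.

```python
# Assumption: 1 pg of double-stranded DNA ~ 0.978e9 bp (common approximation).
BP_PER_PG = 0.978e9
# Assumption: human haploid genome ~ 3.5 pg (not given in the text).
HUMAN_HAPLOID_PG = 3.5


def pg_to_gbp(pg):
    """Convert a DNA mass in picograms to gigabase pairs."""
    return pg * BP_PER_PG / 1e9


def fold_vs_human(pg, human_pg=HUMAN_HAPLOID_PG):
    """Express a DNA mass as a multiple of the assumed human haploid genome."""
    return pg / human_pg


if __name__ == "__main__":
    # The two extremes of the range reported for dinoflagellates.
    for pg in (3.0, 278.0):
        print(f"{pg:6.1f} pg ~ {pg_to_gbp(pg):6.1f} Gbp, "
              f"{fold_vs_human(pg):5.1f}x human haploid")
```

Under these assumptions the upper bound corresponds to a few hundred gigabase pairs, far beyond what could be covered by genome sequencing at the time of writing.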
Regulation of gene expression in these enigmatic organisms has only begun to be examined using genomic approaches. Several Expressed Sequence Tag (EST) projects have been carried out in recent years. For instance, Okamoto and Hastings (2003) used microarrays and found that circadian clock related genes were regulated at the transcriptional level. Erdner and Anderson (2006) used Massively Parallel Signature Sequencing and found that about one-fourth of the expressed gene pool was transcriptionally regulated, whereas the rest showed no transcriptional regulation. Similarly, a microarray analysis of Karenia brevis revealed little transcriptional regulation of the genome (Lidie et al. 2005). Several other EST-based studies have shed light on the evolution of plastid genomes in dinoflagellates (e.g. Bachvaroff et al. 2004, Hackett et al. 2005, Patron et al. 2006). The major findings of these studies are 1) that transcriptional regulation of dinoflagellate genes is rare and 2) that most dinoflagellate plastid genes have been transferred to the nuclear genome. It is still poorly understood what dinoflagellate genomes are made of, which genes are commonly expressed under all natural conditions (“Dino core genes”), and which genes are expressed only under specific conditions. We have recently launched an EST project to profile expressed genes in three trophically contrasting taxa: Prorocentrum minimum (photoautotrophic), Karlodinium micrum (mixotrophic), and Pfiesteria piscicida (heterotrophic). We observed distinct (largely undescribed) arrays of genes expressed among the three species, as well as changes in the profile of expressed genes in Pfiesteria from fed to starved conditions. In addition, we may have encountered an “instant” gene transfer event (unpublished data). The problem is that the majority of dinoflagellate genes have no hit in the GenBank database, and traditional ESTs often do not provide sufficient information to predict the functions of these “novel” genes. To overcome this shortcoming, we are taking advantage of the recently discovered spliced leader at the 5’ end of dinoflagellate nuclear-encoded mRNA and have started a project to sequence full-length cDNA for two dinoflagellate species.
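The distinction drawn above between “Dino core genes” and condition-specific genes can be expressed as simple set operations once each library has been reduced to a set of gene identifiers. The sketch below is illustrative only: the function name `partition_expressed_genes` and the toy identifiers are hypothetical, and it assumes that transcripts from different libraries have already been clustered so that the same identifier denotes the same gene (or ortholog group) in every library.

```python
def partition_expressed_genes(expressed):
    """Split expressed genes into a shared core and library-specific sets.

    expressed: dict mapping a library label (a species or a growth
    condition) to the set of gene identifiers detected in that library.
    Returns (core, specific): `core` holds genes seen in every library,
    and `specific[label]` holds genes seen only in that library.
    """
    gene_sets = list(expressed.values())
    core = set.intersection(*gene_sets) if gene_sets else set()
    specific = {}
    for label, genes in expressed.items():
        # Union of everything detected in the other libraries.
        others = set().union(
            *(g for other, g in expressed.items() if other != label)
        )
        specific[label] = genes - others
    return core, specific


if __name__ == "__main__":
    # Toy identifiers, not real data.
    libraries = {
        "fed": {"g1", "g2", "g3"},
        "starved": {"g1", "g3", "g4"},
    }
    core, specific = partition_expressed_genes(libraries)
    print("core:", sorted(core))
    for label in libraries:
        print(label, "only:", sorted(specific[label]))
```

In practice the absence of a gene from an EST library may reflect sampling depth rather than true lack of expression, so sets derived this way should be read as provisional.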
Results from other researchers and from the PIs suggest that 1) the dinoflagellate nuclear genome contains vastly more genes than the genomes of other eukaryotes when multiple variants of each gene are taken into account, and 2) the dinoflagellate genome is rich in functional noncoding RNA genes. To test these conjectures and to provide a genome-wide, well-annotated expressed gene dataset, we propose a large-scale, full-length cDNA sequencing project as a joint venture of the University of Connecticut (Department of Marine Sciences), University of Maryland (Center of Marine Biotechnology), the Venter Institute, and the Scripps Institution of Oceanography (SIO, Scripps Genome Center).
We propose to sequence Karlodinium micrum and Amphidinium carterae, which represent distinct phylo- and eco-types and yet produce similar toxins that cause fish kills. Their genomes are small enough to achieve significant coverage through cDNA sequencing. Using the recently discovered universal 5’-UTR sequence in dinoflagellate nuclear transcripts, we will construct full-length cDNA libraries for three conditions: active grazing (for K. micrum), active cell division, and high toxin production. From normalized libraries, 60,000 clones for K. micrum and 30,000 clones for A. carterae will be sequenced at the Venter Institute sequencing facility. The resulting cDNA sequences will be processed through bioinformatics pipelines at SIO using sequence alignment and hidden Markov model algorithms on high-speed Unix computers accelerated by Timelogic boards. Sequences will be annotated by searching for conserved protein domains, motifs, and structural features. Genes with recognized or putative functions will be assembled into biochemical pathways. We will attempt to identify genes involved in toxin production, the cell cycle, and phagotrophic feeding. The gene content of each genome will be assessed, and gene regulatory elements will be identified.
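Because every nuclear transcript is expected to carry the same leader at its 5’ end, one simple screen for full-length clones is to check whether a read begins with that leader, allowing for a few sequencing errors and a short stretch of vector or adapter in front of it. The sketch below illustrates this idea under stated assumptions: the function names, the mismatch and offset tolerances, and the 22-nt `PLACEHOLDER_LEADER` are all hypothetical (the placeholder is an arbitrary sequence, not the published dinoflagellate spliced leader), and the actual pipeline may use an alignment tool instead.

```python
# Arbitrary placeholder, NOT the real dinoflagellate spliced leader;
# substitute the published leader sequence before any real use.
PLACEHOLDER_LEADER = "ACGTTGCAACGTTGCAAGCTAG"


def leader_offset(read, leader, max_mismatches=2, max_offset=5):
    """Return the position of `leader` near the 5' end of `read`, or None.

    The leader may start at any position from 0 to `max_offset` and may
    differ from the read at up to `max_mismatches` bases (substitutions
    only; insertions and deletions are not handled in this sketch).
    """
    read, leader = read.upper(), leader.upper()
    for start in range(max_offset + 1):
        window = read[start:start + len(leader)]
        if len(window) < len(leader):
            break  # read too short to contain the leader from here on
        mismatches = sum(a != b for a, b in zip(window, leader))
        if mismatches <= max_mismatches:
            return start
    return None


def split_full_length(reads, leader, **kwargs):
    """Partition reads into (with leader, without leader)."""
    full, partial = [], []
    for read in reads:
        has_leader = leader_offset(read, leader, **kwargs) is not None
        (full if has_leader else partial).append(read)
    return full, partial


if __name__ == "__main__":
    reads = [
        "GG" + PLACEHOLDER_LEADER + "ATGAAACCC",   # leader after 2 extra bases
        "T" + PLACEHOLDER_LEADER[1:] + "ATGGGG",   # leader with 1 mismatch
        "T" * 30,                                  # no leader
    ]
    full, partial = split_full_length(reads, PLACEHOLDER_LEADER)
    print(len(full), "reads with leader;", len(partial), "without")
```

Reads that pass such a screen are candidates for full-length cDNA; those that fail may be truncated at the 5’ end or may derive from organellar transcripts that lack the leader.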
The experimental procedure of the project is shown below.