SL RNA trans-splicing
Spliced leader (SL) trans-splicing has been found in a phylogenetically disjunct group of eukaryotes (Hastings 2005). In this process a short RNA fragment (the SL, ~15-50 nt) from a small non-coding RNA (SL RNA) is spliced onto the splice acceptor site in the 5’-untranslated region of an independently transcribed pre-mRNA. Mature mRNAs are formed with the SL sequence occupying their 5’ ends (for reviews see: Blumenthal 2005; Hastings 2005; Mayer and Floeter-Winter 2005). This process can have a multitude of functions: 1) generating translatable monocistronic mRNAs from polycistronic precursor transcripts; 2) sanitizing the 5’ end of mRNAs; 3) stabilizing mRNAs; and 4) regulating gene translation. Among the organisms examined, SL trans-splicing is found in Euglenozoa, nematodes, platyhelminthes, cnidarians, rotifers, ascidians, and appendicularia. The SL RNA contains two functional domains: an exon (the SL) that is transferred to an mRNA, and an intron that contains a consensus binding site (Sm-binding motif) for the assembly of small nuclear ribonucleoprotein particles (snRNPs). The SL RNA bears low sequence similarity across phyla; however, its secondary structure is conserved in most lineages (Bruzik et al. 1988; Mayer and Floeter-Winter 2005). The SL 5’-cap structure is not always the same in different organisms. In trypanosomes the SL carries a hypermethylated structure, consisting of an inverted 7-methylguanosine (m7G) followed by four nucleotides (nt) with 2’-O-ribose methylations and three base methylations (termed cap 4). In worms the SL carries an inverted 2,2,7-trimethylguanosine 5’-cap. In both groups the heptameric Sm-protein complex, a structure formed on several U-rich snRNPs and involved in both cis- and trans-splicing, interacts with the SL RNAs through the Sm-binding motif.
The presence of SL trans-splicing was described recently in dinoflagellates (Zhang et al. 2007b), a group of unicellular eukaryotes belonging to the Alveolata lineage that contribute significantly to marine primary production, growth of coral reefs, and harmful algal blooms. Through the analysis of hundreds of full-length cDNAs from fifteen representative species of dinoflagellates, we demonstrated that nuclear-encoded mRNAs in all species, from ancestral to derived lineages, are trans-spliced with the addition of the 22-nt conserved SL. In dinoflagellates, the primary structure of SL RNA appears to differ from that of most of its counterparts in other organisms: 1) the SL RNA transcripts are unusually short at 50-64 nt, with a conserved Sm-binding motif (AUUUUGG) located in the SL (exon) rather than in the intron, where it lies in other organisms; and 2) the 5’-terminal position is predominantly U or A, a feature that may affect capping and the subsequent translation and stability of the recipient mRNA. Since the association of the Sm complex with U-rich small nuclear RNAs (snRNAs) in vertebrates signals nuclear import, its presence in the dinoflagellate SL raises the paradox of how the Sm-binding site could remain on mature mRNAs without impeding cytosolic localization or translation of the mRNAs.
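The exon-versus-intron placement of the Sm-binding motif described above can be illustrated with a minimal sketch. It assumes a 22-nt exon and the AUUUUGG motif as stated in the text; the function name `locate_motif` and both test sequences are hypothetical, invented purely for illustration, and are not real SL RNA sequences.

```python
def locate_motif(sl_rna, motif="AUUUUGG", exon_len=22):
    """Return (start, region) for the first motif occurrence, or None if absent.

    region is "exon" when the whole motif lies within the first exon_len
    nucleotides, "intron" when it starts at or after position exon_len,
    and "spanning" when it straddles the exon-intron boundary.
    Positions are 0-based.
    """
    seq = sl_rna.upper().replace("T", "U")  # accept DNA or RNA alphabet
    start = seq.find(motif)
    if start == -1:
        return None
    end = start + len(motif)
    if end <= exon_len:
        region = "exon"
    elif start >= exon_len:
        region = "intron"
    else:
        region = "spanning"
    return start, region


# Toy sequences (hypothetical, NOT real SL RNAs): a 22-nt "exon" plus an "intron".
toy_motif_in_exon = "GGCACAUCCAUUUUGGACGAUC" + "GUACGCUUCGGCGAAGCCAUACGCUCGAGCACCU"
toy_motif_in_intron = "GGCACAUCCAGCAUCGACGAUC" + "GUACGCAAUUUUGGCGAAGCCAUACGCUCGAGCA"

print(locate_motif(toy_motif_in_exon))    # motif inside the first 22 nt
print(locate_motif(toy_motif_in_intron))  # motif downstream of the exon
```

Under these assumptions, the first toy sequence mimics the dinoflagellate arrangement (motif within the exon) and the second mimics the arrangement described for other organisms (motif within the intron).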
We have further investigated whether the organization of SL RNA and its gene represents an evolutionary trend within the dinoflagellate phylum. We systematically examined SL RNA size and genomic structure for K. brevis strain CCMP2228, along with a selection of phylogenetically diverse dinoflagellates. The species chosen for this study include representatives of the dinoflagellate orders Gymnodiniales, Peridiniales, Prorocentrales, and Suessiales, which are distributed throughout the phylogenetic spectrum (Saldarriaga et al. 2001, Zhang et al. 2007). The species represent isolates with distinct autotrophic, heterotrophic and mixotrophic nutritional requirements, and polar and subtropical ecological niches. We found that while the size of dinoflagellate SL RNA transcripts can vary from 42 to 92 nt, the major ones are 56 to 59 nt. Both the length and sequence of SL RNA are conserved in all dinoflagellates, including K. brevis [for which apparently long SL RNA was reported by Lidie and van Dolah (2007)], and the SL RNA gene is organized both in single-gene tandem repeats and in mixed SL RNA-5S rRNA (SL-5S) arrangements, with numerous variations. The diverse SL genomic structure appears to be a result of rampant genomic duplication and chromosomal recombination; however, the complexity of SL gene structure does not mirror the dinoflagellate phylogenetic tree. These results suggest that genomic duplication and chromosomal recombination occur in individual dinoflagellate lineages and are ongoing.
[excerpts from Zhang, Campbell, Sturm and Lin, Mol. Biol. Evol. 26: 1757-1771 (2009)].
Genome size evolution
One of the most profound attributes of dinoflagellates is their huge genomes. In the past half century, the genome sizes of more than 30 dinoflagellates have been measured using various methods, giving a range of 3-278 pg DNA per genome or cell (e.g. Holm-Hansen 1969, Rizzo 1987, Veldhuis et al. 1997, LaJeunesse et al. 2005), which is about 1-80 times the size of the human haploid genome. Although smaller genomes may occur in some yet unrecognized dinoflagellates (Lin 2006), dinoflagellate genomes are still huge considering the relatively small cell size of these “simple” organisms, another case of the C-value enigma (Gregory 2001). Equally striking is the wide range of genome sizes, which cannot be explained by any conceivable difference in apparent function or cell size. These peculiarities raise the question of what portion of the dinoflagellate genome is protein coding and what function the remainder has. Early work on biochemical properties showed that a large fraction of the nuclear DNA in the dinoflagellates Crypthecodinium cohnii and Prorocentrum cassubicum is composed of repeated and interspersed DNA sequences (Allen et al. 1975, Rizzo 1987, Hinnebusch et al. 1980, and Steele 1980). Hence, it has been suggested that a large fraction of the dinoflagellate genome is nonfunctional (Anderson et al. 1992). Recently, we have detected potential binding targets of microRNA in dinoflagellate mRNA (Lin et al. unpublished), suggesting the existence of functional noncoding elements in dinoflagellate genomes.
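The comparison with the human genome can be checked with a short calculation. This is a rough sketch: the human haploid DNA content of about 3.5 pg and the conversion of about 978 Mbp per pg are assumed approximate values that are not given in the text, and the function names are hypothetical.

```python
HUMAN_HAPLOID_PG = 3.5  # assumed approximate human haploid DNA content (pg), not from the text
MBP_PER_PG = 978.0      # assumed approximate conversion: 1 pg DNA is about 978 Mbp


def fold_over_human(genome_pg):
    """Genome size as a multiple of the assumed human haploid DNA content."""
    return genome_pg / HUMAN_HAPLOID_PG


def pg_to_gbp(genome_pg):
    """Convert DNA mass in picograms to gigabase pairs (approximate)."""
    return genome_pg * MBP_PER_PG / 1000.0


# The two ends of the 3-278 pg range reported for dinoflagellates.
for pg in (3.0, 278.0):
    print(f"{pg:g} pg ~ {pg_to_gbp(pg):.1f} Gbp ~ {fold_over_human(pg):.1f}x human haploid")
```

With these assumed constants the range works out to roughly 0.9 to 79 times the human haploid genome, consistent with the approximate 1-80 fold stated above.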