Sequence of the internal transcribed spacer (ITS) region of ribosomal DNA of Quercus rubra and its potential use in phylogenetic analysis

Jeanne M. Farrell


The internal transcribed spacer (ITS) region of nuclear ribosomal DNA (nrDNA) has been widely used in reconstructing plant phylogeny. The ITS region, which separates the 18s and 26s nrDNA and contains the 5.8s coding sequence, has been widely characterized at interspecific and intergeneric levels of divergence (Baldwin et al 1995). The high copy number allows easy amplification of the region from total DNA (Baldwin 1992, Baldwin et al 1995). The ITS region has been proposed as a useful region for gaining insight into DNA sequence evolution (Herskovitz and Lewis 1996). This may be because, as a transcribed but untranslated sequence, it is free to vary. Because of ITS hypervariability and poor alignability across major taxonomic groups, most researchers believe that the region varies randomly (Crouch and Bachellerie 1986), although more recent data suggest that the sequence variations reflect rapid coevolution between the sequence and ribosomal processing factors (van Nues 1995). While the basis for the variability has not been resolved, exploiting this variability has become a major focus of molecular ecology, and the ITS is perhaps the most used region for phylogenetics. Most major molecular ecology journals currently publish one or more papers per issue focusing on the ITS region, and DNA sequencing is displacing the less conclusive isozyme and RFLP analyses.

The ITS region is divided into ITS-1 and ITS-2, which immediately flank the 5.8s gene sequence, with ITS-1 upstream and ITS-2 downstream of that sequence. The entire ITS region (ITS-1, 5.8s, ITS-2) ranges between 565 and 700 base pairs in angiosperms, although some much larger lengths have been published (Bobola et al 1992, Liston et al 1996). The ITS-1 region is generally longer and more variable than the ITS-2 region.
Gernandt and Liston (1999) showed that in two genera, the ITS-1 was three times as long as the 5.8s plus ITS-2 and that it contained subrepeats. A sequence of the ITS region for Quercus rubra submitted to GenBank (accession number AF098418) reports a length of 587 base pairs, with the ITS-1 region 219 base pairs long, the 5.8s 165 base pairs, and the ITS-2 203 base pairs.

The genus Quercus is often difficult to characterize morphologically because of intraspecific morphological variation caused by its ability to hybridize (Bacilieri et al 1996, Howard et al 1997). This fact, together with large variations in vegetation (Tucker 1974), has made studies of phylogenetic relationships laborious and often not powerful, and the scarcity of such studies in the primary literature reflects this. Molecular methods therefore provide insight that would otherwise not be possible. While several Quercus ITS sequences have been reported, there is little replication between studies, and most species are represented by sequence from a single individual. For this study, a partial sequence of the 5.8s gene and a nearly complete sequence of ITS-2 were obtained for one red oak and compared to GenBank entries. If the hypothesis that the ITS-2 is a valuable region for species and genus level phylogeny is correct, one should expect to find high homology, with slight variation within species and among species.


Figure 1-Top - QR1 PCR product. A: no template, primers ITS3/ITS4, B: no template, actin primers, C: positive control containing both oak and HeLa DNA (100ng/µL each), D: 15ng/µL oak DNA, E: 30ng/µL oak DNA, F: 50ng/µL oak DNA. Bottom - QR2 PCR product. A: no template, primers ITS3/ITS4, B: 15ng/µL oak DNA, C: 30ng/µL oak DNA, D: 100ng/µL oak DNA

Figure 2-Sequence alignments of QR2 PCR amplified product and GenBank sequences as done by ClustalW

Figure 3-Homology of the QR2 PCR amplified sequence to each of the reference oaks as defined by (A) GenBank or (B) this paper, which scored only unambiguous base entries from alignments made using the MacVector ClustalW alignment. GenBank did not list homologies for one species. The numbers listed are the proportion of exact base matches out of aligned bases, as defined in materials and methods, expressed as percents

Figure 4

PCR products were obtained for both dried (QR1) and fresh (QR2) oak samples. Figure 1 shows the results. The dried oak sample was suspected to be contaminated through coprecipitation of secondary metabolites or polysaccharides. It did not yield product at the highest concentration used (50ng/mL). The sample (100ng/mL) was also run with HeLa DNA (100ng/mL) and actin primers (10pmol each) as a positive control. If contamination were inhibiting the PCR, a band would not be detected. A band was detected at these concentrations, indicating that any inhibition due to contaminants was probably specific to the oak DNA amplification. QR2 yielded amplified DNA at all concentrations used (15-100ng/mL). The PCR product from the highest DNA template concentration was used for sequencing.

For QR2, 285 base pairs were sequenced, 186 of which belonged to the ITS-2 region. Several bases at the 3' end of the ITS-2 region were not sequenced. The remaining 99 bases were part of the 5.8s gene and were not included in the analysis in this paper. The oak ITS-2 regions to which the QR2 sequence was compared were between 202 and 207 bp long. Eleven of the 186 bp were ambiguous and were entered as N. Of those 11, 5 had bands that appeared equally strong in the G and C lanes. In the aligned sequences, all other oaks had either a G or a C at those positions.

An Internet BLAST search using the ITS-2 sequence of the QR2 PCR amplified product and a smaller partial sequence of QR1 showed high homology to Q. rubra and Q. palustris. Other oak ITS-2 sequences also showed high homology, and several of them were chosen for this study (Table 1).

The red oak on GenBank is from an eastern North American population (Manos et al 1999), as was QR2. 168 of the 175 unambiguous bases ("without N" method) matched between QR2 and AF098418 (3% variation). This was the smallest variation found among all comparisons, and it occurred between sequences of the same species. GenBank categorized homology as 90% between these two. GenBank also showed 90% homology between QR2 and AF098416, the pin oak (the "without N" method yielded 97% homology). These two species are in the same cluster and would be expected to show less variation than QR2 matched against other oaks. There was one insertion in QR2 that no other species had.
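
The "without N" scoring used here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run in MacVector: the function name identity_without_n and the toy sequences are hypothetical, and the sketch assumes that alignment columns holding an N or a gap (-) in either sequence are left out of the count, which the text does not specify.

```python
def identity_without_n(query, reference):
    """Return (matches, scored_columns, percent_identity) for two aligned sequences.

    Assumption: columns where either sequence has an ambiguous base (N)
    or a gap (-) are skipped rather than counted as mismatches.
    """
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have the same length")
    skipped = {"N", "-"}
    matches = 0
    scored = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q in skipped or r in skipped:
            continue  # ambiguous or gapped column: not scored
        scored += 1
        if q == r:
            matches += 1
    if scored == 0:
        raise ValueError("no unambiguous columns to score")
    return matches, scored, 100.0 * matches / scored


# Toy aligned sequences (hypothetical; not the QR2 or GenBank data).
toy_query = "ACGTNACGT-AC"
toy_reference = "ACGTAACCTTAC"
print(identity_without_n(toy_query, toy_reference))  # (9, 10, 90.0)
```

In the toy example, two of the twelve columns are skipped (one N, one gap), and nine of the remaining ten match. Counting gaps as mismatches instead would lower the reported identity, so the choice should be stated alongside any percentages.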

Additionally, the species in the cluster with the two red oaks, the California black oak and the pin oak (AF098417 and AF098416, respectively), exhibited cluster-wide base changes. In Figure 2, alignment positions 129, 132, 200, and 202 all had bases unique to that cluster.
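
A search for cluster-specific positions of this kind could be automated roughly as follows. The Python sketch below is an assumption-laden illustration rather than the method used in this study: the function name cluster_specific_columns, the sequence labels, and the toy alignment are all hypothetical, and a column is reported only when every cluster member carries the same unambiguous base and no sequence outside the cluster carries it.

```python
def cluster_specific_columns(alignment, cluster):
    """Return 1-based alignment columns whose base is shared by all
    cluster members and absent from every sequence outside the cluster.

    alignment: dict mapping a sequence name to its aligned sequence
    cluster:   set of names belonging to the cluster of interest
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    members = [name for name in alignment if name in cluster]
    others = [name for name in alignment if name not in cluster]
    columns = []
    for i in range(length):
        inside = {alignment[name][i].upper() for name in members}
        outside = {alignment[name][i].upper() for name in others}
        # One shared base inside the cluster, not N or a gap,
        # and never seen outside the cluster.
        if len(inside) == 1 and not inside & {"N", "-"} and not inside & outside:
            columns.append(i + 1)
    return columns


# Toy alignment (hypothetical labels and bases).
toy_alignment = {
    "red_a": "ACGTAC",
    "red_b": "ACGTAT",
    "white_a": "ATGTGC",
    "white_b": "ATGTGC",
}
print(cluster_specific_columns(toy_alignment, {"red_a", "red_b"}))  # [2, 5]
```

Column 6 is not reported in the toy example because the two cluster members disagree there. With real data, the ambiguous N entries in QR2 would exclude a column under this rule, so a looser rule may be preferable for sequences with many ambiguous calls.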

This study showed that it is possible to extract and amplify the ITS-2 region from both fresh and dried tissue samples using published primers. While this finding was not novel in itself, the extraction protocol and the PCR cycling were modified, and product was still obtained. Early PCR experiments using QR1 did not yield detectable products (data not shown); only when DNA template concentrations were kept low was a product detected. Figure 1 showed an experiment with HeLa cell DNA and QR1 DNA. The finding that the HeLa PCR product was not inhibited may result from at least one of the following. First, the HeLa DNA may have been at a high enough concentration to overcome any inhibition. Second, the inhibition of PCR with oak may have been a DNA specific problem: a contaminant bound to the oak template would not inhibit HeLa DNA PCR, as the results showed. This may reflect differences between fresh and dried tissue, and future work will use more modern DNA extraction techniques to address this question.

It has also been acknowledged that there are polymorphisms in plants. The extent of polymorphism has only recently begun to be explored. Polymorphisms are believed to occur within individuals in transition stages of evolution when mutation rates exceed the rate of concerted evolution (Linares et al 1994), and were originally reported to be very rare in plants (Baldwin et al 1995). However, more recent studies are finding that they are more common than originally believed. Polymorphisms have been reported in shrubs (Campbell et al 1997), conifers (Bobola et al 1992), and numerous families of plants. The finding that several bands on the sequencing gel appeared strong in both the G and C lanes could indicate a polymorphism. While this study cannot state this conclusively, the question merits further study. Since polymorphisms are believed to occur as a result of interspecific hybridization (Soltis and Soltis 1992, Sang et al 1995), oaks may prove to be an ideal system in which to study polymorphism presence and evolution.

The phylogenetic analysis, which compared each oak reference sequence only to QR2, was not complex or exhaustive, but it did show that there are advantages to using the ITS-2 sequence. Table 2 showed that the homology of each oak compared to QR2 was either 85-90% or 82-97%, depending on the scoring method. Generally, variability was lowest between the most closely related oaks, with 88-90% or 89-97% homology. The methods varied in how data were scored, but cumulatively, the variability as compared to QR2 ranged between 3% and 28%. In studies of other higher plants, variability has been shown to range from complete identity to 25.8% (Aceto et al 1999). This is consistent with the results of this study.

Complete identity was not expected because there was only one comparison of sequences from oaks of the same species (the red oaks). A larger sample size might have yielded that result, but accessible sequence information was limited. Also, the exact location of AF098418 is not known beyond a broad geographical zone (eastern North America). Distinct genetic populations may exist at some scale, but because the geographical distance between the two oaks is unknown, this question could not be examined further. Future research is being done to address this. Q. rubra from different geographical populations will be sequenced to see if variations are consistent along fixed geographical distances. Again, since variation is to be expected, it is predicted that there will be differences within a species. Information on base changes, insertions, or deletions in a geographical region may help to distinguish genetically different populations and give insight into the evolution of oaks. A larger sample size will also help resolve the unanswered question of whether base changes are limited to specific sites, with others necessarily conserved.

The strength of DNA sequence analysis of the ITS-2 region is shown even in this small scale phylogenetic analysis of known closely related species. The previously discussed cluster showed base changes at sites that were unique to that cluster. While thorough analysis of all species was done only in comparison to QR2, one can see from the alignments in Figure 2 that AF098455 and AF174636 have base changes at the same sites as well. Not surprisingly, these species share a common cluster (Manos et al 1995). Therefore, certain sites may reflect evolutionary closeness and may be informative in phylogenetic analyses (i.e. bootstrap analysis), which seek to group closely related species together based on differences such as these. With the development of DNA phylogenetic analysis, there have been changes in how species are viewed to fit together.

It has been shown that there is variation, albeit small, between ITS-2 sequences of individuals of the same species. It has also been shown that known oak sequences are homologous to the QR2 sequence determined in this study, with sequence variation ranging between 3 and 28%. Additionally, this study agreed with other studies that classify particular species into subgenus clusters, in that QR2 showed the highest homology with known closely related species. The cluster data also suggest that closely related species share base changes, and that these may be used to infer phylogenetic relationships.


I would like to thank Sabrina Volpi and Rocco Coli, my TAs, for their patience and flexibility, and Dr. Berish Rubin for use of his laboratory. I thank Dr. Jim Lewis for my samples and for a review of this paper. Finally, I thank Dr. Michael Risley for making his computer and scanner accessible to me, and for his continuing advice and support.

This document was last modified 01/31/2006.
This is a project of the Biology Department of Fordham University.