All submissions of the EM system will be redirected to Online Manuscript Submission System. Authors are requested to submit articles directly to Online Manuscript Submission System of respective journal.

Genome-Wide Selection and Testing of Superior Reference Genes for Quantitative Gene Expression Normalization in Tobacco (Nicotiana tabacum)

Yushuang Guo1#, Ruiyuan Li2#, Zeshan Asgher3, Imran Haider Shamsi3*, Shizhou Yu1, Jiehong Zhao1,and Xueliang Ren1*

1Key Laboratory of Molecular Genetics, China National Tobacco Corporation, Guizhou Institute of Tobacco Science, No.29, Longtanba Road, Guanshanhu District, Guiyang City, Guizhou Province, 550083, P.R. China

2Key Laboratory of Information and Computing Science of Guizhou Province, Guizhou Normal University, No.116, Baoshanbei Road, Guiyang City, Guizhou Province, 550083, P.R. China

3Key Laboratory of Crop Germplasm Resource, Department of Agronomy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, P.R China

#These two authors contributed equally to this paper.

*Corresponding Author:
Imran Haider Shamsi
Key Laboratory of Crop Germplasm Resource
Department of Agronomy, College of Agriculture and Biotechnology
Zhejiang University, Hangzhou
310058, P.R China
E-mail: [email protected]
Xueliang Ren
Key Laboratory of Molecular
Genetics, China National Tobacco Corporation
Guizhou Institute of Tobacco Science, No.29
Longtanba Road, Guanshanhu District, Guiyang
550083, P.R. China
E-mail: [email protected]

Received date: July 27, 2016; Accepted date: September 06, 2016; Published date: September 12, 2016

Visit for more related articles at Research & Reviews: Journal of Microbiology and Biotechnology


qRT-PCR is a very accurate tool for the measurement of gene expression in all organisms. Accurate and reproducible qRT-PCR results depend on the normalization of expression data using reliable reference genes. To diversify the pool of reliable reference genes in tobacco, the expression stability of 44,873 tobacco genes (ESTs) were measured using a custom-designed microarray in a diverse set of 21 samples, representing various tissue types and developmental stages. A total of 14 genes were selected as candidate reference genes because they appeared to be the most stable. These genes were compared with two widely used housekeeping genes (L25 and EF-1a) for further validation by qRT-PCR. The results showed that Gene 3 (NADH dehydrogenase) and Gene 4 (chaperonin CPN60-2) exhibited the highest expression stability and that they were even more stable than L25 and EF-1a. Taken together, our results provide a wider pool of stable reference genes for gene expression analysis of tobacco.


Internal reference gene, Microarray, Normalization, Real-time RT-PCR, Tobacco.


qRT-PCR is a standard and accurate method to study gene expression and to perform other in-depth analyses in biological research [1]. The transcripts of stably expressed genes act as crucial internal references for the normalization of gene expression data, because results are affected by a variety of experimental conditions such as the amount of sample, primer performance and cDNA synthesis efficiency [2,3]. Typically, genes encoding transcripts involved in basic cellular processes are used as internal references. Unfortunately, transcript levels of these genes are not always stable and significant variation in expression is observed for these commonly used reference genes [4,5]. Ideal reference genes are constitutively expressed, are present in different cell or tissue types at different developmental stages and remain stable under varying experimental conditions [6-8]. Thus, to determine which reference gene is best suited for transcript normalization is critical for accurate data analysis and conclusions [9]. Recently, studies validating the importance of reference genes have been performed on plants, including Arabidopsis thaliana [10], potato [4], soybean [11], grape [12], tomato [13], rice [14,15], tobacco [16], sugarcane [17], cucumber [18], litchi [19], perennial ryegrass [20], peanut [21], flax [22], zucchini [23] and wheat [24]. In this regard, many statistical analysis software packages, such as geNormPLUS [8], BestKeeper [25] and NormFinder [26] have been conceived to aid researchers in validating the suitability of a reference gene in a range of samples.

Tobacco (Nicotiana tabacum) is an established model plant for physiological and genetic studies due to its economic importance and easy molecular manipulation [27-30]. Thus, it is very important to expand the currently available tools for studying gene expression in tobacco. Previously, a systematic assessment of the stability of eight potential qRT-PCR reference genes in tobacco was performed, which identified three reference genes (L25, EF-1a and Ntubc2) that are sufficient for normalization in several organ types and in the presence of a number of abiotic stresses or circadian oscillations [16]. However, the relative stability of these and other potential reference genes in tobacco has not been studied at the whole-genome level or in an experimental context, which has limited the use of qRT-PCR for the study of this crop.

In the present study, the stability of the expression of 44,873 tobacco genes (ESTs) was measured using a custom-designed microarray in a diverse set of 21 tissue samples, representing a range of tissue types and developmental stages under normal growth conditions. The most stable genes were selected as novel candidate reference genes and were validated by qRT-PCR using two widely used housekeeping genes (L25 and EF-1a). The stability of the expression of the candidate reference genes was examined by Ct analysis and geNorm statistical software. These results provide more stable reference genes to enable the accurate and widespread use of qRT-PCR in tobacco.

Materials and Methods

Plant Material

Tobacco cultivar variety K326 was grown in guiding country in Guizhou Province, P.R. China, under normal conditions for tobacco production. The leaf, stem, root and flower from 5 individual plants were mixed for each sample and were collected as follows: the leaves, stems and roots were harvested at 0, 20, 40, 60, 70 and 100 days after transplantation, and the flowers (F) were harvested at 0, 5 and 10 days after anthesis for a total of 21 samples. All samples were stored at -80°C.

RNA Extraction and Microarray Assay

Total RNA were extracted using plant RNA purification reagent (Qiagen) and treated with DNase I (Takara) to remove any DNA contamination according to the manufacturer’s protocol. Double-stranded cDNA were synthesized using SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen) with oligo (dT) primers following the manufacturer's protocol. Cy-3 labeling and hybridization steps were according to standard procedures.

A custom-designed microarray platform was used, consisting of three 60-mer probes designed against 44,873 unigenes derived from a public EST tobacco database. Each 60-mer probe was chosen from a group of six to seven non-overlapping probes designed against different parts of the gene model. The probes with the most similar values to the average of the six to seven non-overlapping experimental probes was assumed to be the most reliable for transcript-level estimations. Roche NimbleGen generated the expression values using quartile normalization and the robust multichip average algorithm.

Quantitative Reverse Transcription Real-Time PCR Analysis

Fourteen potential reference genes that were more stably expressed than the L25 ribosomal protein-encoding gene and EF-1a were analyzed by qRT-PCR. Primers were designed manually using Primer 5.0 software (Applied Biosystems, CA, USA) according to the sequences of fourteen potential tobacco reference genes and the EF-1a and L25 ribosomal protein-encoding gene sequences were obtained from GenBank (Table 1). The specificity of the amplification product for each primer pair was verified by the presence of a single band of the expected size using agarose gel electrophoresis and by the presence of a single peak in the qRT-PCR melting curve of the products.

Gene no. Accession No. Gene product Amplicon size PCR efficiency( ± S.D) Primer sequence 5'→3'
Gene 1 gi|158189353 Zinc finger protein 239 bp 1.970 ± 0.014 F:5’-TTAGGCATCCTTTCCCTCCT-3’ R:5’-CAATACTGCAGCCATAGTGC-3’
Gene 2 XM_009759509 F-box protein 142 bp 2.012 ± 0.012 F:5’-GTCTCAATAACCGGAAGAGC-3’
Gene 3 Y09109 NADH dehydrogenase 228 bp 1.976 ± 0.017 F: 5’-TTGGTGGATCTGACCTAGTG-3’
Gene 4 XM_009760119 Molecular chaperonin CPN60-2 190 bp 2.010 ± 0.014 F:5’-AGCACTAGGCATTGCCATTG-3’
Gene 5 gi|192177446 ATPase protein 203 bp 1.987 ± 0.021 F:5’-TCTCAGGACACGCCAATATC-3’
Gene 6 XM_009612124 Zinc finger protein 165 bp 1.905 ± 0.013 F:5’-GAGGAGGCGCAACATTACAG-3’
Gene 7 gi|191971597 PKHD-type hydroxylase 177 bp 1.876 ± 0.011 F:5’-TCCGGATGGACTAGCAGAAG-3’
Gene 8 gi|192049848 Type III secretion component protein 263 bp 1.967 ± 0.014 F:5’-GATATGAGCCAACTCGACTG-3’
Gene 9 gi|191729256 Glucose dehydrogenase 210 bp 2.011 ± 0.015 F:5’-CCTGACTCTAGGTGGCATGT-3’
Gene 10 gi|192222232 RNA polymerase sigma-24 subunit 285 bp 2.003 ± 0.013 F:5’-GCTTGAGTAGCCTTTGAGAG-3’
Gene 11 gi|192221411 Hypothetical protein 161 bp 1.992 ± 0.014 F:5’-TTGAGGCATTGGAGAGTGAG-3’
Gene 12 gi|191769435 Kinesin-like protein KIFC3-like 263 bp 2.013 ± 0.011 F:5’-CACCTATAGTCCGACTTCTG-3’
Gene 13 gi|191822098 Diguanylatecyclase 207 bp 1.955 ± 0.010 F:5’-GTGATACTGGTGGTGGTCAT-3’
Gene 14 gi|191898125 NAD-dependent dehydratase 215 bp 1.999 ± 0.014 F:5’-TCAACCGCCACCTCTTCAAT-3’
Gene 15 AF120093 EF-1a 135 bp 2.002 ± 0.009 F:5’-CAAGCGGTCATTCAAGTATGC-3’
Gene 16 L18908 L25 ribosomal protein 140 bp 1.998 ± 0.012 F:5’-TTGAGGACAACAACACCCTTG-3’

Table 1: Description of candidate reference genes and optimized primer sequences used for qRT-PCR.

First strand cDNA were synthesized by the MMLV-reverse transcriptase (Promega) using oligo (dT) primer with 0.1 μg total RNA as the template according to the manufacturer’s instructions. qRT-PCR was performed using a SYBR Real-time PCR Detection System (MJ Research, Waltham, MA, USA) following the manufacturer’s instructions. Each reaction was prepared in a total volume of 20 μl containing 10 μl SYBR Green Mix (Takara), 1.5 μl of diluted cDNA (corresponding to 1.5 ng of reverse transcribed total RNA) and 0.2 μl of each primer (200 nM working concentration). Reactions were subjected to an initial denaturation step of 95°C for 10 s, followed by 35 cycles of 95°C for 5 s, 60°C for 30 s and 72°C for 10 s. Replicates were assessed for all reference genes, and each sample was prepared in triplicate.

Data Analysis

The Ct value of each reference gene was used to compare expression levels of all 21 samples. PCR efficiency was determined from amplification plots using LinReg PCR software [16]. The results were imported into Excel, and geNorm was used to determine the effectiveness of normalization of each reference gene in all 21 samples. geNorm software provides a stability measurement (M) and excludes the least stable gene using a step-wise method and creating a line graph to demonstrate the stability of the selected gene [12].


Identification of the Most Stable Tobacco Genes from Microarray Data

The coefficients of variation (CVs) of the normalized gene expression values of the 21 samples (GEO Accession No: GSE73440) were used to assess the stability of gene expression. The CVs of the 44,873 selected genes ranged from 0.0345 to 4.3080 with an average of 0.2665. The frequency distribution of the CVs of the expression values of the 44,873 genes included 2 peaks, indicating a mixed distribution of gene expression stability (Figure 1). In addition to the CVs, the range of expression of each gene in the 21 samples was used to assess the stability of gene expression. The smallest range of expression in the 21 samples was 0.045, and the largest was 250.788, with an average of 1.580. By combining the smallest CV and the range value for each gene, a pool of 14 candidate genes that were determined to be more stable than L25 and EF-1a were selected for further qRT-PCR validation. The average expression range and CVs of the selected genes were 1.010, 0.042 and 0.2062 respectively (Table 2).

Gene No. Standard variation max min Range average CV
Gene 1 0.035 1.075 0.926 0.1486 1.000 0.035
Gene 2 0.038 1.091 0.921 0.1704 0.989 0.039
Gene 3 0.041 1.106 0.934 0.1720 1.017 0.040
Gene 4 0.041 1.135 0.916 0.2186 1.024 0.040
Gene 5 0.039 1.049 0.867 0.1820 0.943 0.041
Gene 6 0.041 1.070 0.845 0.2255 0.981 0.041
Gene 7 0.043 1.133 0.893 0.2391 1.005 0.043
Gene 8 0.050 1.276 1.046 0.2297 1.145 0.043
Gene 9 0.043 1.115 0.876 0.2391 0.990 0.043
Gene 10 0.043 1.101 0.903 0.1980 0.991 0.043
Gene 11 0.044 1.102 0.898 0.2039 1.014 0.043
Gene 12 0.044 1.154 0.907 0.2465 1.007 0.043
Gene 13 0.042 1.061 0.894 0.1669 0.978 0.043
Gene 14 0.044 1.126 0.912 0.2137 1.021 0.043

Table 2: The Standard Variation, Maximum, Minimum, Range, Average and CV (Coefficient of Variation) of the normalized expression value for the selected 14 genes.


Figure 1: The frequency distribution of the CVs for the expression values of the 44,873 genes. Shown is a histogram of log2-transformation of CVs (Coefficient of Variation) for the expression values of the 44,873genes in 21 samples. The bin width of the histogram was set to 0.02 (x-axis).

Primer Selection and Specificity

After the identification of stably expressed genes from the microarray data, we aimed to compare the widely used reference genes with new reference genes. To select the best stable reference genes for further gene expression assessments, qRT-PCR primers were designed for the 14 stable genes selected as new candidate reference genes (Table 1). Amplification specificity determinants, including the presence of a single peak in the qRT-PCR melting curve of the products (Figure 2) indicated that there were no other non-specific amplification products. Additionally, no qRT-PCR detection signals were observed in reactions without template (data not shown).


Figure 2: Primers specificity and melt curves analysis generated for amplicons of each of the sixteen reference genes. The dissociation curves of fourteen potential reference genes and EF-1a and L25 constructed using three technical replicates of each of the 21 tobacco cDNA samples.

Ct Analysis

To assess gene stability, all PCR assays were conducted in triplicate, and the mean was used for data analysis. In present study, the Ct values of the 16 reference genes ranged widely, from 19.62 for Gene 16 to 33.22 for Gene 1 (Figure 3). The standard deviations ranged from 0.43 for Gene 4 to 3.9 for Gene 7, indicating a high level of expression variation for all reference genes, and no single gene was stably and constantly expressed under all experimental conditions. Gene 4 showed the most stable expression among the 16 tested reference genes, indicating that it was a top-ranking reference gene in tobacco. Although the expression stability of Gene 12 was lower than those of Gene 4 and Gene 16, but it was greater than that of Gene 15, suggesting that it is a suitable reference gene for data normalization.


Figure 3: Absolute cycle threshold values (Ct) for sixteen reference genes over 21cDNA samples in qRT-PCR.

geNorm Analysis

geNorm analysis ranks reference genes according to expression stability (M value), facilitating the selection of an optimal pair of reference genes from a group of candidate genes in different treatment samples under varying conditions. The M value is for a reference gene as the average pairwise variation V for that gene with all other tested reference genes. Stepwise exclusion of the gene with the highest M value allows ranking of the tested genes according to their expression stability. The gene with the lowest M value is thought to be the most stably expressed. When all 21 samples were considered collectively, Genes 3 and 6 were determined to possess the most stable expression, while Gene 7 was least stably expressed under all experimental conditions in different tissues and developmental stages. Gene 3, Gene 6 and Gene 14 were more stably expressed compared with the two traditional reference genes, Gene 15 and Gene 16 (Figure 4a). In addition, Gene 4 and Gene 5 also displayed more stable expression than Gene 15, indicating the potential use of Gene 3, Gene 4, Gene 5, Gene 6 and Gene 14 as novel reference genes for transcript normalization. We also analyzed the tissue specificity of all selected candidate reference genes. Genes 4 and 14 were stably expressed in roots, with Gene 8 being the least stable (Figure 4b). Gene 6 and Gene 14 were stably expressed in stems, while Gene 10 was the least stable (Figure 4c). Gene 5 and Gene 16 were stably expressed in leaves, while Gene 10 appeared to be the least stable, similar to its performance in stem tissues (Figure 4d). In flowers, Gene 3 and Gene 6 were stably expressed (Figure 4e). Taken together, we recommend Gene 3 and Gene 4 as new reference genes because they exhibited the most stable expression in different tissues compared with the traditional reference genes, Genes 15 and 16.


Figure 4: Selection of the most suitable reference genes for qRT-PCR normalization. Data are from three biological replicates. The average gene expression stability value (M value) used for ranking the internal control genes in the different groups. Calculations were performed using geNorm and are presented as Total (a), Root (b), Stem (c), Leaf (d) and Flowers (e).


qRT-PCR is an important technique for measuring gene expression levels due to its accuracy, specificity and sensitivity. However, the accuracy of target gene expression analysis is strongly influenced by the choice of the internal reference gene for data normalization; thus, it is very important to screen appropriate internal reference genes to ensure proper expression normalization [31,32]. Typically, traditional housekeeping genes are used as endogenous references for relative quantification. However, most traditional housekeeping genes are unable to provide accurate normalization in qRT-PCR analysis because a single universal reference gene has not been identified to date [33]. Therefore, novel methods for discovering and testing new reference genes must be developed based on the widely available gene chip expression data, publicly deposited EST data, RNAseq data, and new reference genes with superior stability must be selected and verified with qRT-PCR [34].

Tobacco belongs to the agriculturally important Solanaceae family and is both a valuable industrial and commercial crop in many countries. It is an important model crop for gene expression studies; however, the application of qRT-PCR in the study of this crop has been limited due to a lack of information with regard to reference gene stability in a variety of experimental contexts. In the present study, to discover new and more stable reference genes for gene expression normalization, the expression stability of all tobacco genes (ESTs) was measured using a custom-designed microarray in a diverse set of 21 samples, representing a range of tissue types and developmental stages. The 14 most stable genes were selected as novel candidate reference genes and were validated by real-time PCR using two widely used housekeeping genes (L25 and EF-1a). Ct and geNorm analysis were performed based on the real-time PCR data. In geNorm analysis, we found that Gene 15 seems to be a pretty good candidate, since it's the second most stable gene in total tissues. However, in respective tissues (root, stem, leaf and flower), Gene 15 seems to be not very stable compared with other candidate. The reasons to explain this phenomenon is that in fact, the M value of Gene 15 was almost unchanged in total tissues or in respective tissues (root, stem, leaf and flower), But the M value of other candidate changed in specific tissue resulting in the ranking changes. This results confirmed that Gene 15 is a pretty good candidate, while other candidate also have potential use for quantitative gene expression normalization in specfic tissue. Similar results were also found in other plants like peach [34], maize [35] and Hedera helix [36]. Taken together, we recommend Gene 3 (NADH dehydrogenase) and Gene 4 (chaperonin CPN60-2) were sufficient for the normalization of gene expression in developmentally distinct tobacco tissues. Previous study showed that chaperonin CPN60-2 is responsible for mitochondrial protein import and macromolecular assembly. It may also prevent misfolding and promote the refolding and proper assembly of unfolded polypeptides generated under stress conditions in the mitochondrial matrix [37].


This is the first report to use a custom-designed tobacco whole-genome microarray for investigating the expression stability of traditional reference genes and to identify novel tobacco reference genes that outperform the traditional ones in terms of expression stability, as determined by their Ct values. Our results confirmed that many genes, including Gene 3 (NADH dehydrogenase) and Gene 4 (chaperonin CPN60-2), are more stably expressed and are present at lower levels compared with the traditional reference genes, providing a foundation for the more accurate and widespread use of qRT-PCR in tobacco.


The authors gratefully acknowledge the financial support of 973 national project (517102-N51501/001), the Youth talent training plan of Guizhou province (QKHRZ[2013]02), Tobacco Company of the China National Tobacco Corporation (CNTC- 110201201006, CNTC-110201301004) and Key scientific and technological program in agriculture of Guizhou Province (NY[2013]3020) and Natural science foundation of Guizhou province ([2014]2117). The authors would also like to acknowledge and thank Zhejiang University’s Special funds for basic scientific research (517102*172210152), Elite undergraduate students training program (188190*170115101/013/002/001) and Jiangsu Collaborative Innovation Center for Modern Crop Production regarding their financial assistance.


jf factory audemars piguet