Emerging SARS-CoV-2 Genetic Variations and Mutations in the COVID-19 Genomic Sequence: A Systematic and Meta-Analysis Review

Tahoora Mousavi¹*, Monireh Golpour¹, Reza Valadan¹,², Reza Alizadeh Navaei³, Mehryar Zargari⁴, Mehrdad Gholami⁵, Mohammadreza Haghshenas⁶

¹Department of Molecular and Cell Biology, Mazandaran University of Medical Sciences, Sari, Iran

²Department of Immunology, Mazandaran University of Medical Sciences, Sari, Iran

³Department of Gastro Intestinal Cancer, Mazandaran University of Medical Sciences, Sari, Iran

⁴Department of Biochemistry, Genetic, Molecular and Cell Biology, Mazandaran University of Medical Sciences, Sari, Iran

⁵Department of Microbiology and Virology, Mazandaran University of Medical Sciences, Sari, Iran

⁶Department of Microbiology, Molecular and Cell Biology, Mazandaran University of Medical Sciences, Sari, Iran

*Corresponding Author: Tahoora Mousavi
Department of Molecular and Cell Biology,
Mazandaran University of Medical Sciences,
Sari,
Iran
E-mail: T.mousavi@mazums.ac.ir

Received: 21-Jan-2023, Manuscript No. JOB-23-87643; Editor assigned: 24-Jan-2023, PreQC No. JOB-23-87643 (PQ); Reviewed: 07-Feb-2023, QC No. JOB-23-87643; Revised: 21-Mar-2023, Manuscript No. JOB-23-87643; Published: 30-Mar-2023, DOI: 10.4172/2322-0066.11.1.006

Visit for more related articles at Research & Reviews: Research Journal of Biology

Abstract

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is the causative agent of coronavirus disease 2019 (COVID-19). The high mutation rate of RNA viruses drives genetic variation and virus evolution, and it is a strategy for escaping the immune system. In the present study, all research and evidence were extracted from the available online databases. Two researchers randomly assessed the sensitivity of the search. After quality assessment and application of specific inclusion and exclusion criteria, the eligible articles were entered into the meta-analysis. Heterogeneity between the results of studies was measured using Cochran's Q test statistic and the I² index. The forest plots illustrate the point and pooled estimates with 95% confidence intervals (crossed lines). All statistical analyses were performed using Comprehensive Meta-Analysis V.2 software. This meta-analysis included 13 primary studies investigating SARS-CoV-2 genetic variations and mutations in the COVID-19 genomic sequence. According to the pooled prevalence (95% confidence interval) of mutations, spike gene variations showed the highest non-synonymous mutation frequency (16.4%, CI: 13.6, 16.6) and the Non-Structural Protein (NSP) genes had the highest mutation frequency among total mutations (31.6%, CI: 21, 44.6). Genomic mutation analysis of SARS-CoV-2 strains may provide knowledge about different infrequent biological mutations and their relationship to viral transmission, pathogenicity, infectivity, and fatality rates between SARS-CoV-2 and human cells.

Keywords

Genetic Variation; SARS-CoV-2; Mutation; COVID-19 sequences

Introduction

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19), poses a major challenge to public health. Since SARS-CoV-2 first appeared in late December 2019 in Wuhan, Hubei province, central China, a high dissemination rate has been observed worldwide [1]. Based on information released by the World Health Organization (WHO) on 29 December 2020, the COVID-19 pandemic had nearly 79 million confirmed cases worldwide and over 1.7 million deaths. SARS-CoV-2 is classified in the family Coronaviridae, the order Nidovirales and the genus Betacoronavirus. Similar to other coronaviruses, the genome of SARS-CoV-2 consists of specific genes encoding structural and non-structural proteins. The mutation rate among RNA viruses is notably high, and this phenomenon is essential for viral adaptation [2]. However, coronaviruses have proofreading systems, and so nucleotide sequence variation in SARS-CoV-2 has been observed at a very low level. One study reported 13 variation sites in the Open Reading Frames (ORF) of SARS-CoV-2 in the 1a, 1b, S, 3a, M, 8 and N regions, among which positions nt28144 in ORF 8 and nt8782 in ORF 1a showed mutation rates of 30.53% and 29.47%, respectively [3]. In addition, a study of 48,635 SARS-CoV-2 sequences detected 353,341 mutations throughout the world. Among them, the D614G mutation in the C-terminal of the spike protein (an aspartate to glycine substitution at position 614) is one such evolutionary alteration detected in SARS-CoV-2 and has become the most common type reported in many regions of the world, such as Europe, Oceania, South America and Africa. The present study aimed to assess the prevalence of SARS-CoV-2 genetic variation and mutation in COVID-19 sequences [4].

LITERATURE REVIEW

Genetic diversity and mutations of the COVID-19

Several reports describe unusual public health events caused by variants of SARS-CoV-2, with changes in transmissibility, clinical features, and severity. Table 1 lists the significant mutations in the world.

Name of variant or mutation	Time of mutation	Area	Location	Outcomes
D614G	Early February 2020	China	Spike protein	D614G indicates greater transmissibility in humans rather than greater pathogenicity. D614G produces higher viral loads. The D614G variant is more susceptible to neutralizing antibodies and does not cause serious disease or alter the efficacy of vaccines.
SARS-CoV-2 VOC 202012/01 or B.1.1.7 or 20B/501Y.V1	December 2020	UK and 31 other countries	RBD	Increased transmissibility. No change in disease severity. No evidence that this variant has any impact on the vaccine. The N501Y mutation is detected in B.1.1.7.
B.1.351 or 501Y.V2	18 December 2020; 30 December 2020	South Africa; other countries	RBD	Higher viral load. Increased transmissibility. Not associated with more severe disease or worse outcomes. The K417N mutation affects monoclonal and polyclonal antibodies. The N501Y, E484K and K417N mutations are detected in B.1.351. E484K makes the vaccine less effective against it.
B.1.1.248 (P.1) or 501Y.V3	January 2021	Tokyo and 3 other countries (Brazilian)	RBD	This variant contains N501Y (more transmission), E484K (escape from antibody) and K417N. May affect antibody production, vaccination or virus neutralization.
N439K	March 2020; October 2020	Wuhan; Europe, 12 countries	RBD	Escape from the immune system. SARS-CoV-2 binds to human ACE2 more strongly than the original strain. N439K escapes from polyclonal and neutralizing antibody responses.
A.EU1	June 2020	Spain, UK and 12 countries	Spike protein	It is not clear whether it increases the transmissibility of the virus. The A222V and A220V mutations are detected in A.EU1. The spike mutation A222V had a functional effect on spike’s ability to mediate cell entry. Less effective against vaccine.
Cluster 5	August and September 2020	Denmark	Spike protein	May decrease the duration of immune protection following natural infection or vaccination. The Cluster 5 variant was identified in only 12 human cases and has not spread widely. Might affect vaccine development.

RBD: Receptor Binding Domain

Table 1. The list of significant mutations in the world.

METHODOLOGY

Search strategy

In the present study, the search was carried out in the available online databases, including ISI, ScienceDirect, Scopus, PubMed, Wiley and Google Scholar, between December 2019 and March 2021. The search was based on the keywords SARS-CoV-2, variation, mutation and COVID-19 sequences, which were combined with the operators AND, OR and NOT to identify and screen articles [5]. In addition, the reference lists of the published studies were checked to improve the sensitivity of the search. The search was randomly evaluated by two researchers, who confirmed that all suitable studies had been detected [6,7].

Study selection

At first, all articles, evidence and reports were extracted from the electronic databases. After examination of the studies, duplicate articles were identified and removed. Then the irrelevant articles were excluded by reviewing the title, abstract, and full text. The remaining articles were screened for eligibility, and review articles and articles published in other languages were excluded from this study.

Quality assessment

The PRISMA checklist was used to evaluate the quality of the related studies and to select studies based on title and content [8]. The PRISMA checklist consists of 27 items covering different aspects of research methodology, such as protocol and registration, eligibility criteria, search, study selection, definition of variables, method of data collection, risk of bias in individual studies, presentation of results and statistical tests. Each item was assigned one point [9].

Inclusion/Exclusion criteria

All articles approved by the above assessment phases were considered eligible for final meta-analysis:

•All English studies.

•Studies reporting the prevalence of SARS-CoV-2 genetic variation among total mutations.

•Studies reporting the prevalence of SARS-CoV-2 genetic variation among non-synonymous mutations.

The following studies were ruled out:

•Duplicated studies.

•Non-relevant articles.

•Articles with non-full-length sequences.

•Abstracts, letters or review studies.

•Studies published in languages other than English.

•Articles with no access to the full text.

Data extraction

After selection of the appropriate articles, the following data were extracted for each study: first author’s name, geographical region, publication year, language, the number of total mutations, non-synonymous mutations, and mutations in the S protein, N protein, M protein, E protein, ORF 1a/1b, ORF 3a, ORF 7a, ORF 7b, ORF 8a, ORF 10a, ORF 6, ORF 1a and NSP. The data were entered into a Microsoft Excel spreadsheet [10].

Statistical analysis

The primary outcome was the SARS-CoV-2 genetic variation and mutation in COVID-19 sequences. Heterogeneity between the results of studies was measured using Cochran's Q test statistic and the I² index. A P-value of less than 0.1 was taken to indicate significant heterogeneity. The forest plots illustrate the point and pooled estimates with 95% confidence intervals (crossed lines). Each box in a forest plot indicates the study's weight [11]. The random-effects model was used for heterogeneous results and the fixed-effect model for homogeneous results, and I² values above 50% were considered a high degree of heterogeneity. All statistical analyses were performed using Comprehensive Meta-Analysis V.2 software [12].
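
The heterogeneity statistics and pooled estimates described above can be sketched in a few lines of Python. This is a minimal illustration, not the Comprehensive Meta-Analysis V.2 implementation used in the study: it assumes a logit transformation of each study's proportion and the DerSimonian-Laird estimator for the between-study variance, and the function names `logit_prevalence`, `inv_logit` and `pool_prevalence` are hypothetical. The example inputs are event counts back-calculated by rounding from the sample sizes and S% column of Table 3, so the output need not match the published pooled values.

```python
import math

def logit_prevalence(events, total):
    """Logit-transformed proportion and its approximate variance (needs 0 < events < total)."""
    p = events / total
    return math.log(p / (1 - p)), 1 / events + 1 / (total - events)

def inv_logit(x):
    """Map a logit value back to a proportion."""
    return 1 / (1 + math.exp(-x))

def pool_prevalence(studies):
    """Pool (events, total) pairs; return pooled proportion, 95% CI, Cochran's Q and I2 in percent."""
    effects = [logit_prevalence(e, n) for e, n in studies]
    y = [eff for eff, _ in effects]
    v = [var for _, var in effects]
    w = [1 / var for var in v]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance (an assumed estimator, see lead-in)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se))
    return inv_logit(pooled), ci, q, i2

# (sample size, S%) pairs from Table 3; event counts are back-calculated by rounding.
table3_s = [(47, 27.7), (159, 11.9), (4648, 19.8), (1352, 13.5),
            (1602, 15.6), (37, 27.0), (351, 12.5), (78, 16.7)]
studies = [(round(n * pct / 100), n) for n, pct in table3_s]
pooled, ci, q, i2 = pool_prevalence(studies)
print(f"pooled={pooled:.3f} CI=({ci[0]:.3f}, {ci[1]:.3f}) Q={q:.2f} I2={i2:.1f}%")
```

When the between-study variance is estimated as zero, the random-effects weights reduce to the fixed-effect weights, so the same function covers both models in this sketch.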

RESULTS AND DISCUSSION

In the present study, 1370 articles were identified at the start of the process. The number of studies was reduced to 1209 following the removal of duplicate articles. In the next step, 890 irrelevant documents were removed after reviewing the full texts [13]. Then, 319 articles were considered for further screening. After the exclusion of 291 articles, 28 articles were assessed for eligibility. Six articles with non-full-length sequences, 8 review articles and one article in another language were excluded. Finally, 13 relevant articles were included in the meta-analysis review (Figure 1). In addition, the geographic distribution and frequently mutated residues among COVID-19 sequences are shown in Table 2 and Figure 2, respectively, and the mutation frequencies of the included studies are given in Tables 3 and 4.
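
The selection counts reported above can be checked with simple arithmetic. The short Python sketch below only restates the numbers given in the text; the variable names are illustrative.

```python
identified = 1370
after_duplicates = 1209
duplicates_removed = identified - after_duplicates      # duplicate articles removed

irrelevant = 890
screened = after_duplicates - irrelevant                # articles considered for further screening

excluded_at_screening = 291
eligible = screened - excluded_at_screening             # articles assessed for eligibility

excluded_at_eligibility = {"non-full-length sequence": 6, "review article": 8, "other language": 1}
included = eligible - sum(excluded_at_eligibility.values())

print(duplicates_removed, screened, eligible, included)
```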

Figure 1: Flow chart of the literature search strategy for selection and including primary articles.

Figure 2: Frequently mutated residues among COVID-19 sequences from different locations.

Strain name	Location	Mutation position
hCoV-19/Singapore	ORF 1ab	C8517T, T17459C (V5820A) T2449C (F817L) C176A (A59D) C595T (P199S)
	S
	ORF3a
	N
	ORF8
CGMH-CGU-02 (Taiwan)	ORF1ab S	C8517T, A16577G (K5526R) C145T (H49Y), C2651T (S884F)
	S
	ORF8
hCoV-19/Los Angeles	ORF1ab	F924F, P4715L D614G
hCoV-19/Los Angeles	S	F924F, P4715L D614G
hCoV-19/ Asia, Oceania, Europe, North America	ORF1ab	(1397 nsp2, 2891 nsp3, 14408 RdRp, 17746 and 17857 nsp143, 18060 nsp14), (23403, spike protein) (28881, nucleocapsid phosphoprotein) (nt 26143) (nt 28144)
	S
	ORF9a
	ORF3a
	ORF8a
NCBI	ORF1ab	P4715L, L3606F D614G R203K/G204R, P13L, 203K/204R
	S
	N
hCoV-19/GISAID	S	D614G, L5F, L8V/W, H49Y, Y145H/del, Q239K, V367F, G476S, V483A, V615I/F, A831V, D839Y/N/E, P1263L,
hCoV-19/Singapore	ORF7b	382-nt deletion
hCoV-19/Singapore	ORF8	382-nt deletion
hCoV-19/Australia	ORF7b	138-nt deletion
hCoV-19/Australia	ORF8	138-nt deletion
hCoV-19/Bangladesh	ORF8	345-nt deletion
hCoV-19/Spain	ORF8	62-nt deletion
hCoV-19/Italy	ORF 1ab	(S443F, H3076Y, L3606F, P4715L, E5689D, R5919K) (D3G) (G70C) (A570D, D614G, G1046V) (G251V) (R203K-G204R, V246I)
	M
	ORF 7a
	S
	ORF 3a
	N
GISAID database	ORF 1ab, ORF 1a,ORF8	(nsp12, nsp13, RdRp) (nsp2, nsp6)
GISAID database	S, ORF 3a, N	(nsp12, nsp13, RdRp) (nsp2, nsp6)
Bangladesh	S	I300F (nsp2), P4715L (nsp12), D614G R203K, G204R (N protein)
Bangladesh	N
Indian states	2'-O-ribose methyltransferase	N298L V871I, A88V, P314L P1103L, S1285F, S1197R, A994D, T1198K D279N, L37F. A380V, G339S, Q496P, S202N T372I , L177F L46F, Q57H L84S L54F, D614G P13L, S194L, RG203KR
	RNA-dependent RNA polymerase
	Predicted phosphoesterase, papain-like proteinase
	Transmembrane protein
	NSP
	3'-to-5' exonuclease
	ORF3a
	ORF8
	S
	NP
South American	ORF 1b	D614G, E1207V G392D, T708I, I739V, P765S, A876T, A1043V, N2894D, F3071Y, G3334S, L3606F Q57H, G196V, G251V T175M L84S D103Y, R191C, S197L, R203K , G204R, G238C
	ORF 1a
	ORF 3a
	M
	ORF 8
	N
GISAID database	3’UTR	G204R-S194L, R203K, S202N L84S-Q57H D614G, A879S A1812D (nsp3), L3606F (nsp6), P4715L (RdRp)
	5’UTR
	N
	ORF 8
	M
	ORF 3
	S
	ORF1ab
Northern Vietnam	S	L54F, S254F, C1250F, D614G Q57H, G251V D3G, V70F S81L, L96F, L102_I103del R203K; G204R, S180I, A211V, Q283H Gly82_Val86del, Met85del (nsp1), T85I, G212D, 559V, P585S (nsp2) A58T, T428I, R646W, L672F, G730D, P1103L, K1186R, M1901I (nsp3), D477N (nsp4), G15S (nsp5), L37F (nsp6), D161V, P323L, V338F (nsp12) R595S(nsp13) V320L(nsp15) P134S, T140I (nsp16)
	ORF 3a
	M
	ORF 7a
	ORF 7b
	N
South-East Asia	S	D614G R203K, G204R, P13L, Q57H, NS8_L84S, L37F, P323L A97V, T1198K
	N
	NSP
Morocco	S	D614G
Saudi Arabia	S	P97 L, T424I, C1313S, W553R, S950T, R700L, S191 P, S459T, V26 L, Q1009L S733R/E736 K, F1609 L, P1883S, M4574I, V51I, T649I, P777S, A1045 V, V1202I, E1835A, M2119I, P2742S,G3117E/V3120D, F5011I, R5027S/D5028Y/R5029 K, G5061R, H243C, H73N, 167 T127I W293C, A300V, V178A, S11 F, L283 F, G28 V, D242E, V263A, P7L, G198S, R292 P, S391I, G, L3785F, A5362 L, V117D, D2204A, V5551A, E102C, T5560S, V6030 F, T636I, G981S, V1375I, A1462 T, F3098 L/T3101N, F5072 L, I5559 M/T5560 F/G5561W/L5562 V/Y5563 V, I6668 M, V6431F, G85D/P86 F/T87N, K1255R, H1714Y, N4385 K, I4611 V, A5764S, M6272I, L6373R, L6958 P, V62F, G85D/P86F, A218S, T586S, P1099R, E1428 G, A1301 V, P1971S, S2111R, K5019 N, L6412 F, P86L, I147L T127I W293C, A300V, V178A, S11 F, L283 F, G28 V, D242E, V263A, P7L, G198S, R292 P, S391I,
	ORF
	M
	N
GISAID database	ORF 1ab	P4715L, Y232C, F1657L, A1906V, V1973L, G2374R D58E, L952P, E955K, S1498F, N1559T, A3203V, G4227R, A4297G, F4304L Y145del, N354D, D364Y, R416I, S438F, Y508H, D614GG, D3G, T175MA31T, Q57HGH, V88L, H93Y, G196V, G251VV, Q675H, T791I, F797C, A930V, I1216T, P1263L, V74F, S81L V62L, L84SS, L121H, T148I, S193I, S197L, R203KGR, G204RGR, I292T
	ORF 1a
	S
	ORF 3a
	M
	ORF7a
	ORF8
	N
GISAID database	S	D614G P214L G251V, Q57H L84S R203K, G204R
	ORF 1b
	ORF 3a
	ORF8
	N
	ORF 1a, M, ORF6, ORF7a, ORF7b, ORF10
NCBI database	ORF 1ab	D75E, T265I, P971L,L3606F, P4715L, V5550L, P5828L, Y5865C, F6158L D614G Q57H, G251V S24L, V62L, L84S R203K, G204R
	S
	ORF 3a
	ORF 8
	N
GISAID database	ORF 1ab	M4555T, T4847I, T5020I, V5661A, P5703L, M5865V, G3278S, K3353R, I6525T, Ter6668W, A876T, T1246I, S5932F, F3071Y, V483A D3G, T175M S197L, S202N, R203K, G204R S193I, S194L, S197L, S202N, R203K, G204R V62L, L84S
	S
	M
	ORF 3
	N
	ORF 8

Table 2. Geographic distribution of mutant variants of SARS-CoV-2.

First author	Language	Area of study	Non synonymous sample size	S%	N%	M%	ORF 1a/1b %	ORF 3a%	ORF 7a%	ORF 7b%	ORF 8a%	ORF 10a%	ORF6%	E%
Gupta	English	GISAID	47	27.7	14.89	4.25	12.8	12.8	4.25	NA	4.25	NA	NA	NA
Alessia	English	Italy	159	11.9	3.77	2.51	70.4	7.54	1.88	0.62	0.62	0.62	NA	NA
Kumar	English	Indian	4648	19.8	19.16	NA	NA	6.92	NA	NA	1.48	NA	NA	NA
Kim	English	GISAID	1352	13.5	8.8	1.55	NA	5.76	2.58	0.59	2.36	0.81	1.4	0.88
Hasan	English	Bangladesh	1602	15.6	36.14	0.811	39.5	3.3	2.18	0.06	1.87	0.18	0.2	0.12
Jin	English	Zhejiang	37	27	NA	13.51	NA	NA	NA	NA	NA	NA	NA	NA
Laha	English	NCBI	351	12.5	7.4	1.42	67	5.69	1.7	0	2.27	0.56	0.9	0.56
Islam	English	South-East Asia	78	16.7	11.53	3.84	NA	NA	NA	NA	NA	NA	NA	1.2

Table 3. Frequency of mutations among non-synonymous mutation included in meta-analysis.

First author	Language	Area of study	Total mutation sample size	S%	N%	M%	ORF 3a%	ORF 7a%	ORF 7b%	NSP %
Biswas	English	GISAID	504	16.26	7.14	1.78	3.96	N/A	N/A	N/A
Wang	English	GISAID	4796	1.2	0.07	2.18	4.81	1.83	0.2	44.14
Nguyen	English	GISAID	167	26.34	23.95	1.19	N/A	2.39	1.19	40.11
Utsav	English	GISAID	273	15.38	N/A	N/A	N/A	N/A	N/A	N/A
Nguyen	English	GISAID	171	25.73	23.39	1.16	4.67	2.33	1.16	41.52

Table 4. Frequency of mutations among total mutations included in the meta-analysis.

Analysis of mutations among non-synonymous mutation

In the current study, the prevalence of S, N, M, E, ORF 1a/1b, ORF 3a, ORF 7a, ORF 7b, ORF 8a, ORF 10a and ORF 6 mutations among non-synonymous mutations varied from 0.06% (ORF 7b) to 70.44% (ORF 1a/1b). Among total mutations, the lowest and highest frequencies of S, N, M, ORF 3a, ORF 7a, ORF 7b and NSP mutations belonged to N (0.07%) and NSP (44.14%), respectively. In the eight studies reviewed in this section, S, N, M, E, ORF 1a/1b, ORF 3a, ORF 7a, ORF 7b, ORF 8a, ORF 10a and ORF 6 mutations were assessed among non-synonymous mutations [14].

Analysis of S mutation

Our analysis revealed that the D614G spike mutation has the highest frequency. This mutation improved the fit of the spike protein with cell surface receptors and increased the virus's transduction compared to the wild type. Other S mutations, P1263L, V483A, and L54F, have a low frequency [15]. The forest plot shows that the pooled frequency of S mutations is 16.4% (95% CI: 13.6, 16.6) based on the random-effects model. The heterogeneity statistics (I²: 85.98%, Q=49.947, P<0.001) show that there is heterogeneity among the primary results of the studies (Figure 3).

Figure 3: Estimation of mutations in S protein among non-synonymous mutations.
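
As a consistency check, I² can be recovered from Cochran's Q with the standard relation I² = max(0, (Q − df)/Q) × 100, where df is the number of studies minus one. The sketch below assumes that all eight studies in Table 3 contributed to the S analysis (df = 7), which is an inference from the table rather than something the text states; the function name `i_squared` is hypothetical.

```python
def i_squared(q, n_studies):
    """Return I2 in percent from Cochran's Q and the number of pooled studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# Q reported for S mutations among non-synonymous mutations; 8 studies is an assumption.
print(f"{i_squared(49.947, 8):.2f}%")
```

Under this assumption the result agrees with the reported I² of 85.98% to within about 0.01 percentage points.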

Analysis of N mutation

Other frequent mutations are R203K and G204R, located in the N region. The N gene encodes the nucleocapsid protein, which contributes to the formation of helical ribonucleoproteins in the virus. These mutations modify the mRNA binding mechanism and change the pathogenesis and development of COVID-19 infection in subjects. Other mutations in the N region include S197L, P13L, L37F, P323L, and P1103L, which are less frequent [16,17]. The pooled prevalence of N mutations is estimated as 11.7% (95% CI: 7, 19.1) based on the random-effects model. The heterogeneity statistics (I²: 98.23%, Q=396.15, P<0.001) show that there is heterogeneity among the initial results of the studies (Figure 4).

Figure 4: Estimation of mutations in N protein among non-synonymous mutations.

Analysis of M mutation

The M protein plays a part in viral envelope packaging by interacting with the S protein. Our analysis revealed two low-frequency mutations, T175M and D3G, in the M gene. The pooled prevalence of M mutations is 1.9% (95% CI: 0.9, 4.1) based on the random-effects model. The heterogeneity statistics (I²: 84.70%, Q=45.76, P<0.001) show that there is heterogeneity among the results of these studies (Figure 5).

Figure 5: Estimation of mutations in M protein among non-synonymous mutations.

Analysis of ORF1a/1b mutation

ORF1ab is a large gene that encodes a polyprotein (16 proteins) involved in virus genome synthesis and replication. The P4715L, L3606F, C8517T, A876T and F3071Y mutations are the more frequent ones in ORF1ab. The pooled prevalence of ORF 1a/1b mutations is 12.8% (95% CI: 5.7, 26.4) based on the random-effects model, and the heterogeneity statistics (I²: 97.09%, Q=240.66, P<0.001) show that there is heterogeneity among the results of the studies [18] (Figure 6).

Figure 6: Estimation of mutations in ORF1a/1b among non-synonymous mutations.

Analysis of ORF3a mutation

Q57H, G251V, S193I, and G196V are the more frequent mutations in ORF3a. ORF3a proteins are located in host cells and found in the endoplasmic reticulum or Golgi intermediate space, where they act as ion channels and control the release of the virus. Moreover, ORF3a triggers pro-inflammatory pathways and contributes to severe modes of infection [19]. It is noteworthy that the ORF3a gene shows a high level of non-synonymous and neutral mutations with a potential effect on B-cell-like epitope generation. The forest plot shows that the pooled frequency of ORF 3a mutations among non-synonymous mutations is 5.7% (95% CI: 4.3, 7.6). The analysis demonstrated heterogeneity among the reported studies (P<0.001; I²=78.67%, Q=32.81) (Figure 7).

Figure 7: Estimation of mutations in ORF3a among non-synonymous mutations.

Analysis of ORF7a mutation

The forest plot shows that the pooled frequency of ORF 7a mutations is 2.1% (95% CI: 1.3, 3.3). The heterogeneity statistics (I²: 60.03%, Q=17.51, P=0.014) show that there is heterogeneity among these studies [20] (Figure 8).

Figure 8: Estimation of mutations in ORF7a among non-synonymous mutations.

Analysis of ORF7b mutation

The forest plot shows the prevalence of ORF 7b mutations among non-synonymous mutations with 95% confidence intervals. The pooled frequency of ORF 7b mutations is estimated to be 0.4% (95% CI: 0.1, 1.4). We observed heterogeneity (I²: 72%, Q=25, P<0.001) among these studies (Figure 9).

Figure 9: Estimation of mutations in ORF7b among non-synonymous mutations.

Analysis of ORF8a mutation

Across all non-synonymous mutation groups, the pooled rate of ORF 8a mutations is 1.8% (95% CI: 1.5, 2.1). Based on the fixed-effect model, there is no significant heterogeneity across these studies (I²: 29.82%, Q=9.97, P=0.190) (Figure 10).

Figure 10: Estimation of mutations in ORF8a among non-synonymous mutations.
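
The choice between the fixed-effect and random-effects models described in the statistical analysis can be written as a small decision rule. The sketch below assumes that either criterion (a Cochran's Q P-value below 0.1, or I² above 50%) is enough to select the random-effects model; the text gives both thresholds but does not say how they are combined, so this combination and the function name `choose_model` are assumptions.

```python
def choose_model(p_value, i2_percent, p_threshold=0.1, i2_threshold=50.0):
    """Return 'random' when heterogeneity is flagged by either criterion, else 'fixed'."""
    heterogeneous = p_value < p_threshold or i2_percent > i2_threshold
    return "random" if heterogeneous else "fixed"

# Heterogeneity values as reported in this section (P, I2 in percent).
print(choose_model(0.190, 29.82))  # ORF 8a among non-synonymous mutations
print(choose_model(0.047, 50.84))  # ORF 10a among non-synonymous mutations
```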

Analysis of ORF10a mutation

Given the heterogeneity between the results of the studies (I²: 50.84%, Q=14.24, P=0.047), the random-effects model was used, and the pooled prevalence of ORF 10a mutations is 0.5% (95% CI: 0.2, 1) (Figure 11).

Figure 11: Estimation of mutations in ORF10a among non-synonymous mutations.

Analysis of ORF6 mutation

Based on the heterogeneity for ORF 6 mutations (I²: 74.18%, Q=27.11, P<0.001), the random-effects model was used, and the prevalence of mutation is estimated as 0.7% (95% CI: 0.2, 1.7) (Figure 12).

Figure 12: Estimation of mutations in ORF6 among non-synonymous mutations.

Analysis of E mutation

The heterogeneity indices show heterogeneity between the primary results for E mutations (I²: 56.68%, Q=16.16, P=0.024). Therefore, the random-effects model was applied to combine the results. The pooled event rate for E mutations is estimated as 0.4% (95% CI: 0.2, 1.1) (Figure 13).

Figure 13: Estimation of mutations in E protein among non-synonymous mutations.

Analysis of mutations among total mutation

In the current meta-analysis, five primary studies were reviewed, and S, N, M, ORF 3a, ORF 7a, ORF 7b and NSP mutations were examined among total mutations.

Analysis of S mutation: Based on the significant heterogeneity observed among the results (Q=45.6, P<0.001 and I²=91.12%), the pooled event rate (95% CI) of S mutation using the random-effects model was estimated as 18.4% (13.7, 24.4) (Figure 14).

Figure 14: Estimation of mutations in S protein among total mutations.

Analysis of ORF3a mutation: The forest plot indicates that the pooled frequency of ORF3a mutations is 3.9% (95% CI: 2.5, 6) based on the random-effects model. The heterogeneity statistics (I²: 60.33%, Q=10.08, P=0.039) show that there is heterogeneity among the primary results of the studies (Figure 15).

Figure 15: Estimation of mutations in ORF3a among total mutations.

Analysis of NSP mutation: The forest plot shows that the prevalence of NSP mutations among total mutations is 31.6% (95% CI: 21, 44.6). The results of the analysis show heterogeneity among the reported studies (P<0.001; I²=90.47%, Q=42) (Figure 16).

Figure 16: Estimation of mutations in NSP among total mutations.

Analysis of N mutation: Because of the severe heterogeneity (P<0.001; I²=96.35%, Q=109.83), a random-effects meta-analysis was performed. The pooled frequency of N mutations under the random-effects model is 10.5% (95% CI: 5.1, 20.4) (Figure 17).

Figure 17: Estimation of mutations in N protein among total mutations.

Analysis of M mutation: The heterogeneity indices for the primary results for M were not statistically significant (I²: 16.55%, Q=4.79, P=0.309). Therefore, using the fixed-effect model, the event rate for M mutation was estimated as 2.1% (95% CI: 1.7, 2.5) (Figure 18).

Figure 18: Estimation of mutations in M protein among total mutations.

Analysis of ORF7a mutation: Moreover, there was no significant heterogeneity between the results of the primary studies for ORF 7a (I²: 46.89%, Q=7.53, P=0.11). The pooled event rate for ORF7a was estimated at 1.8% (95% CI: 1.5, 2.2) (Figure 19).

Figure 19: Estimation of mutations in ORF7a among total mutations.

Analysis of ORF7b mutation: Significant heterogeneity was observed between the results of the studies for ORF7b (I²: 58.07%, Q=9.54, P=0.049). Therefore, the random-effects model was applied, which estimated the pooled event rate for this mutation as 0.4% (95% CI: 0.1, 1.2) (Figure 20).

Figure 20: Estimation of mutations in ORF7b among total mutations.

DISCUSSION

Research on variation in the SARS-CoV-2 genome sequence is necessary for examining the disease course and progression of COVID-19 and for monitoring, controlling and treating SARS-CoV-2 infection. In the present study, the genome sequences of SARS-CoV-2 isolates were examined. The aim of this study was to assess the impact of epitope deletion among non-synonymous mutations, which is related to immune escape and pathogenesis. Our study showed that, according to the pooled prevalence (95% confidence interval) of mutations, S variation had a high frequency among non-synonymous mutations, 16.4% (13.6, 16.6), and NSP was the most common mutation among total mutations, 31.6% (21, 44.6).

The high mutation rate of RNA viruses drives genetic variation and virus evolution, and it is a strategy for escaping the immune system and developing drug resistance. Complete SARS-CoV-2 genomes from different geographical locations are essential for detecting the genetic variations in the virus that causes viral shedding. Several genome variations in SARS-CoV-2, such as those in the nucleocapsid N protein, ORF4a and the surface protein S, are associated with the host immune system. Research indicates that genetic variations of SARS-CoV-2 can be transmitted during the early stage of the epidemic; however, the genomes are remarkably stable and do not evolve rapidly.

It has been demonstrated that the fatality rate of COVID-19 can vary in different populations, and the level of virulence varies among humans. A larger number of specific mutations with rapid transmission were detected in Italy, Spain and the US, and this is related to critical conditions. Although the genome sequences of SARS-CoV-2 are similar, with only a few mutations, regions such as North America and Europe were heavily affected by sequence variation, while Australia, Asia and Africa were less affected. Research shows that the variation of RNA viruses is pivotal during an outbreak and depends on nucleotide substitutions. Viral mutation rates vary between viruses depending on viral transmission and help the virus adapt to the host.

Finding non-synonymous mutations through the database is useful for identifying mutations and their modes of transmission. Some new variants (deletion 69-70, deletion 144, N501Y, A570D, D614G, P681H, T716I, S982A, D1118H) have been defined in the spike protein of SARS-CoV-2. The novel mutation N501Y, which is found in the UK virus variant, is located in the Receptor Binding Domain (RBD). The severity and infectiousness of the UK variant remain unknown. In SARS-CoV-2 viruses, D614G is a common spike protein mutation around the world. It is also proposed that the high frequency of the spike D614G mutation may be associated with higher viral loads, cellular infectivity, infection severity and lethal outcome in COVID-19. The relation between high viral loads in the upper respiratory tract and the G clade is measured by RT-PCR. It is suggested that the G variant of the SARS-CoV-2 spike is more sensitive to neutralizing antibody than the D variant. It is reported that the D to G mutation at position 614 (D614G) in the spike glycoprotein, which originated in Europe or China, is a significant variation that changes the secondary structure of the protein. The D614G mutation appeared in all affected regions, such as Bangladesh (with 95.6% D614G mutation), Italy, Spain, North America and European countries, and amino acid substitutions at position 1109 (F→L) and position 76 (S_T76I) of the spike protein were found in Bangladeshi and Indonesian strains, respectively. It is also suggested that mutations in RNA-dependent RNA polymerase (RdRp) and D614G increase SARS-CoV-2 transmission and promote the infectivity of SARS-CoV-2. A study of 12,300 SARS-CoV-2 genome sequences from different countries reported that the D614G and P4715L variations were associated with higher COVID-19 mortality.

It is evident that ORF1ab P4715L (nsp12) plays a pivotal role in viral replication, and it is reported that the ORF1ab V378I mutation is associated with COVID-19 infection in Taiwan, Australia and Germany. Also, three mutations, (M5865V, S5932F) and (R203K), have been described in ORF1ab and N, respectively. Mutations in the nucleocapsid (N) protein (R203K and G204R) were observed in Italy, Spain, India and France, and the N_S202N mutant was detected in Saudi Arabia.

CONCLUSION

Our study shows that the D614G substitution in the S protein is the dominant variant in Asia, Oceania, Europe and North America, as well as in Italy, Morocco and Saudi Arabia, and that it led to severe respiratory infections and death in these regions. Genomic mutation analysis of SARS-CoV-2 strains may provide insight into infrequent mutations and their relationship to viral transmission, pathogenicity, infectivity and fatality rates in the interaction between SARS-CoV-2 and human cells.

Conflict of interests

The authors declare there is no conflict of interest.

References

Pimentel RMM, et al. The dissemination of COVID-19: An expectant and preventive role in global health. J Human Dev Capabil. 2020;30:135-140.
Adedokun KA, et al. A close look at the biology of SARS-CoV-2, and the potential influence of weather conditions and seasons on COVID-19 case spread. Infect Dis Poverty. 2020;9:1-5.
Toyoshima Y, et al. SARS-CoV-2 genomic variations associated with mortality rate of COVID-19. J Hum Genet. 2020;65:1075-1082.
Pachetti M, et al. Emerging SARS-CoV-2 mutation hot spots include a novel RNA dependent RNA polymerase variant. J Transl Med. 2020;18:1-9.
Kumar BK, et al. Mutational analysis unveils the temporal and spatial distribution of G614 genotype of SARS-CoV-2 in different Indian states and its association with case fatality rate of COVID-19. BioRxiv. 2020;2006-2007.
Wang C, et al. The establishment of reference sequence for SARS-CoV-2 and variation analysis. J Med Virol. 2020;92:667-674.
Korber B, et al. Tracking changes in SARS-CoV-2 spike: Evidence that D614G increases infectivity of the COVID-19 virus. Cell. 2020;182:812-827.
Biswas NK, et al. Analysis of RNA sequences of 3636 SARS-CoV-2 collected from 55 countries reveals selective sweep of one virus type. Indian J Med Res. 2020;151:449-450.
Plante JA, et al. Spike mutation D614G alters SARS-CoV-2 fitness. Nature. 2021;592:116-121.
Thomson EC, et al. The circulating SARS-CoV-2 spike variant N439K maintains fitness while evading antibody-mediated immunity. BioRxiv. 2020;5:2020-2021.
Hodcroft EB, et al. Spread of a SARS-CoV-2 variant through Europe in the summer of 2020. Nature. 2021;29:707-712.
Daniloski Z, et al. The Spike D614G mutation increases SARS-CoV-2 infection of multiple human cell types. Elife. 2021;10:65365.
Dutta NK, et al. The nucleocapsid protein of SARS–CoV-2: A target for vaccine development. J Virol. 2020;16:647-620.
Maitra A, et al. Mutations in SARS-CoV-2 viral RNA identified in Eastern India: Possible implications for the ongoing outbreak in India and impact on viral structure and host susceptibility. J Biosci. 2020;45:1-8.
Jamwal S, et al. An updated insight into the molecular pathogenesis, secondary complications and potential therapeutics of COVID-19 pandemic. Life Sci. 2020;257:118105.
Rehman S, et al. Identification of novel mutations in SARS-COV-2 isolates from Turkey. Arch Virol. 2020;165:2937-2944.
Hassan SS, et al. Pathogenic perspective of missense mutations of ORF3a protein of SARS-CoV-2. Virus Res. 2021;15:198441.
Gong YN, et al. SARS-CoV-2 genomic surveillance in Taiwan revealed novel ORF8-deletion mutant and clade possibly associated with infections in Middle East. Emerg Microbes Infect. 2020;1:1457-1466.
Leung K, et al. Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom, October to November 2020. Euro Surveill. 2021;7:2002106.