ISSN: 2322-0066


Emerging SARS-CoV-2 Genetic Variations and Mutations in the COVID-19 Genomic Sequence: A Systematic and Meta-Analysis Review

Tahoora Mousavi1*, Monireh Golpour1, Reza Valadan1,2, Reza Alizadeh Navaei3, Mehryar Zargari4, Mehrdad Gholami5, Mohammadreza Haghshenas6

1Department of Molecular and Cell Biology, Mazandaran University of Medical Sciences, Sari, Iran

2Department of Immunology, Mazandaran University of Medical Sciences, Sari, Iran

3Department of Gastro Intestinal Cancer, Mazandaran University of Medical Sciences, Sari, Iran

4Department of Biochemistry, Genetic, Molecular and Cell Biology, Mazandaran University of Medical Sciences, Sari, Iran

5Department of Microbiology and Virology, Mazandaran University of Medical Sciences, Sari, Iran

6Department of Microbiology, Molecular and Cell Biology, Mazandaran University of Medical Sciences, Sari, Iran

*Corresponding Author:
Tahoora Mousavi
Department of Molecular and Cell Biology,
Mazandaran University of Medical Sciences,
Sari,
Iran
E-mail:
T.mousavi@mazums.ac.ir

Received: 21-Jan-2023, Manuscript No. JOB-23-87643; Editor assigned: 24-Jan-2023, PreQC No. JOB-23-87643 (PQ); Reviewed: 07-Feb-2023, QC No. JOB-23-87643; Revised: 21-Mar-2023, Manuscript No. JOB-23-87643; Published: 30-Mar-2023, DOI: 10.4172/2322-0066.11.1.006

Research & Reviews: Research Journal of Biology

Abstract

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is the causative agent of Coronavirus disease 2019 (COVID-19). The high mutation rate of RNA viruses drives genetic variation and viral evolution and serves as a strategy to escape the immune system. In the present study, relevant studies and evidence were retrieved from available online databases. Two researchers evaluated a random selection of the retrieved records to assess the sensitivity of the search. After quality assessment and application of specific inclusion and exclusion criteria, the eligible articles were entered into the meta-analysis. Heterogeneity between study results was measured using Cochran's Q test and the I² index. Forest plots illustrated the point and pooled estimates with 95% confidence intervals (horizontal lines). All statistical analyses were performed using Comprehensive Meta-Analysis (CMA) V.2 software. This meta-analysis included 13 primary studies investigating SARS-CoV-2 genetic variations and mutations in COVID-19 genomic sequences. According to the pooled prevalence (95% confidence interval) of mutations, spike gene variations showed the highest frequency among non-synonymous mutations (16.4%, CI: 13.6, 16.6), and the Non-Structural Protein (NSP) genes showed the highest mutation frequency among total mutations (31.6%, CI: 21, 44.6). Genomic mutation analysis of SARS-CoV-2 strains may provide insight into infrequent mutations and their relationship to viral transmission, pathogenicity, infectivity and fatality rates in the interaction between SARS-CoV-2 and human cells.

Keywords

Genetic Variation; SARS-CoV-2; Mutation; COVID-19 sequences

Introduction

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19), poses a major challenge to public health. Since its first appearance in late December 2019 in Wuhan, Hubei province, central China, a high rate of dissemination has been observed worldwide [1]. According to information released by the World Health Organization (WHO) on 29 December 2020, the COVID-19 pandemic had caused nearly 79 million confirmed cases and over 1.7 million deaths worldwide. SARS-CoV-2 belongs to the genus Betacoronavirus, family Coronaviridae, order Nidovirales. Like other coronaviruses, its genome consists of specific genes encoding structural and non-structural proteins. The mutation rate of RNA viruses is notably high, and this is essential for viral adaptation [2]. However, coronaviruses possess a proofreading system, so nucleotide sequence diversity in SARS-CoV-2 has been observed at a very low level. One study reported 13 variation sites in the Open Reading Frames (ORFs) 1a, 1b, S, 3a, M, 8 and N of SARS-CoV-2; among them, positions nt28144 (ORF8) and nt8782 (ORF1a) showed mutation rates of 30.53% and 29.47%, respectively [3]. In addition, a study of 48,635 SARS-CoV-2 sequences detected 353,341 mutations worldwide. Among them, the D614G mutation in the C-terminal region of the spike protein (an aspartate-to-glycine substitution at position 614) is an evolutionary alteration that has become the most common type reported in many regions of the world, such as Europe, Oceania, South America and Africa. The present study aims to assess the prevalence of SARS-CoV-2 genetic variation and mutation in COVID-19 sequences [4].

LITERATURE REVIEW

Genetic diversity and mutations of SARS-CoV-2

Several reports have described unusual public health events caused by SARS-CoV-2 variants with changes in transmissibility, clinical features and severity. Table 1 lists significant mutations reported worldwide.

Name of variant or mutation | Time of mutation | Area | Location | Outcomes
D614G | Early February 2020 | China | Spike protein | Greater transmissibility in humans rather than greater pathogenicity; produces higher viral loads; more susceptible to neutralizing antibodies; does not cause serious disease or alter the efficacy of vaccines.
SARS-CoV-2 VOC 202012/01 or B.1.1.7 or 20B/501Y.V1 | December 2020 | UK and 31 other countries | RBD | Increased transmissibility; no change in disease severity; no evidence that this variant has any impact on the vaccine; the N501Y mutation is detected in B.1.1.7.
B.1.351 or 501Y.V2 | 18 December 2020; 30 December 2020 | South Africa; other countries | RBD | Higher viral load; increased transmissibility; not associated with more severe disease or worse outcomes; the K417N mutation affects monoclonal and polyclonal antibodies; the N501Y, E484K and K417N mutations are detected in B.1.351; E484K makes the vaccine less effective against this variant.
B.1.1.248 (P.1) or 501Y.V3 | January 2021 | Tokyo and 3 other countries (Brazilian variant) | RBD | Contains N501Y (increased transmission), E484K (antibody escape) and K417T; may affect antibody production, vaccination or virus neutralization.
N439K | March 2020; October 2020 | Wuhan, Europe; 12 countries | RBD | Escape from the immune system; binds human ACE2 more strongly than the original strain; escapes polyclonal and neutralizing antibody responses.
A.EU1 | June 2020 | Spain, UK and 12 countries | Spike protein | Unclear whether it increases the transmissibility of the virus; the A222V and A220V mutations are detected in A.EU1; the spike mutation A222V had a functional effect on the spike's ability to mediate cell entry; vaccines may be less effective against it.
Cluster 5 | August and September 2020 | Denmark | Spike protein | Decreases the duration of immune protection following natural infection or vaccination; identified in only 12 human cases and has not spread widely; might affect vaccine development.

Table 1. The list of significant mutations in the world.

METHODOLOGY

Search strategy

In the present study, the search was performed in available online databases, including ISI, ScienceDirect, Scopus, PubMed, Wiley and Google Scholar, covering the period between December 2019 and March 2021. The search used appropriate keywords (SARS-CoV-2, variation, mutation and COVID-19 sequences), combined with the Boolean operators AND, OR and NOT to identify and screen articles [5]. In addition, the reference lists of published studies were examined to improve the sensitivity of the search. Two researchers evaluated a random selection of the retrieved records and confirmed that all suitable studies had been detected [6,7].

Study selection

First, all research articles, evidence and reports were extracted from the electronic databases. Duplicate articles were then identified and removed. Next, irrelevant articles were excluded by reviewing titles, abstracts and full texts. Finally, the remaining articles were screened for eligibility, and review articles and articles published in other languages were excluded from this study.

Quality assessment

The PRISMA checklist was used to evaluate the quality of the related studies and to select studies based on title and content [8]. The PRISMA checklist consists of 27 items covering different aspects of research methodology, such as protocol and registration, eligibility criteria, search, study selection, definition of variables, method of data collection, risk of bias in individual studies, presentation of results and statistical tests. Each item was assigned one point [9].

Inclusion/Exclusion criteria

Articles that passed the above assessment phases were considered eligible for the final meta-analysis if they met the following criteria:

•Studies published in English.


•Studies reporting the prevalence of SARS-CoV-2 genetic variation among total mutations.


•Studies reporting the prevalence of SARS-CoV-2 genetic variation among non-synonymous mutations.

The following studies were ruled out:

•Duplicated studies.


•Non-relevant articles.


•Articles without full-length sequences.


•Abstracts, letters or review studies.


•Studies published in languages other than English.


•Articles with no access to the full text.

Data extraction

After selection of appropriate articles, the following data were extracted for each study: first author's name, geographical region, publication year, language, number of total mutations, number of non-synonymous mutations, and mutations in the S, N, M and E proteins, ORF1a/1b, ORF3a, ORF7a, ORF7b, ORF8a, ORF10a, ORF6, ORF1a and NSP. The data were entered into a Microsoft Excel spreadsheet [10].

Statistical analysis

The primary outcome was the prevalence of SARS-CoV-2 genetic variations and mutations in COVID-19 sequences. Heterogeneity between study results was measured using Cochran's Q test and the I² index; a P-value below 0.1 was considered to indicate significant heterogeneity, and I² values above 50% were considered to indicate a high degree of heterogeneity. Forest plots illustrated the point and pooled estimates with 95% confidence intervals (horizontal lines), and the size of each box in a forest plot indicated the study's weight [11]. Random-effects models were used to combine heterogeneous results and fixed-effect models to combine homogeneous results. All statistical analyses were performed using Comprehensive Meta-Analysis (CMA) V.2 software [12].
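The pooling and heterogeneity steps described above can be illustrated with a short sketch. It is not the CMA V.2 implementation: it assumes that each study's proportion is pooled on the logit scale with inverse-variance weights and that the random-effects model uses the DerSimonian-Laird estimate of the between-study variance τ², a common choice that is assumed here rather than confirmed for CMA V.2. The function names and the three-study input are hypothetical.

```python
import math


def inv_logit(x):
    """Back-transform a logit value to a proportion."""
    return 1 / (1 + math.exp(-x))


def pool_random_effects(props, ns, z=1.96):
    """Hypothetical helper: DerSimonian-Laird random-effects pooling of proportions.

    props: per-study proportions, each strictly between 0 and 1.
    ns: per-study denominators (for example, the number of mutations analysed).
    Returns (pooled proportion, CI lower, CI upper, Cochran's Q, I-squared in %).
    """
    # Logit transform; the approximate within-study variance is 1 / (n * p * (1 - p)).
    y = [math.log(p / (1 - p)) for p in props]
    v = [1 / (n * p * (1 - p)) for p, n in zip(props, ns)]
    w = [1 / vi for vi in v]

    # Cochran's Q is measured around the fixed-effect (inverse-variance) mean.
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # I-squared: share of the variation in Q beyond what chance (df) would explain.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau-squared.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau-squared to every within-study variance.
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - z * se), inv_logit(y_re + z * se), q, i2


# Hypothetical inputs for illustration only (not data from this review).
est, lo, hi, q, i2 = pool_random_effects([0.05, 0.20, 0.40], [200, 200, 200])
print(f"pooled {est:.1%} (95% CI {lo:.1%}, {hi:.1%}); Q = {q:.2f}; I2 = {i2:.1f}%")
```

Cochran's Q is always measured around the fixed-effect mean, and I² = max(0, (Q − df)/Q) × 100 expresses how far Q exceeds its expected value under homogeneity (df = k − 1 for k studies). Whenever I² is 0, τ² is 0 as well and the random-effects estimate coincides with the fixed-effect one.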

RESULTS AND DISCUSSION

In the present study, 1370 articles were identified at the start of the process. After removal of duplicate articles, the number of studies was reduced to 1209. In the next step, 890 irrelevant documents were removed after reviewing the full texts [13], leaving 319 articles for further screening. After the exclusion of 291 articles, 28 articles were assessed for eligibility, of which six articles without full-length sequences, eight review articles and one article in another language were excluded. Finally, 13 relevant articles were included in the meta-analysis (Figure 1). The geographic distribution of mutant variants and the frequently mutated residues among COVID-19 sequences are shown in Table 2 and Figure 2, respectively, and the mutation frequencies in the included studies are summarized in Tables 3 and 4.
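The study counts in the selection flow above can be checked stage by stage. This small sketch, built around a hypothetical helper `prisma_counts`, simply re-derives each PRISMA stage from the numbers reported in the text; the exclusions add up to the 13 included articles (8 in Table 3 and 5 in Table 4).

```python
def prisma_counts(identified, after_duplicates, irrelevant, excluded_screening, eligibility_exclusions):
    """Hypothetical helper: re-derive each PRISMA stage from the reported counts."""
    screened = after_duplicates - irrelevant   # articles kept for further screening
    assessed = screened - excluded_screening   # articles assessed for eligibility
    included = assessed - sum(eligibility_exclusions.values())
    return {
        "duplicates removed": identified - after_duplicates,
        "further screening": screened,
        "assessed for eligibility": assessed,
        "included": included,
    }


counts = prisma_counts(
    identified=1370,
    after_duplicates=1209,
    irrelevant=890,
    excluded_screening=291,
    eligibility_exclusions={"non-full-length sequence": 6, "review article": 8, "other language": 1},
)
for stage, n in counts.items():
    print(f"{stage}: {n}")
```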


Figure 1: Flow chart of the literature search strategy for selecting and including primary articles.


Figure 2: Frequently mutated residues among COVID-19 sequences from different locations.

Strain name | Location (gene/region) | Mutation position
hCoV-19/Singapore | ORF1ab, S, ORF3a, N, ORF8 | C8517T, T17459C (V5820A); T2449C (F817L); C176A (A59D); C595T (P199S)
CGMH-CGU-02 (Taiwan) | ORF1ab, S, ORF8 | C8517T, A16577G (K5526R); C145T (H49Y), C2651T (S884F)
hCoV-19/Los Angeles | ORF1ab, S | F924F, P4715L; D614G
hCoV-19/Asia, Oceania, Europe, North America | ORF1ab, S, ORF9a, ORF3a, ORF8a | (1397 nsp2, 2891 nsp3, 14408 RdRp, 17746 and 17857 nsp13, 18060 nsp14); (23403, spike protein); (28881, nucleocapsid phosphoprotein); (nt 26143); (nt 28144)
NCBI | ORF1ab, S, N | P4715L, L3606F; D614G; R203K/G204R, P13L, 203K/204R
hCoV-19/GISAID | S | D614G, L5F, L8V/W, H49Y, Y145H/del, Q239K, V367F, G476S, V483A, V615I/F, A831V, D839Y/N/E, P1263L
hCoV-19/Singapore | ORF7b, ORF8 | 382-nt deletion
hCoV-19/Australia | ORF7b, ORF8 | 138-nt deletion
hCoV-19/Bangladesh | ORF8 | 345-nt deletion
hCoV-19/Spain | ORF8 | 62-nt deletion
hCoV-19/Italy | ORF1ab, M, ORF7a, S, ORF3a, N | (S443F, H3076Y, L3606F, P4715L, E5689D, R5919K); (D3G); (G70C); (A570D, D614G, G1046V); (G251V); (R203K-G204R, V246I)
GISAID database | ORF1ab, ORF1a, ORF8, S, ORF3a, N | (nsp12, nsp13, RdRp); (nsp2, nsp6)
Bangladesh | S, N | I300F (nsp2), P4715L (nsp12), D614G; R203K, G204R (N protein)
Indian states | 2'-O-ribose methyltransferase, RNA-dependent RNA polymerase, predicted phosphoesterase/papain-like proteinase, transmembrane protein, NSP, 3'-to-5' exonuclease, ORF3a, ORF8, S, NP | N298L; V871I, A88V, P314L; P1103L, S1285F, S1197R, A994D, T1198K; D279N, L37F; A380V, G339S, Q496P, S202N; T372I, L177F; L46F, Q57H; L84S; L54F, D614G; P13L, S194L, RG203KR
South American | ORF1b, ORF1a, ORF3a, M, ORF8, N | D614G, E1207V; G392D, T708I, I739V, P765S, A876T, A1043V, N2894D, F3071Y, G3334S, L3606F; Q57H, G196V, G251V; T175M; L84S; D103Y, R191C, S197L, R203K, G204R, G238C
GISAID database | 3'UTR, 5'UTR, N, ORF8, M, ORF3, S, ORF1ab | G204R-S194L, R203K, S202N; L84S-Q57H; D614G, A879S; A1812D (nsp3), L3606F (nsp6), P4715L (RdRp)
Northern Vietnam | S, ORF3a, M, ORF7a, ORF7b, N | L54F, S254F, C1250F, D614G; Q57H, G251V; D3G, V70F; S81L, L96F, L102_I103del; R203K, G204R, S180I, A211V, Q283H; Gly82_Val86del, Met85del (nsp1), T85I, G212D, 559V, P585S (nsp2), A58T, T428I, R646W, L672F, G730D, P1103L, K1186R, M1901I (nsp3), D477N (nsp4), G15S (nsp5), L37F (nsp6), D161V, P323L, V338F (nsp12), R595S (nsp13), V320L (nsp15), P134S, T140I (nsp16)
South-East Asia | S, N, NSP | D614G; R203K, G204R, P13L, Q57H, NS8_L84S, L37F, P323L; A97V, T1198K
Morocco | S | D614G
Saudi Arabia | S, ORF, M, N | P97L, T424I, C1313S, W553R, S950T, R700L, S191P, S459T, V26L, Q1009L; S733R/E736K, F1609L, P1883S, M4574I, V51I, T649I, P777S, A1045V, V1202I, E1835A, M2119I, P2742S, G3117E/V3120D, F5011I, R5027S/D5028Y/R5029K, G5061R, H243C, H73N, 167; T127I; W293C, A300V, V178A, S11F, L283F, G28V, D242E, V263A, P7L, G198S, R292P, S391I, G, L3785F, A5362L, V117D, D2204A, V5551A, E102C, T5560S, V6030F, T636I, G981S, V1375I, A1462T, F3098L/T3101N, F5072L, I5559M/T5560F/G5561W/L5562V/Y5563V, I6668M, V6431F, G85D/P86F/T87N, K1255R, H1714Y, N4385K, I4611V, A5764S, M6272I, L6373R, L6958P, V62F, G85D/P86F, A218S, T586S, P1099R, E1428G, A1301V, P1971S, S2111R, K5019N, L6412F, P86L, I147L; T127I; W293C, A300V, V178A, S11F, L283F, G28V, D242E, V263A, P7L, G198S, R292P, S391I
GISAID database | ORF1ab, ORF1a, S, ORF3a, M, ORF7a, ORF8, N | P4715L, Y232C, F1657L, A1906V, V1973L, G2374R; D58E, L952P, E955K, S1498F, N1559T, A3203V, G4227R, A4297G, F4304L; Y145del, N354D, D364Y, R416I, S438F, Y508H, D614G (clade G), D3G, T175M, A31T, Q57H (clade GH), V88L, H93Y, G196V, G251V (clade V), Q675H, T791I, F797C, A930V, I1216T, P1263L, V74F, S81L; V62L, L84S (clade S), L121H, T148I, S193I, S197L, R203K (clade GR), G204R (clade GR), I292T
GISAID database | S, ORF1b, ORF3a, ORF8, N, ORF1a, M, ORF6, ORF7a, ORF7b, ORF10 | D614G; P214L; G251V, Q57H; L84S; R203K, G204R
NCBI database | ORF1ab, S, ORF3a, ORF8, N | D75E, T265I, P971L, L3606F, P4715L, V5550L, P5828L, Y5865C, F6158L; D614G; Q57H, G251V; S24L, V62L, L84S; R203K, G204R
GISAID database | ORF1ab, S, M, ORF3, N, ORF8 | M4555T, T4847I, T5020I, V5661A, P5703L, M5865V, G3278S, K3353R, I6525T, Ter6668W, A876T, T1246I, S5932F, F3071Y, V483A; D3G, T175M; S197L, S202N, R203K, G204R; S193I, S194L, S197L, S202N, R203K, G204R; V62L, L84S

Table 2. Geographic distribution of mutant variants of SARS-CoV-2.

First author | Language | Area of study | Non-synonymous sample size | S% | N% | M% | ORF1a/1b% | ORF3a% | ORF7a% | ORF7b% | ORF8a% | ORF10a% | ORF6% | E%
Gupta | English | GISAID | 47 | 27.7 | 14.89 | 4.25 | 12.8 | 12.8 | 4.25 | NA | 4.25 | NA | NA | NA
Alessia | English | Italy | 159 | 11.9 | 3.77 | 2.51 | 70.4 | 7.54 | 1.88 | 0.62 | 0.62 | 0.62 | NA | NA
Kumar | English | Indian | 4648 | 19.8 | 19.16 | NA | NA | 6.92 | NA | NA | 1.48 | NA | NA | NA
Kim | English | GISAID | 1352 | 13.5 | 8.8 | 1.55 | NA | 5.76 | 2.58 | 0.59 | 2.36 | 0.81 | 1.4 | 0.88
Hasan | English | Bangladesh | 1602 | 15.6 | 36.14 | 0.811 | 39.5 | 3.3 | 2.18 | 0.06 | 1.87 | 0.18 | 0.2 | 0.12
Jin | English | Zhejiang | 37 | 27 | NA | 13.51 | NA | NA | NA | NA | NA | NA | NA | NA
Laha | English | NCBI | 351 | 12.5 | 7.4 | 1.42 | 67 | 5.69 | 1.7 | 0 | 2.27 | 0.56 | 0.9 | 0.56
Islam | English | South-East Asia | 78 | 16.7 | 11.53 | 3.84 | NA | NA | NA | NA | NA | NA | NA | 1.2

Table 3. Frequency of mutations among non-synonymous mutations in the studies included in the meta-analysis.

First author | Language | Area of study | Total mutation sample size | S% | N% | M% | ORF3a% | ORF7a% | ORF7b% | NSP%
Biswas | English | GISAID | 504 | 16.26 | 7.14 | 1.78 | 3.96 | N/A | N/A | N/A
Wang | English | GISAID | 4796 | 1.2 | 0.07 | 2.18 | 4.81 | 1.83 | 0.2 | 44.14
Nguyen | English | GISAID | 167 | 26.34 | 23.95 | 1.19 | N/A | 2.39 | 1.19 | 40.11
Utsav | English | GISAID | 273 | 15.38 | N/A | N/A | N/A | N/A | N/A | N/A
Nguyen | English | GISAID | 171 | 25.73 | 23.39 | 1.16 | 4.67 | 2.33 | 1.16 | 41.52

Table 4. Frequency of mutations among total mutations in the studies included in the meta-analysis.

Analysis of mutations among non-synonymous mutations

In the current study, the prevalence of S, N, M, E, ORF1a/1b, ORF3a, ORF7a, ORF7b, ORF8a, ORF10a and ORF6 mutations among non-synonymous mutations varied from 0.06% (ORF7b) to 70.44% (ORF1a/1b). Among total mutations, the lowest and highest frequencies of S, N, M, ORF3a, ORF7a, ORF7b and NSP mutations belonged to N (0.07%) and NSP (44.14%), respectively. In this review, eight studies assessed S, N, M, E, ORF1a/1b, ORF3a, ORF7a, ORF7b, ORF8a, ORF10a and ORF6 mutations among non-synonymous mutations [14].

Analysis of S mutation

Our analysis revealed that the D614G spike mutation has the highest frequency. This mutation has been reported to improve the fitness of the spike protein for cell-surface receptors and to increase viral transduction compared with the wild type. Other S mutations, such as P1263L, V483A and L54F, have a low frequency [15]. The forest plot shows that the pooled frequency of S mutations, estimated with a random-effects model, is 16.4% (95% CI: 13.6, 16.6). The heterogeneity statistics show significant heterogeneity among the primary results of the studies (I²=85.98%, Q=49.947, P<0.001) (Figure 3).
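Since I² is computed from Cochran's Q and the degrees of freedom (df = k − 1 for k studies) as I² = max(0, (Q − df)/Q) × 100, each reported (Q, I²) pair can be cross-checked against the number of pooled studies. The helper names below are hypothetical; the inputs are the values reported for S mutations in the non-synonymous and total-mutation analyses.

```python
def i2_from_q(q: float, k: int) -> float:
    """Hypothetical helper: I-squared (%) from Cochran's Q for k studies (df = k - 1)."""
    df = k - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0


def implied_df(q: float, i2_percent: float) -> float:
    """Hypothetical helper: df implied by a reported (Q, I-squared) pair, inverting I2 = (Q - df) / Q."""
    return q * (1 - i2_percent / 100)


# Reported values for S mutations among non-synonymous and among total mutations.
print(f"S, non-synonymous: df ~ {implied_df(49.947, 85.98):.2f}")
print(f"S, total: df ~ {implied_df(45.6, 91.12):.2f}")
```

Both pairs imply whole-number degrees of freedom (about 7 and about 4), consistent with the eight studies in Table 3 and the five in Table 4.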


Figure 3: Estimation of mutations in S protein among non-synonymous mutations.

Analysis of N mutation

Other frequent mutations are R203K and G204R, located in the N region. The N gene encodes the nucleocapsid protein, which contributes to the formation of the helical ribonucleoprotein complex of the virus. These mutations modify the RNA-binding mechanism and alter the pathogenesis and development of COVID-19 infection. Other, less frequent mutations in the N region include S197L, P13L, L37F, P323L and P1103L [16,17]. The pooled prevalence of N mutations, estimated with a random-effects model, is 11.7% (95% CI: 7, 19.1), and the heterogeneity statistics show significant heterogeneity among the initial results of the studies (I²=98.23%, Q=396.15, P<0.001) (Figure 4).


Figure 4: Estimation of mutations in N protein among non-synonymous mutations.

Analysis of M mutation

The M protein takes part in viral envelope packaging by interacting with the S protein. Our analysis revealed two low-frequency mutations in the M gene, T175M and D3G. The pooled prevalence of M mutations, estimated with a random-effects model, is 1.9% (95% CI: 0.9, 4.1), with significant heterogeneity among the results of these studies (I²=84.70%, Q=45.76, P<0.001) (Figure 5).


Figure 5: Estimation of mutations in M protein among non-synonymous mutations.

Analysis of ORF1a/1b mutation

ORF1ab is a large gene encoding a polyprotein (16 proteins) involved in viral genome synthesis and replication. P4715L, L3606F, C8517T, A876T and F3071Y are the more frequent mutations in ORF1ab. The pooled prevalence of ORF1a/1b mutations, estimated with a random-effects model, is 12.8% (95% CI: 5.7, 26.4), with significant heterogeneity among the results of the studies (I²=97.09%, Q=240.66, P<0.001) [18] (Figure 6).


Figure 6: Estimation of mutations in ORF1a/1b among non-synonymous mutations.

Analysis of ORF3a mutation

Q57H, G251V, S193I and G196V are the more frequent mutations in ORF3a. ORF3a proteins are located in host cells, in the endoplasmic reticulum or the Golgi intermediate compartment, where they act as ion channels and control virus release. Moreover, ORF3a triggers pro-inflammatory pathways and contributes to severe forms of infection [19]. Notably, the ORF3a gene shows a high level of non-synonymous and neutral mutations with a potential effect on the generation of B-cell epitopes. The forest plot shows a pooled prevalence of ORF3a mutations among non-synonymous mutations of 5.7% (95% CI: 4.3, 7.6), with significant heterogeneity among the reported studies (I²=78.67%, Q=32.81, P<0.001) (Figure 7).


Figure 7: Estimation of mutations in ORF3a among non-synonymous mutations.

Analysis of ORF7a mutation

The forest plot shows a pooled rate of ORF7a mutations of 2.1% (95% CI: 1.3, 3.3), with significant heterogeneity among these studies (I²=60.03%, Q=17.51, P=0.014) [20] (Figure 8).


Figure 8: Estimation of mutations in ORF7a among non-synonymous mutations.

Analysis of ORF7b mutation

The forest plot shows the prevalence of ORF7b mutations among non-synonymous mutations with 95% confidence intervals. The pooled frequency of ORF7b mutations is estimated at 0.4% (95% CI: 0.1, 1.4), and we observed significant heterogeneity among these studies (I²=72%, Q=25, P<0.001) (Figure 9).


Figure 9: Estimation of mutations in ORF7b among non-synonymous mutations.

Analysis of ORF8a mutation

Among non-synonymous mutations, the pooled rate of ORF8a mutations, estimated with a fixed-effect model, is 1.8% (95% CI: 1.5, 2.1); heterogeneity across these studies was not significant (I²=29.82%, Q=9.97, P=0.190) (Figure 10).
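The P values attached to each Q statistic come from a chi-square distribution with k − 1 degrees of freedom. The sketch below uses only the Python standard library and the closed-form tail expressions that exist for integer degrees of freedom; the helper names `chi2_sf` and `q_test_p` are hypothetical. For the ORF8a analysis (Q=9.97 across eight studies, so df=7) it returns P ≈ 0.190, matching the reported value.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Hypothetical helper: upper-tail probability P(X > x) for chi-square with positive integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{k=1}^{(df-1)/2} x^(k-1) / (1*3*...*(2k-1))
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def q_test_p(q: float, k: int) -> float:
    """Hypothetical helper: P value of Cochran's Q for k pooled studies (df = k - 1)."""
    return chi2_sf(q, k - 1)


# Reported Q values: ORF8a (8 studies) and M among total mutations (5 studies).
print(f"ORF8a: P = {q_test_p(9.97, 8):.3f}")
print(f"M (total): P = {q_test_p(4.79, 5):.3f}")
```

The closed forms avoid an external statistics library; for non-integer degrees of freedom a regularized incomplete gamma function would be needed instead.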


Figure 10: Estimation of mutations in ORF8a among non-synonymous mutations.

Analysis of ORF10a mutation

Because of the heterogeneity between the results of the studies (I²=50.84%, Q=14.24, P=0.047), a random-effects model was used; the pooled prevalence of ORF10a mutations is 0.5% (95% CI: 0.2, 1) (Figure 11).


Figure 11: Estimation of mutations in ORF10a among non-synonymous mutations.

Analysis of ORF6 mutation

Given the heterogeneity for ORF6 mutations (I²=74.18%, Q=27.11, P<0.001), a random-effects model was used, and the prevalence of ORF6 mutations is estimated at 0.7% (95% CI: 0.2, 1.7) (Figure 12).


Figure 12: Estimation of mutations in ORF6 among non-synonymous mutations.

Analysis of E mutation

The heterogeneity indices show heterogeneity between the primary results for E mutations (I²=56.68%, Q=16.16, P=0.024); therefore, the random-effects model was applied to combine the results. The pooled event rate for E mutations is estimated at 0.4% (95% CI: 0.2, 1.1) (Figure 13).


Figure 13: Estimation of mutations in E protein among non-synonymous mutations.

Analysis of mutations among total mutations

In the current meta-analysis, five primary studies examined S, N, M, ORF3a, ORF7a, ORF7b and NSP mutations among total mutations.

Analysis of S mutation: Given the significant heterogeneity among the results (I²=91.12%, Q=45.6, P<0.001), the pooled event rate of S mutations was estimated with a random-effects model as 18.4% (95% CI: 13.7, 24.4) (Figure 14).


Figure 14: Estimation of mutations in S protein among total mutations.

Analysis of ORF3a mutation: The forest plot indicates that the pooled frequency of ORF3a mutations, estimated with a random-effects model, is 3.9% (95% CI: 2.5, 6), with significant heterogeneity among the primary results of the studies (I²=60.33%, Q=10.08, P=0.039) (Figure 15).


Figure 15: Estimation of mutations in ORF3a among total mutations.

Analysis of NSP mutation: The forest plot shows a pooled prevalence of NSP mutations among total mutations of 31.6% (95% CI: 21, 44.6), with significant heterogeneity among the reported studies (I²=90.47%, Q=42, P<0.001) (Figure 16).


Figure 16: Estimation of mutations in NSP among total mutations.

Analysis of N mutation: Because of the severe heterogeneity (I²=96.35%, Q=109.83, P<0.001), a random-effects meta-analysis was performed. The pooled frequency of N mutations is 10.5% (95% CI: 5.1, 20.4) (Figure 17).


Figure 17: Estimation of mutations in N protein among total mutations.

Analysis of M mutation: Heterogeneity among the primary results for M was not statistically significant (I²=16.55%, Q=4.79, P=0.309). Therefore, using a fixed-effect model, the event rate for M mutations was estimated at 2.1% (95% CI: 1.7, 2.5) (Figure 18).
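Where heterogeneity is not significant, the analysis switches to a fixed-effect model. As an illustration only, the sketch below pools the M-protein percentages of Table 4 with inverse-variance weights on the logit scale, treating each study's total-mutation sample size as the denominator. This is an assumption about how the inputs were used, and Table 4 gives M values for only four of the five studies, so the sketch is not guaranteed to reproduce Figure 18. The names `TABLE4_M` and `pool_fixed_effect` are hypothetical.

```python
import math

# Table 4, M column: (first author, total-mutation sample size, M %).
# Utsav et al. report N/A for M, so only four studies enter this sketch.
TABLE4_M = [
    ("Biswas", 504, 1.78),
    ("Wang", 4796, 2.18),
    ("Nguyen", 167, 1.19),
    ("Nguyen", 171, 1.16),
]


def inv_logit(x):
    """Back-transform a logit value to a proportion."""
    return 1 / (1 + math.exp(-x))


def pool_fixed_effect(rows, z=1.96):
    """Hypothetical helper: inverse-variance fixed-effect pooling on the logit scale."""
    y, w = [], []
    for _author, n, pct in rows:
        p = pct / 100
        y.append(math.log(p / (1 - p)))  # logit of the study proportion
        w.append(n * p * (1 - p))        # 1 / approximate variance of that logit
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return inv_logit(mean), inv_logit(mean - z * se), inv_logit(mean + z * se)


est, lo, hi = pool_fixed_effect(TABLE4_M)
print(f"M mutations, fixed effect: {est:.1%} (95% CI {lo:.1%}, {hi:.1%})")
```

With these four studies the sketch gives about 2.1%. Wang et al.'s 4,796 mutations carry nearly 90% of the total weight, which is why a fixed-effect estimate sits close to that single study's 2.18%.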


Figure 18: Estimation of mutations in M protein among total mutations.

Analysis of ORF7a mutation: There was no significant heterogeneity between the results of the primary studies for ORF7a (I²=46.89%, Q=7.53, P=0.11). The pooled event rate for ORF7a mutations was estimated at 1.8% (95% CI: 1.5, 2.2) (Figure 19).


Figure 19: Estimation of mutations in ORF7a among total mutations.

Analysis of ORF7b mutation: Significant heterogeneity was observed between the results of the studies for ORF7b (I²=58.07%, Q=9.54, P=0.049). Therefore, a random-effects model was applied, which estimated the pooled event rate for this mutation at 0.4% (95% CI: 0.1, 1.2) (Figure 20).


Figure 20: Estimation of mutations in ORF7b among total mutations.

DISCUSSION

Research on variation in the SARS-CoV-2 genome sequence is necessary for examining the course and progression of COVID-19 and for monitoring, controlling and treating SARS-CoV-2 infection. In the present study, the genome sequences of SARS-CoV-2 isolates were examined. The aim of this study was to assess the impact of epitope deletion among non-synonymous mutations, which is related to immune escape and pathogenesis. According to the pooled prevalence (95% confidence interval) of mutations, S variation showed the highest frequency among non-synonymous mutations, 16.4% (13.6, 16.6), and NSP was the most commonly mutated region among total mutations, 31.6% (21, 44.6).


The high mutation rate of RNA viruses drives genetic variation and viral evolution and serves as a strategy for immune escape and drug resistance. Complete SARS-CoV-2 genomes from different geographical locations are essential for detecting the genetic variations in the virus that affect viral shedding. Several genomic variations in SARS-CoV-2, such as those in the nucleocapsid (N) protein, ORF4a and the surface (S) protein, are associated with the host immune system. Research indicates that genetic variants of SARS-CoV-2 were transmitted during the early stage of the epidemic; however, the genomes are remarkably stable and do not evolve rapidly.


The fatality rate of COVID-19 has been shown to vary among populations, and the level of virulence varies among humans. A larger number of specific mutations associated with rapid transmission has been detected in Italy, Spain and the US, and these have been linked to critical conditions. Although SARS-CoV-2 genome sequences are similar, with only a few mutations, some regions such as North America and Europe have been heavily affected by sequence variation, whereas Australia, Asia and Africa have been less affected. Research shows that variation in RNA viruses is pivotal during an outbreak and depends on nucleotide substitutions. Viral mutation rates vary among viruses according to their mode of transmission and help the virus adapt to its host.

Identifying non-synonymous mutations in sequence databases is useful for characterizing mutations and their modes of transmission. Several new mutations have been defined in the SARS-CoV-2 spike protein, such as deletion 69-70, deletion 144, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H. The novel N501Y mutation, found in the UK variant, is located in the Receptor Binding Domain (RBD). The severity and infectiousness of disease caused by the UK variant remain unknown. D614G is a common spike protein mutation in SARS-CoV-2 viruses around the world. It has also been proposed that the high frequency of the spike D614G mutation may be associated with higher viral loads, cellular infectivity, infection severity and lethal outcomes in COVID-19. The association between high viral loads in the upper respiratory tract and the G clade has been measured by RT-PCR. It has been suggested that the G variant of the SARS-CoV-2 spike is more sensitive to neutralizing antibodies than the D variant. The D-to-G substitution at position 614 (D614G) of the spike glycoprotein, which originated in Europe or China, has been reported to cause significant changes in the secondary structure of the protein. The D614G mutation has appeared in all affected regions, such as Bangladesh (with a 95.6% D614G frequency), Italy, Spain, North America and European countries, while amino acid substitutions at spike positions 1109 (F→L) and 76 (S_T76I) were found in Bangladeshi and Indonesian strains, respectively. It has also been suggested that mutations in the RNA-dependent RNA polymerase (RdRp) and D614G increase SARS-CoV-2 transmission and promote its infectivity. A study of 12,300 SARS-CoV-2 genome sequences from different countries reported that the D614G and P4715L variants were associated with higher COVID-19 mortality.

ORF1ab P4715L (nsp12) is reported to play a pivotal role in viral replication, and the ORF1ab V378I mutation has been reported to be associated with COVID-19 infection in Taiwan, Australia and Germany. In addition, three mutations have been described: M5865V and S5932F in ORF1ab and R203K in N. The nucleocapsid (N protein) mutations R203K and G204R have been observed in Italy, Spain, India and France, and the N_S202N mutant was detected in Saudi Arabia.

CONCLUSION

Our study shows that the S protein substitution D614G is the dominant variant in Asia, Oceania, Europe, North America, Italy, Morocco and Saudi Arabia and has been associated with severe respiratory infections and deaths in these regions. Genomic mutation analysis of SARS-CoV-2 strains may provide insight into infrequent mutations and their relationship to viral transmission, pathogenicity, infectivity and fatality rates in the interaction between SARS-CoV-2 and human cells.

Conflict of interests

The authors declare there is no conflict of interest.

References