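Interview with the first author of the University of Cambridge report on novel coronavirus variants: there is no evidence that the novel coronavirus originated in Wuhan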

lindamy


两极反向

Yesterday 10:38


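On April 8, the internationally renowned academic journal Proceedings of the National Academy of Sciences (PNAS) published the article "Phylogenetic network analysis of SARS-CoV-2 genomes".

The paper was written jointly by research teams from Germany and the United Kingdom; the first author is Dr. Peter Forster of the University of Cambridge.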



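Recently, CGTN interviewed Dr. Peter Forster by video link, and he answered questions about the research.

Forster explained that the aim of the study was to identify the "original virus type".

Because there are so many rapid mutations, it is difficult to trace the COVID-19 family tree clearly with conventional methods, so the researchers used a "mathematical network algorithm". Until now, this technique was mainly used to analyze DNA in order to map the movements of prehistoric human populations. This is the first time it has been used to trace coronavirus infection routes.

The researchers analyzed data from 160 novel coronavirus (SARS-CoV-2) genomes collected around the world between December 24, 2019 and March 4, 2020. They found three main SARS-CoV-2 variants and named them types A, B, and C according to their amino acid changes.

Type A is closest to the coronaviruses found in bats and pangolins and is the original virus type; type B is derived from type A, and type C from type B. The three variants also differ greatly in their global distribution: types A and C are found mostly in Europeans and Americans, while type B is the most common type in East Asia.

Forster said that the first genome picked up in Wuhan when the outbreak became apparent was a type B virus. Researchers at the time mistook type B for the original virus, but that is not the case: type A is the original virus. It was only a minority type in Wuhan at the time, whereas type B went on to become the main type during the Wuhan outbreak and mutated further into type C.

The study found that nearly half of the type A samples came from outside East Asia, mainly from the United States and Australia, and that two-thirds of the American samples were type A. In addition, although type A first appeared in Wuhan, only very few cases in Wuhan carried it. Some Americans who had lived in Wuhan were found to carry type A genomes. Type B is distributed mainly in China and East Asia, and all type B genomes outside Asia had mutated.

Type C is the main type spreading in Europe and was also found in the United States and Brazil. It was not found in samples from the Chinese mainland, but it was present in Hong Kong, China; Taiwan, China; Singapore; and South Korea.

Forster said the study indicates that the first COVID-19 infection may have passed from a bat to a human, and that it occurred between September 13 and December 7, 2019. The viral genome sampled in Wuhan on December 24, 2019 therefore cannot tell us accurately where the disease originated.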

===================================================
The video of CGTN's video-link interview with Dr. Peter Forster is at the link; anyone interested can watch it there. I will not repost the video.


 


COVID-19 found in Wuhan was not original form of virus
Apr 13, 2020

A scientific study undertaken by a team of experts at the University of Cambridge has revealed that an initial genome sample of COVID-19 taken in Wuhan on Dec. 24 was not the original form of the virus but a mutated version, leading geneticist Peter Forster said on Sunday.

Forster led his team in tracing the origins of the COVID-19 epidemic by analyzing 160 genomes from human patients and found that the strain in Wuhan had in fact mutated from an earlier version.

The findings of Forster's study were recently published in the U.S. scientific journal Proceedings of the National Academy of Sciences (PNAS), in a paper that mapped out the early mutations of the deadly coronavirus.

In an exclusive interview with the China Global Television Network (CGTN) on Sunday, Forster detailed the purpose of the extensive study and outlined some of the major discoveries.

"What I wanted to do in this research, together with my colleagues, is to identify the original viral genome -- the original viral genome type -- because the virus mutates, it changes, and you get variants arising -- and which is the original one, because all these mutations that have happened without anybody realizing the disease is among us. The first genome that we have is from Christmas Eve in 2019, which is December 24, obviously, and what came before that we don't know. That is the aim of the paper," said Forster.

Forster's paper creatively applied "phylogenetic network analysis", which he usually uses to analyze human origins, to finding the origin of the virus that has now swept the world.

After mapping out 160 genetic samples from across the world, Forster found three distinct types of SARS-CoV-2, the virus that causes COVID-19, and concluded that the most common variant found in Wuhan was not the original form of the virus.

"What we have reconstructed is the network of possible trees how it evolved. We've got all the realistic trees included in this network at one glance, which is what other methods can't do. And then we apply what we called an outgroup, that means an independent non-human virus which tells us inside this cluster of trees which is the oldest virus genome. And as an outgroup, you need something which is non-human, so what do you take? You take the bat coronavirus because that's very closely related to us. And if you apply that, you find out a location in the network, which we call Type A, is the original type that would have infected humans. Then it mutated and changed into Type B. This Type B was then the first genome to be picked up in Wuhan when the disease became apparent. Researchers might be forgiven for thinking at the time that B is the original type, but actually it's not – it's Type A, which in Wuhan is only a minority type, but B has become the majority type during the outbreak. That has mutated further into C," said Forster.

He added that type C is not found in the early phase of the outbreak on the Chinese mainland but is found outside it; for example, it is well represented in Singapore.

Forster also stated that it is scientifically incorrect to assume China is the original source of COVID-19 just because the first published genome sequence of the virus was uploaded from the country.

"My research started when I realized in early February that the outbreak was a serious matter, not simply like a flu epidemic, and that I needed to start right away with my colleagues to understand how the virus was evolving before it really spreads across the world. So that's the start of the research. What is now important to consider is that the earliest genome which has been placed into the database is not necessarily the origin of the disease. If I had sampled someone from Scotland and put them [the genomes] in the database first, then obviously, it would look as if Scotland was the origin. That is not a valid approach. I am saying this because there are people who do take this approach, but that's not the way to do it," he said.

According to calculations made by Forster and his team, the first human infection of the virus probably took place between September 13 and December 7 last year.

The first known novel coronavirus sample was taken on December 24, 2019, which was more than two weeks later than the latest possible date of the first infection if the findings are correct, said Forster.
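The date arithmetic behind "more than two weeks later" can be checked directly. The short sketch below uses only the three dates quoted in the article; the variable names are illustrative and not taken from the study.

```python
from datetime import date

# Dates quoted in the article (all in 2019).
earliest_infection = date(2019, 9, 13)  # start of the estimated window
latest_infection = date(2019, 12, 7)    # end of the estimated window
first_sample = date(2019, 12, 24)       # first sampled genome

# Length of the estimated window for the first human infection.
window_days = (latest_infection - earliest_infection).days

# Gap between the latest possible first infection and the first sample.
gap_days = (first_sample - latest_infection).days

print(window_days)  # 85
print(gap_days)     # 17
```

A gap of 17 days is consistent with the article's "more than two weeks".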

"The first sample was collected in Wuhan by Chinese researchers on the 24th of December 2019, so Christmas Eve 2019. And that first sample, that genome sequence, is available for study by international researchers. Since then, many hundred, many thousand other sequences have been contributed from across the world. Because we have now in our study in PNA.
 
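The virus has a two-week incubation period, and where it was two weeks earlier is really impossible to say.
 
As things look now, this damned virus's incubation period is far longer than two weeks.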
 
Phylogenetic network analysis of SARS-CoV-2 genomes

Peter Forster, Lucy Forster, Colin Renfrew, and Michael Forster

PNAS first published April 8, 2020

Contributed by Colin Renfrew, March 30, 2020 (sent for review March 17, 2020; reviewed by Toomas Kivisild and Carol Stocking)

Significance

This is a phylogenetic network of SARS-CoV-2 genomes sampled from across the world. These genomes are closely related and under evolutionary selection in their human hosts, sometimes with parallel evolution events, that is, the same virus mutation emerges in two different human hosts. This makes character-based phylogenetic networks the method of choice for reconstructing their evolutionary paths and their ancestral genome in the human host. The network method has been used in around 10,000 phylogenetic studies of diverse organisms, and is mostly known for reconstructing the prehistoric population movements of humans and for ecological studies, but is less commonly employed in the field of virology.

Abstract

In a phylogenetic network analysis of 160 complete human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes, we find three central variants distinguished by amino acid changes, which we have named A, B, and C, with A being the ancestral type according to the bat outgroup coronavirus. The A and C types are found in significant proportions outside East Asia, that is, in Europeans and Americans. In contrast, the B type is the most common type in East Asia, and its ancestral genome appears not to have spread outside East Asia without first mutating into derived B types, pointing to founder effects or immunological or environmental resistance against this type outside Asia. The network faithfully traces routes of infections for documented coronavirus disease 2019 (COVID-19) cases, indicating that phylogenetic networks can likewise be successfully used to help trace undocumented COVID-19 infection sources, which can then be quarantined to prevent recurrent spread of the disease worldwide.


The search for human origins seemed to take a step forward with the publication of the global human mitochondrial DNA tree (1). It soon turned out, however, that the tree-building method did not facilitate an unambiguous interpretation of the data. This motivated the development, in the early 1990s, of phylogenetic network methods which are capable of enabling the visualization of a multitude of optimal trees (2, 3). This network approach, based on mitochondrial and Y chromosomal data, allowed us to reconstruct the prehistoric population movements which colonized the planet (4, 5). The phylogenetic network approach from 2003 onward then found application in the reconstruction of language prehistory (6). It is now timely to apply the phylogenetic network approach to virological data to explore how this method can contribute to an understanding of coronavirus evolution.

In early March 2020, the GISAID database contained a compilation of 253 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) complete and partial genomes contributed by clinicians and researchers from across the world since December 2019. To understand the evolution of this virus within humans, and to assist in tracing infection pathways and designing preventive strategies, we here present a phylogenetic network of 160 largely complete SARS-CoV-2 genomes (Fig. 1).



Fig. 1.

Phylogenetic network of 160 SARS-CoV-2 genomes. Node A is the root cluster obtained with the bat (R. affinis) coronavirus isolate BatCoVRaTG13 from Yunnan Province. Circle areas are proportional to the number of taxa, and each notch on the links represents a mutated nucleotide position. The sequence range under consideration is 56 to 29,797, with nucleotide position (np) numbering according to the Wuhan 1 reference sequence (8). The median-joining network algorithm (2) and the Steiner algorithm (9) were used, both implemented in the software package Network5011CS (fluxus-engineering.com), with the parameter epsilon set to zero, generating this network containing 288 most-parsimonious trees of length 229 mutations. The reticulations are mainly caused by recurrent mutations at np11083. The 161 taxa (160 human viruses and one bat virus) yield 101 distinct genomic sequences. The phylogenetic diagram is available for detailed scrutiny in A0 poster format (SI Appendix, Fig. S5) and in the free Network download files.

Zhou et al. (7) recently reported a closely related bat coronavirus, with 96.2% sequence similarity to the human virus. We use this bat virus as an outgroup, resulting in the root of the network being placed in a cluster of lineages which we have labeled “A.” Overall, the network, as expected in an ongoing outbreak, shows ancestral viral genomes existing alongside their newly mutated daughter genomes.
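The idea of outgroup rooting can be illustrated with a deliberately simplified sketch. In the paper the root is found where the bat virus attaches to the median-joining network; the code below instead picks the type with the fewest differences from the outgroup, which is an assumption made for illustration only. The sequences and the labels X, Y, Z are toy data, not viral genomes.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def root_by_outgroup(types: dict, outgroup: str) -> str:
    """Return the label of the type with the fewest differences from the outgroup.

    Simplification: a real network analysis attaches the outgroup to the
    network instead of comparing raw distances.
    """
    return min(types, key=lambda label: hamming(types[label], outgroup))


# Toy data, NOT real viral sequences.
toy_types = {
    "X": "ACGTACGT",
    "Y": "ACGTACGA",
    "Z": "ACGTTCGA",
}
toy_outgroup = "GCGTACGT"  # differs from X at 1 site, Y at 2, Z at 3

print(root_by_outgroup(toy_types, toy_outgroup))  # X
```

In this toy case the sequence sampled first could be any of X, Y, or Z; the outgroup comparison, not the sampling order, decides which one is treated as ancestral.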

There are two subclusters of A which are distinguished by the synonymous mutation T29095C. In the T-allele subcluster, four Chinese individuals (from the southern coastal Chinese province of Guangdong) carry the ancestral genome, while three Japanese and two American patients differ from it by a number of mutations. These American patients are reported to have had a history of residence in the presumed source of the outbreak in Wuhan. The C-allele subcluster sports relatively long mutational branches and includes five individuals from Wuhan, two of which are represented in the ancestral node, and eight other East Asians from China and adjacent countries. It is noteworthy that nearly half (15/33) of the types in this subcluster, however, are found outside East Asia, mainly in the United States and Australia.

Two derived network nodes are striking in terms of the number of individuals included in the nodal type and in mutational branches radiating from these nodes. We have labeled these phylogenetic clusters B and C.

For type B, all but 19 of the 93 type B genomes were sampled in Wuhan (n = 22), in other parts of eastern China (n = 31), and, sporadically, in adjacent Asian countries (n = 21). Outside of East Asia, 10 B-types were found in viral genomes from the United States and Canada, one in Mexico, four in France, two in Germany, and one each in Italy and Australia. Node B is derived from A by two mutations: the synonymous mutation T8782C and the nonsynonymous mutation C28144T changing a leucine to a serine. Cluster B is striking with regard to mutational branch lengths: While the ancestral B type is monopolized (26/26 genomes) by East Asians, every single (19/19) B-type genome outside of Asia has evolved mutations. This phenomenon does not appear to be due to the month-long time lag and concomitant mutation rate acting on the viral genome before it spread outside of China (Dataset S1, Supplementary Table 2). A complex founder scenario is one possibility, and a different explanation worth considering is that the ancestral Wuhan B-type virus is immunologically or environmentally adapted to a large section of the East Asian population, and may need to mutate to overcome resistance outside East Asia.
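The counts in this paragraph can be cross-checked with a few lines of arithmetic. The numbers below are copied from the text; the dictionary keys are just labels chosen for this sketch.

```python
# Counts of type B genomes as quoted in the paragraph above.
total_b = 93
east_asia = {
    "Wuhan": 22,
    "other parts of eastern China": 31,
    "adjacent Asian countries": 21,
}
outside_east_asia = {
    "United States and Canada": 10,
    "Mexico": 1,
    "France": 4,
    "Germany": 2,
    "Italy": 1,
    "Australia": 1,
}

n_east_asia = sum(east_asia.values())
n_outside = sum(outside_east_asia.values())

print(n_east_asia)              # 74
print(n_outside)                # 19
print(n_east_asia + n_outside)  # 93
```

The regional counts add up to the stated total: 74 East Asian genomes plus the 19 from elsewhere give 93.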

Type C differs from its parent type B by the nonsynonymous mutation G26144T which changes a glycine to a valine. In the dataset, this is the major European type (n = 11), with representatives in France, Italy, Sweden, and England, and in California and Brazil. It is absent in the mainland Chinese sample, but evident in Singapore (n = 5) and also found in Hong Kong, Taiwan, and South Korea.
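The three defining mutations (T8782C and C28144T for A to B, G26144T for B to C) suggest a simple marker lookup, sketched below. This is an illustration, not the paper's method: the authors define the clusters on the network, and a lookup like this would misassign genomes with recurrent or back mutations at these sites. The function names and the toy genome helper are hypothetical.

```python
# Marker positions are 1-based, following the Wuhan 1 reference numbering.
# Nucleotides implied by the text:
#   type A: T at 8782, C at 28144
#   type B: C at 8782, T at 28144, G at 26144
#   type C: as type B, but T at 26144
def classify_type(seq: str) -> str:
    """Assign A, B, or C from three marker sites; 'other' if none matches.

    Assumes `seq` is aligned to the reference so that index i-1 holds
    nucleotide position i.
    """
    n8782, n26144, n28144 = seq[8781], seq[26143], seq[28143]
    if n8782 == "T" and n28144 == "C":
        return "A"
    if n8782 == "C" and n28144 == "T":
        if n26144 == "T":
            return "C"
        if n26144 == "G":
            return "B"
    return "other"


def toy_genome(n8782: str, n26144: str, n28144: str, length: int = 29903) -> str:
    """Build a placeholder sequence of N's with only the marker sites set."""
    seq = ["N"] * length
    seq[8781], seq[26143], seq[28143] = n8782, n26144, n28144
    return "".join(seq)


print(classify_type(toy_genome("T", "G", "C")))  # A
print(classify_type(toy_genome("C", "G", "T")))  # B
print(classify_type(toy_genome("C", "T", "T")))  # C
```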

One practical application of the phylogenetic network is to reconstruct infection paths where they are unknown and pose a public health risk. The following cases where the infection history is well documented may serve as illustrations (SI Appendix). On 25 February 2020, the first Brazilian was reported to have been infected following a visit to Italy, and the network algorithm reflects this with a mutational link between an Italian and his Brazilian viral genome in cluster C (SI Appendix, Fig. S1). In another case, a man from Ontario had traveled from Wuhan in central China to Guangdong in southern China and then returned to Canada, where he fell ill and was conclusively diagnosed with coronavirus disease 2019 (COVID-19) on 27 January 2020. In the phylogenetic network (SI Appendix, Fig. S2), his virus genome branches from a reconstructed ancestral node, with derived virus variants in Foshan and Shenzhen (both in Guangdong province), in agreement with his travel history. His virus genome now coexists with those of other infected North Americans (one Canadian and two Californians) who evidently share a common viral genealogy. The case of the single Mexican viral genome in the network is a documented infection diagnosed on 28 February 2020 in a Mexican traveler to Italy. Not only does the network confirm the Italian origin of the Mexican virus (SI Appendix, Fig. S3), but it also implies that this Italian virus derives from the first documented German infection on 27 January 2020 in an employee working for the Webasto company in Munich, who, in turn, had contracted the infection from a Chinese colleague in Shanghai who had received a visit by her parents from Wuhan. This viral journey from Wuhan to Mexico, lasting a month, is documented by 10 mutations in the phylogenetic network.

This viral network is a snapshot of the early stages of an epidemic before the phylogeny becomes obscured by subsequent migration and mutation. The question may be asked whether the rooting of the viral evolution can be achieved at this early stage by using the oldest available sampled genome as a root. As SI Appendix, Fig. S4 shows, however, the first virus genome that was sampled on 24 December 2019 already is distant from the root type according to the bat coronavirus outgroup rooting.

The described core mutations have been confirmed by a variety of contributing laboratories and sequencing platforms and can be considered reliable. The phylogeographic patterns in the network are potentially affected by distinctive migratory histories, founder events, and sample size. Nevertheless, it would be prudent to consider the possibility that mutational variants might modulate the clinical presentation and spread of the disease. The phylogenetic classification provided here may be used to rule out or confirm such effects when evaluating clinical and epidemiological outcomes of SARS-CoV-2 infection, and when designing treatment and, eventually, vaccines.

Materials and Methods
The Global Initiative on Sharing Avian Influenza Data (GISAID) was founded in 2006, and, since 2010, has been hosted by the German Federal Ministry of Food, Agriculture and Consumer Protection. GISAID has also become a coronavirus repository since December 2019. As of 4 March 2020, the cutoff point for our phylogenetic analysis, the GISAID database had compiled 254 coronavirus genomes, isolated from 244 humans, nine Chinese pangolins, and one bat Rhinolophus affinis (BatCoVRaTG13 from Yunnan Province, China). The sequences have been deposited by 82 laboratories listed in Dataset S1, Supplementary Table 1. Although SARS-CoV-2 is an RNA virus, the deposited sequences, by convention, are in DNA format. Our initial alignment confirmed an earlier report by Zhou et al. (7) that the pangolin coronavirus sequences are poorly conserved with respect to the human SARS-CoV-2 virus, while the bat coronavirus yielded a sequence similarity of 96.2% in our analysis, in agreement with the 96.2% published by Zhou et al. We discarded partial sequences, and used only the most complete genomes that we aligned to the full reference genome by Wu et al. (8) comprising 29,903 nucleotides. Finally, to ensure comparability, we truncated the flanks of all sequences to the consensus range 56 to 29,797, with nucleotide position numbering according to the Wuhan 1 reference sequence (8). The laboratory codes of the resulting 160 sequences and the bat coronavirus sequences are listed in Dataset S1, Supplementary Table 2 (Coronavirus Isolate Labels).
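The truncation and similarity steps can be sketched as follows. The range 56 to 29,797 and the reference length of 29,903 come from the text; the per-site identity formula is an assumption, since the paper does not state how its 96.2% figure treats gaps or ambiguous bases. The function names and the placeholder sequences are illustrative.

```python
RANGE_START, RANGE_END = 56, 29797  # 1-based, inclusive, as given in the text


def truncate_to_range(seq: str, start: int = RANGE_START, end: int = RANGE_END) -> str:
    """Keep nucleotide positions start..end (1-based, inclusive)."""
    return seq[start - 1:end]


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which a and b carry the same character."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


toy_reference = "A" * 29903  # placeholder of reference length, NOT a real genome
trimmed = truncate_to_range(toy_reference)

print(len(trimmed))                                  # 29742
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
```

The trimmed range therefore covers 29,742 of the 29,903 reference positions.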

The 160 human coronavirus sequences comprised exactly 100 different types. We added to the data the bat coronavirus as an outgroup to determine the root within the phylogeny. Phylogenetic network analyses were performed with the Network 5011CS package, which includes, among other algorithms, the median joining network algorithm (3) and a Steiner tree algorithm to identify most-parsimonious trees within complex networks (9). We coded gaps of adjacent nucleotides as single deletion events (these deletions being rare, up to 24 nucleotides long, and mostly in the amino acid reading frame) and ran the data with the epsilon parameter set to zero, and performed an exploratory run by setting the epsilon parameter to 10. Both settings yielded a low-complexity network. The Steiner tree algorithm was then run on both networks and provided the identical result that the most-parsimonious trees within the network were of length 229 mutations. The structures of both networks were very similar, with the epsilon 10 setting providing an additional rectangle between the A and B clusters. The network output was annotated using the Network Publisher option to indicate geographic regions, sample collection times, and cluster nomenclature.
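Two of the preprocessing ideas above, collapsing identical sequences into distinct types and coding a run of adjacent gaps as a single deletion event, can be sketched on toy data. This is a minimal illustration under stated assumptions and not the Network 5011CS implementation; the function names are hypothetical, and the gap rule here treats any unbroken run of positions where exactly one sequence has a gap as one event.

```python
def distinct_types(seqs: list) -> int:
    """Number of different sequences after collapsing exact duplicates."""
    return len(set(seqs))


def count_events(a: str, b: str, gap: str = "-") -> int:
    """Count mutational events between two aligned sequences.

    A substitution counts as one event. A run of adjacent positions where
    exactly one of the two sequences has a gap counts as one deletion event.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    events = 0
    in_gap_run = False
    for x, y in zip(a, b):
        if (x == gap) != (y == gap):
            # Gap in one sequence only: count the run once.
            if not in_gap_run:
                events += 1
            in_gap_run = True
        else:
            in_gap_run = False
            if x != y:
                events += 1
    return events


# Toy alignment, NOT real viral sequences.
toy = ["ACGTACGT", "ACGTACGT", "AC---CGT", "ACGTACGA"]

print(distinct_types(toy))                   # 3
print(count_events(toy[0], toy[2]))          # 1
print(count_events("ACGTACGT", "AC---CGA"))  # 2
```

Under this coding, the three-nucleotide gap contributes one event instead of three, so a single deletion does not inflate branch lengths.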

Data Availability.
The nucleotide sequences of the SARS-CoV-2 genomes used in this analysis are available, upon free registration, from the GISAID database. The Network5011 software package and coronavirus network files are available as shareware on the Fluxus Technology website (fluxus-engineering.com).

Acknowledgments
We gratefully acknowledge the authors and originating and submitting laboratories of the sequences from GISAID’s EpiFlu(TM) Database on which this research is based. We are grateful to Trevor Bedford (GISAID) for providing instructions and advice on the database. A table of the contributors is available in Dataset S1, Supplementary Table 1. We thank Arne Röhl for assessing the network.

Footnotes
  • ¹To whom correspondence may be addressed. Email: pf223@cam.ac.uk or acr10@cam.ac.uk.
  • Author contributions: P.F. and M.F. performed research; P.F., L.F., and M.F. analyzed data; P.F. and M.F. performed statistical analyses; P.F., C.R., and M.F. wrote the paper; and C.R. wrote the Introduction.
  • Reviewers: T.K., Katholieke Universiteit Leuven; and C.S., University Medical Center Hamburg-Eppendorf.
  • The authors declare no competing interest.
  • This article contains supporting information online.
  • Copyright © 2020 the Author(s). Published by PNAS.

 
最后编辑:
Phylogenetic network analysis of SARS-CoV-2 genomes

Peter Forster, Lucy Forster, Colin Renfrew, and View ORCID ProfileMichael Forster

PNAS first published April 8, 2020 Phylogenetic network analysis of SARS-CoV-2 genomes

  1. Contributed by Colin Renfrew, March 30, 2020 (sent for review March 17, 2020; reviewed by Toomas Kivisild and Carol Stocking)
Significance

This is a phylogenetic network of SARS-CoV-2 genomes sampled from across the world. These genomes are closely related and under evolutionary selection in their human hosts, sometimes with parallel evolution events, that is, the same virus mutation emerges in two different human hosts. This makes character-based phylogenetic networks the method of choice for reconstructing their evolutionary paths and their ancestral genome in the human host. The network method has been used in around 10,000 phylogenetic studies of diverse organisms, and is mostly known for reconstructing the prehistoric population movements of humans and for ecological studies, but is less commonly employed in the field of virology.

Abstract

In a phylogenetic network analysis of 160 complete human severe acute respiratory syndrome coronavirus 2 (SARS-Cov-2) genomes, we find three central variants distinguished by amino acid changes, which we have named A, B, and C, with A being the ancestral type according to the bat outgroup coronavirus. The A and C types are found in significant proportions outside East Asia, that is, in Europeans and Americans. In contrast, the B type is the most common type in East Asia, and its ancestral genome appears not to have spread outside East Asia without first mutating into derived B types, pointing to founder effects or immunological or environmental resistance against this type outside Asia. The network faithfully traces routes of infections for documented coronavirus disease 2019 (COVID-19) cases, indicating that phylogenetic networks can likewise be successfully used to help trace undocumented COVID-19 infection sources, which can then be quarantined to prevent recurrent spread of the disease worldwide.


The search for human origins seemed to take a step forward with the publication of the global human mitochondrial DNA tree (1). It soon turned out, however, that the tree-building method did not facilitate an unambiguous interpretation of the data. This motivated the development, in the early 1990s, of phylogenetic network methods which are capable of enabling the visualization of a multitude of optimal trees (2, 3). This network approach, based on mitochondrial and Y chromosomal data, allowed us to reconstruct the prehistoric population movements which colonized the planet (4, 5). The phylogenetic network approach from 2003 onward then found application in the reconstruction of language prehistory (6). It is now timely to apply the phylogenetic network approach to virological data to explore how this method can contribute to an understanding of coronavirus evolution.

In early March 2020, the GISAID database (GISAID - Initiative) contained a compilation of 253 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) complete and partial genomes contributed by clinicians and researchers from across the world since December 2019. To understand the evolution of this virus within humans, and to assist in tracing infection pathways and designing preventive strategies, we here present a phylogenetic network of 160 largely complete SARS-Cov-2 genomes (Fig. 1).



Phylogenetic network of 160 SARS-CoV-2 genomes. Node A is the root cluster obtained with the bat (R. affinis) coronavirus isolate BatCoVRaTG13 from Yunnan Province. Circle areas are proportional to the number of taxa, and each notch on the links represents a mutated nucleotide position. The sequence range under consideration is 56 to 29,797, with nucleotide position (np) numbering according to the Wuhan 1 reference sequence (8). The median-joining network algorithm (2) and the Steiner algorithm (9) were used, both implemented in the software package Network5011CS (https://www.fluxus-engineering.com/), with the parameter epsilon set to zero, generating this network containing 288 most-parsimonious trees of length 229 mutations. The reticulations are mainly caused by recurrent mutations at np11083. The 161 taxa (160 human viruses and one bat virus) yield 101 distinct genomic sequences. The phylogenetic diagram is available for detailed scrutiny in A0 poster format (SI Appendix, Fig. S5) and in the free Network download files.
" data-icon-position="" data-hide-link-title="0" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; background-color: transparent; font-weight: normal; text-decoration: none; outline: 0px !important; color: rgb(0, 90, 150); display: block; border: 0px; box-shadow: rgba(0, 0, 0, 0.15) 0px 2px 10px
0px;">
Fig. 1.

Fig. 1.

Phylogenetic network of 160 SARS-CoV-2 genomes. Node A is the root cluster obtained with the bat (R. affinis) coronavirus isolate BatCoVRaTG13 from Yunnan Province. Circle areas are proportional to the number of taxa, and each notch on the links represents a mutated nucleotide position. The sequence range under consideration is 56 to 29,797, with nucleotide position (np) numbering according to the Wuhan 1 reference sequence (8). The median-joining network algorithm (2) and the Steiner algorithm (9) were used, both implemented in the software package Network5011CS (fluxus-engineering.com), with the parameter epsilon set to zero, generating this network containing 288 most-parsimonious trees of length 229 mutations. The reticulations are mainly caused by recurrent mutations at np11083. The 161 taxa (160 human viruses and one bat virus) yield 101 distinct genomic sequences. The phylogenetic diagram is available for detailed scrutiny in A0 poster format (SI Appendix, Fig. S5) and in the free Network download files.

Zhou et al. (7) recently reported a closely related bat coronavirus, with 96.2% sequence similarity to the human virus. We use this bat virus as an outgroup, resulting in the root of the network being placed in a cluster of lineages which we have labeled “A.” Overall, the network, as expected in an ongoing outbreak, shows ancestral viral genomes existing alongside their newly mutated daughter genomes.

There are two subclusters of A which are distinguished by the synonymous mutation T29095C. In the T-allele subcluster, four Chinese individuals (from the southern coastal Chinese province of Guangdong) carry the ancestral genome, while three Japanese and two American patients differ from it by a number of mutations. These American patients are reported to have had a history of residence in the presumed source of the outbreak in Wuhan. The C-allele subcluster sports relatively long mutational branches and includes five individuals from Wuhan, two of which are represented in the ancestral node, and eight other East Asians from China and adjacent countries. It is noteworthy that nearly half (15/33) of the types in this subcluster, however, are found outside East Asia, mainly in the United States and Australia.

Two derived network nodes are striking in terms of the number of individuals included in the nodal type and in mutational branches radiating from these nodes. We have labeled these phylogenetic clusters B and C.

For type B, all but 19 of the 93 type B genomes were sampled in Wuhan (n = 22), in other parts of eastern China (n = 31), and, sporadically, in adjacent Asian countries (n = 21). Outside of East Asia, 10 B-types were found in viral genomes from the United States and Canada, one in Mexico, four in France, two in Germany, and one each in Italy and Australia. Node B is derived from A by two mutations: the synonymous mutation T8782C and the nonsynonymous mutation C28144T changing a leucine to a serine. Cluster B is striking with regard to mutational branch lengths: While the ancestral B type is monopolized (26/26 genomes) by East Asians, every single (19/19) B-type genome outside of Asia has evolved mutations. This phenomenon does not appear to be due to the month-long time lag and concomitant mutation rate acting on the viral genome before it spread outside of China (Dataset S1, Supplementary Table 2). A complex founder scenario is one possibility, and a different explanation worth considering is that the ancestral Wuhan B-type virus is immunologically or environmentally adapted to a large section of the East Asian population, and may need to mutate to overcome resistance outside East Asia.

Type C differs from its parent type B by the nonsynonymous mutation G26144T which changes a glycine to a valine. In the dataset, this is the major European type (n = 11), with representatives in France, Italy, Sweden, and England, and in California and Brazil. It is absent in the mainland Chinese sample, but evident in Singapore (n = 5) and also found in Hong Kong, Taiwan, and South Korea.

One practical application of the phylogenetic network is to reconstruct infection paths where they are unknown and pose a public health risk. The following cases where the infection history is well documented may serve as illustrations (SI Appendix). On 25 February 2020, the first Brazilian was reported to have been infected following a visit to Italy, and the network algorithm reflects this with a mutational link between an Italian and his Brazilian viral genome in cluster C (SI Appendix, Fig. S1). In another case, a man from Ontario had traveled from Wuhan in central China to Guangdong in southern China and then returned to Canada, where he fell ill and was conclusively diagnosed with coronavirus disease 2019 (COVID-19) on 27 January 2020. In the phylogenetic network (SI Appendix, Fig. S2), his virus genome branches from a reconstructed ancestral node, with derived virus variants in Foshan and Shenzhen (both in Guangdong province), in agreement with his travel history. His virus genome now coexists with those of other infected North Americans (one Canadian and two Californians) who evidently share a common viral genealogy. The case of the single Mexican viral genome in the network is a documented infection diagnosed on 28 February 2020 in a Mexican traveler to Italy. Not only does the network confirm the Italian origin of the Mexican virus (SI Appendix, Fig. S3), but it also implies that this Italian virus derives from the first documented German infection on 27 January 2020 in an employee working for the Webasto company in Munich, who, in turn, had contracted the infection from a Chinese colleague in Shanghai who had received a visit by her parents from Wuhan. This viral journey from Wuhan to Mexico, lasting a month, is documented by 10 mutations in the phylogenetic network.

This viral network is a snapshot of the early stages of an epidemic before the phylogeny becomes obscured by subsequent migration and mutation. The question may be asked whether the rooting of the viral evolution can be achieved at this early stage by using the oldest available sampled genome as a root. As SI Appendix, Fig. S4 shows, however, the first virus genome that was sampled on 24 December 2019 already is distant from the root type according to the bat coronavirus outgroup rooting.
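The contrast between "oldest sampled genome" and "genome closest to the outgroup" can be illustrated with a toy sketch. This is a plain nearest-to-outgroup heuristic, not the median-joining network rooting the paper used, and the sequences, names, and dates below are made up purely for illustration.

```python
def hamming(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_to_outgroup(samples, outgroup):
    """Name of the sample with the fewest differences from the outgroup."""
    return min(samples, key=lambda name: hamming(samples[name][1], outgroup))


def earliest_sampled(samples):
    """Name of the sample with the earliest ISO-formatted collection date."""
    return min(samples, key=lambda name: samples[name][0])


# Hypothetical toy data: (collection date, aligned sequence).
OUTGROUP = "ACGTACGTAC"
SAMPLES = {
    "genome_1": ("2019-12-24", "ACGTTCGTAT"),
    "genome_2": ("2020-01-10", "ACGTACGTAT"),
    "genome_3": ("2020-02-02", "ACGTTCGTAA"),
}

print(earliest_sampled(SAMPLES))
print(nearest_to_outgroup(SAMPLES, OUTGROUP))
```

In this toy data the earliest sample is not the one closest to the outgroup, which is the situation the paragraph above describes for the 24 December 2019 genome.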

The described core mutations have been confirmed by a variety of contributing laboratories and sequencing platforms and can be considered reliable. The phylogeographic patterns in the network are potentially affected by distinctive migratory histories, founder events, and sample size. Nevertheless, it would be prudent to consider the possibility that mutational variants might modulate the clinical presentation and spread of the disease. The phylogenetic classification provided here may be used to rule out or confirm such effects when evaluating clinical and epidemiological outcomes of SARS-CoV-2 infection, and when designing treatment and, eventually, vaccines.

Materials and Methods
The Global Initiative on Sharing Avian Influenza Data (GISAID) was founded in 2006, and, since 2010, has been hosted by the German Federal Ministry of Food, Agriculture and Consumer Protection. GISAID has also become a coronavirus repository since December 2019. As of 4 March 2020, the cutoff point for our phylogenetic analysis, the GISAID database (GISAID - Initiative) had compiled 254 coronavirus genomes, isolated from 244 humans, nine Chinese pangolins, and one bat Rhinolophus affinis (BatCoVRaTG13 from Yunnan Province, China). The sequences have been deposited by 82 laboratories listed in Dataset S1, Supplementary Table 1. Although SARS-CoV-2 is an RNA virus, the deposited sequences, by convention, are in DNA format. Our initial alignment confirmed an earlier report by Zhou et al. (7) that the pangolin coronavirus sequences are poorly conserved with respect to the human SARS-CoV-2 virus, while the bat coronavirus yielded a sequence similarity of 96.2% in our analysis, in agreement with the 96.2% published by Zhou et al. We discarded partial sequences, and used only the most complete genomes that we aligned to the full reference genome by Wu et al. (8) comprising 29,903 nucleotides. Finally, to ensure comparability, we truncated the flanks of all sequences to the consensus range 56 to 29,797, with nucleotide position numbering according to the Wuhan 1 reference sequence (8). The laboratory codes of the resulting 160 sequences and the bat coronavirus sequences are listed in Dataset S1, Supplementary Table 2 (Coronavirus Isolate Labels).
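The trimming and similarity steps described above can be sketched as follows. The sketch assumes each sequence has already been aligned to the 29,903-nucleotide reference so that string index corresponds to reference position, and it skips gap columns when computing identity; the excerpt does not say how the 96.2% figure was computed, so that choice is an assumption. The function names are hypothetical.

```python
def truncate_to_range(aligned_seq, start=56, end=29797):
    """Keep reference positions start..end (1-based, inclusive)."""
    return aligned_seq[start - 1:end]


def percent_identity(a, b):
    """Percentage of matching positions, ignoring columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# A placeholder sequence of reference length, only to show the arithmetic.
reference_length = 29903
trimmed = truncate_to_range("A" * reference_length)
print(len(trimmed))  # 29797 - 56 + 1 positions remain

print(percent_identity("ACGT-A", "ACGA-A"))
```

With the consensus range 56 to 29,797, each trimmed sequence spans 29,742 reference positions.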

The 160 human coronavirus sequences comprised exactly 100 different types. We added to the data the bat coronavirus as an outgroup to determine the root within the phylogeny. Phylogenetic network analyses were performed with the Network 5011CS package, which includes, among other algorithms, the median joining network algorithm (3) and a Steiner tree algorithm to identify most-parsimonious trees within complex networks (9). We coded gaps of adjacent nucleotides as single deletion events (these deletions being rare, up to 24 nucleotides long, and mostly in the amino acid reading frame) and ran the data with the epsilon parameter set to zero, and performed an exploratory run by setting the epsilon parameter to 10. Both settings yielded a low-complexity network. The Steiner tree algorithm was then run on both networks and provided the identical result that the most-parsimonious trees within the network were of length 229 mutations. The structures of both networks were very similar, with the epsilon 10 setting providing an additional rectangle between the A and B clusters. The network output was annotated using the Network Publisher option to indicate geographic regions, sample collection times, and cluster nomenclature.
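Two of the preprocessing ideas above, treating a run of adjacent gap characters as a single deletion event and collapsing identical sequences into distinct types, can be sketched in a few lines. This is not the Network software's own code; the function names and toy sequences are hypothetical, and a real analysis would also need a policy for ambiguous bases.

```python
import re
from collections import Counter


def deletion_events(aligned_seq):
    """Return (start index, length) for each run of adjacent gaps.

    Each run counts as one deletion event, however many bases it spans.
    """
    return [
        (m.start(), m.end() - m.start())
        for m in re.finditer(r"-+", aligned_seq)
    ]


def collapse_into_types(sequences):
    """Map each distinct aligned sequence to the number of samples carrying it."""
    return Counter(sequences.values())


# Toy alignment, invented for illustration only.
toy = {"s1": "ACGT", "s2": "ACGT", "s3": "ACGA"}
types = collapse_into_types(toy)
print(len(types), "types among", len(toy), "sequences")

print(deletion_events("AC---GT-A"))
```

Applied to the real data, the second step is what would reduce 160 genomes to the 100 distinct types reported above.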

Data Availability.
The nucleotide sequences of the SARS-CoV-2 genomes used in this analysis are available, upon free registration, from the GISAID database (GISAID - Initiative). The Network5011 software package and coronavirus network files are available as shareware on the Fluxus Technology website (fluxus-engineering.com).

Acknowledgments
We gratefully acknowledge the authors and originating and submitting laboratories of the sequences from GISAID’s EpiFlu(TM) Database on which this research is based. We are grateful to Trevor Bedford (GISAID) for providing instructions and advice on the database. A table of the contributors is available in Dataset S1, Supplementary Table 1. We thank Arne Röhl for assessing the network.

Footnotes
  • To whom correspondence may be addressed. Email: pf223@cam.ac.uk or acr10@cam.ac.uk.
  • Author contributions: P.F. and M.F. performed research; P.F., L.F., and M.F. analyzed data; P.F. and M.F. performed statistical analyses; P.F., C.R., and M.F. wrote the paper; and C.R. wrote the Introduction.
  • Reviewers: T.K., Katholieke Universiteit Leuven; and C.S., University Medical Center Hamburg-Eppendorf.
  • The authors declare no competing interest.
  • This article contains supporting information online at Supporting Information.
  • Copyright © 2020 the Author(s). Published by PNAS.



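I like analysis done with a scientific attitude. Now waiting for the trolls to show up and flame it. I'll take a look when I have time. :zhichi: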
 
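Don't mistake ignorance for wit. Does what you quoted have any reference value? It is about the same as some random villager writing here: "Some say Trump has had the coronavirus; a great virus expert carefully analyzed 100 million relevant samples and found that Trump 100% had the coronavirus." Does that mean anything? At the very least, link the original authors' published article so everyone can read it. Nobody here is a fool; the villagers can analyze it for themselves.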
 
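Granny, still working the night shift at your age? Can you follow the biology? Out of respect for my elders, let me highlight the key passage for you: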
There are two subclusters of A which are distinguished by the synonymous mutation T29095C. In the T-allele subcluster, four Chinese individuals (from the southern coastal Chinese province of Guangdong) carry the ancestral genome, while three Japanese and two American patients differ from it by a number of mutations. These American patients are reported to have had a history of residence in the presumed source of the outbreak in Wuhan.

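Granny, what the British authors don't know is that those four people in Guangdong who were infected with type A had come from Wuhan. This is not a rumor; it comes from a paper published a month before this one by the Yunnan botanical institute of the Chinese Academy of Sciences. The British paper adds nothing beyond that one. It was copied together by a family of three (all surnamed Forster) padding their publication list, and it carries no weight in academia. Only Chinese people like you, Granny, think they have found a treasure they can use to shift the blame.
If you can follow the passage I highlighted, Granny, you will see that your night's work was wasted.
Red highlights and all that. The original poster's article is mainly about types A, B, and C: type A is considered the most ancestral version and is found mainly outside China (in the United States), while type B derived from type A and is found mainly in China.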
 
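Looking at it now, this damned virus's incubation period is far longer than two weeks.

It is also possible that some people were infected, with or without symptoms, and kept carrying the virus afterwards, the way chronic hepatitis B carriers do. We don't know enough yet, so anything is possible.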
 
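CGTN's external-propaganda report is once again quoting out of context and misleading people. The paper and the interview plainly say that both the T and C subtypes of type A were found in Wuhan, yet this has been turned into "Wuhan was not original form of virus".
The paper's samples were collected up to 4 March, when the outbreak in the United States had not yet taken off. Most of the uploaded cases were imported ones. Moreover, in the published figure, type A is centered on Wuhan, East Asia, and Australia.
The study uses data uploaded to GISAID early on, only 160 samples, which is not enough to settle the question of origin.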
 
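CGTV is the overseas edition of CCAV. Has the external propaganda lowered its standards this much, pushing the line head-on now? It used to get relayed through mouthpieces like the New York Times or the Wall Street Journal.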
 
The original text reads: "The A and C types are found in significant proportions outside East Asia, that is, in Europeans and Americans" "B type is the most common type in East Asia"
 
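How accurate is "CGTN via video link", I wonder? "No evidence": does that mean no evidence was found? What kind of evidence is needed, exactly?

And where does "CGTN via video link" determine the source of the disease to be?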
 
Read the original text posted in the thread. It is a proper scientific article that describes its methods, data sources, and so on.
 