Moderately Strong Confirmation of a Laboratory Origin of 2019-nCoV
James Lyons-Weiler, PhD 2-2-2020
Dr. Marc Wathelet commented that he was puzzled by my report of a spike protein gene homologous to part of the pShuttle-SN vector, given that spike glycoproteins are found in bat coronaviruses. He urged me to analyze the homology (sequence similarity) of the SARS-like spike protein element I reported with other spike proteins, saying that any scientist working on coronaviruses would be surprised if there were not a spike protein.
I replied in a comment that I, too, would expect protein sequence-level homology due to shared conserved domains, but assured him that I would undertake further genome sequence-level (nucleotide) analysis, as the location of the novel sequence relative to the other spike proteins is certainly of interest.
A few recent publications (sent to me by followers/readers) contained further bat coronavirus accession numbers and SARS accession numbers, so I procured the spike protein coding sequences (CDS) of these from NCBI’s nucleotide database and aligned them using BLAST, with the sequence from the first 2019-nCoV protein as the anchor. (Oddly, that GenBank entry does not label the S protein CDS as a spike glycoprotein, instead annotating it only as a “structural protein”.)
The resulting massive alignment confirms a major unique inserted element in 2019-nCoV that is not found at the homologous genomic position in other bat coronaviruses or in SARS.
This is why full genome phylogenetic trees cannot tell the full story of recombinant viral evolution.
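The kind of check described above, looking for alignment columns where the anchor sequence has bases and every other sequence has a gap, can be sketched as follows. This is a minimal illustration only: the function name `anchor_only_runs`, the `min_len` threshold, and the aligned rows are all hypothetical, and the toy strings are made up rather than taken from any GenBank record.

```python
def anchor_only_runs(anchor, others, min_len=3):
    """Return 1-based, inclusive (start, end) alignment-column ranges in which
    the anchor row has a base and every other row has a gap ('-')."""
    runs = []
    start = None
    for col, base in enumerate(anchor):
        # A column is "anchor-only" if the anchor is ungapped and all others are gapped.
        unique = base != "-" and all(row[col] == "-" for row in others)
        if unique and start is None:
            start = col
        elif not unique and start is not None:
            if col - start >= min_len:
                runs.append((start + 1, col))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(anchor) - start >= min_len:
        runs.append((start + 1, len(anchor)))
    return runs


# Toy, made-up aligned rows of equal length (NOT real viral sequences).
anchor = "ATGCCGTTAGGACTA"
others = ["ATGCC------ACTA", "ATGCT------ACTA"]
print(anchor_only_runs(anchor, others))
```

A run reported this way only says that the sequences included in the alignment have no aligned counterpart in those columns; it depends on which sequences were included and on how the aligner placed the gaps.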
Blasting the novel sequence region against all non-viral sequences (to pick up vector technology) again returns pShuttle-SN (no surprise), but this time it also picks up a recombinant coronavirus clone, the Bat-SRBD spike glycoprotein gene from UNC, USA (GenBank entry), and other synthetic constructs.
As I wrote earlier: before anyone points fingers at the Chinese, note that recombinant viruses have been in use in laboratories in many nations around the world.
The overlap occurs at the 3′ end of the novel region (the search was restricted to 21600-22350 bp in the query 2019-nCoV sequence originally blasted against the other coronavirus CDS). It could arguably merely be that I selected too large a region; I chose the region visually to include the full potentially inserted sequence, including any homologous vector elements at the 5′ or 3′ end.
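For readers who want to pull out the same window themselves, here is a small sketch of how the 21600-22350 bp range maps onto a Python slice. It assumes the coordinates are 1-based and inclusive, as sequence positions usually are; the helper name `slice_region` is hypothetical, and the `genome` string is a placeholder of repeated bases, not the actual 2019-nCoV sequence.

```python
def slice_region(seq, start, end):
    """Return the subsequence from start to end, using 1-based, inclusive
    coordinates (assumed here to match how the positions are quoted)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]


# Placeholder string standing in for a genome (NOT real sequence data).
genome = "ACGT" * 7500  # 30,000 characters
region = slice_region(genome, 21600, 22350)
print(len(region))  # 751 positions: 22350 - 21600 + 1
```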
It is worth pointing out that, because the overlap is short, the strength of the match can only be considered moderately strong: highly significant E-value, high %identity, but short sequence length. These findings cannot be considered strong validation for obvious reasons: they were produced by the same analyst, using part of the same data. Spike proteins determine receptor binding for entry into cells, and 2019-nCoV appears, like SARS coronavirus in some bat species, to target ACE2 receptors [1].
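To make the E-value part of that assessment concrete, the sketch below uses the standard relation between a BLAST bit score S′ and the expected number of chance hits, E = m · n · 2^(−S′), where m is the query length and n is the database length. The function name `expected_hits` and all of the numbers are hypothetical, chosen only for illustration; real BLAST runs use effective (edge-corrected) lengths, so reported values will differ somewhat.

```python
def expected_hits(bit_score, query_len, db_len):
    """Expected number of chance alignments scoring at least bit_score,
    using E = m * n * 2**(-S') with raw (not effective) lengths."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical inputs: a 751-base query searched against 10**9 database bases.
query_len = 751
db_len = 10 ** 9

for bits in (40, 60, 100):
    print(bits, expected_hits(bits, query_len, db_len))

# The same bit score becomes less significant as the database grows.
print(expected_hits(40, query_len, 2 * db_len) / expected_hits(40, query_len, db_len))
```

Because the bit score grows with the number of matching positions, a short, near-identical overlap can reach a small E-value while still covering only a small stretch of sequence, which is the trade-off described above.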
For those tracking closely, I confirmed that the novel inserted sequence in the large alignment above is the same as the novel sequence I reported a few days ago. The sequence of interest is here.
[1] Hou et al., 2010. Angiotensin-converting enzyme 2 (ACE2) proteins of different bat species confer variable susceptibility to SARS-CoV entry. Arch Virol 155:1563-1569.
https://www.msi.umn.edu/~lifang/otherflpapers/bat-ace2-archivesofvirology-2010.pdf
These results do show, however, that the novel sequence is not likely present in other coronaviruses.
Thus, it still seems prudent that this inserted sequence in 2019-nCoV become the focus of urgent research, and that laboratory sources be included in the search for the origins of 2019-nCoV and potential targets for treatments and expected pathophysiology in patients infected with 2019-nCoV.
I am grateful to Dr. Wathelet for his inquiries and requests for additional clarification.
Original source: https://jameslyonsweiler.com/2020/02/02/moderately-strong-confirmation-of-a-laboratory-origin-of-2019-ncov/