In the months since SARS-CoV-2 emerged and began spreading, researchers have been attempting to collect new data to understand the virus and COVID-19, the disease it causes.
But they have also been digging into sequence and gene expression data that was originally generated for other purposes to go beyond what is known about the virus and try to fill in some of the many remaining unknowns.
In particular, several teams started trying to untangle SARS-CoV-2 infection susceptibility and symptoms by studying ACE2, a gene that codes for the angiotensin-converting enzyme 2. That cell surface protein, which has been studied in the context of blood pressure regulation, heart disease, and other conditions, is well known for acting as a receptor for the original SARS coronavirus (known as SARS-CoV or SARS-CoV-1).
In 2003, for example, investigators at the Brigham and Women's Hospital and other centers in Massachusetts demonstrated that the "S1" domain of the SARS-CoV spike protein binds to ACE2, using the enzyme as a receptor as it wrangles its way into human cells.
In a Nature paper published at the time, the researchers noted that "a soluble form of ACE2, but not the related enzyme ACE1, blocked association of the S1 domain with Vero E6 cells," while antibodies targeting ACE2 interfered with SARS-CoV replication in the Vero E6 African green monkey kidney cell line.
From these and other experiments in human cell lines, the authors concluded that the ACE2 enzyme "is a functional receptor for SARS-CoV." In contrast, subsequent studies on the coronavirus behind Middle East respiratory syndrome, or MERS, suggest it uses a dipeptidyl peptidase 4 enzyme encoded by a gene called DPP4, rather than ACE2, to enter human cells.
Interest in ACE2 was renewed earlier this year, when independent teams led by investigators at the Chinese Center for Disease Control and Prevention, the University of Minnesota, the National Institute of Allergy and Infectious Diseases, the University of Göttingen, and other centers reported that the protein appears to play a similar receptor role for SARS-CoV-2, the COVID-19-causing coronavirus that emerged late last year and has since spread around the world.
Now, though, the research community is armed with a large and growing collection of genomic data from healthy control individuals and population study participants. And that means that several teams have been busy interrogating ACE2 variation, its potential impact on SARS-CoV-2 interactions, and the receptor's tissue-specific expression in data from thousands or hundreds of thousands of individuals, even as clinical samples from COVID-19 patients are still being collected.
Much of the research done so far has been shared in preprint form, meaning the specific findings of these studies may change somewhat as the papers go through peer review. Even so, the strategies used offer a glimpse at the kinds of information that can be gleaned from available ACE2 sequence and gene expression data as investigators prepare to analyze samples obtained during the ongoing COVID-19 pandemic.
A taste for ACE2
"It seems like SARS-CoV-2, and this family of coronavirus in particular that jumps into humans, seems to like the ACE2 receptor. We had this in 2003 and again in 2019/20," noted Somasekar Seshagiri, co-founder and chief scientific officer at the California-based drug development company ModMab Therapeutics, who is also president of science and education with India's SciGenom Research Foundation and previously worked as a staff scientist in molecular biology at Genentech.
Together with investigators from the California-based genetic diagnostics and drug discovery research company MedGenome, the University of California at San Francisco, and elsewhere, Seshagiri recently tapped into available whole-genome, exome, and other targeted sequence datasets representing nearly 300,000 individuals to explore the genetic variability in ACE2.
Using structural and synthetic interaction data, along with results from a preliminary site-directed mutagenesis analysis of ACE2 interface interactions from University of Illinois biochemistry researcher Erik Procko, they not only identified polymorphisms expected to have an impact on the function of the resulting enzyme, but also grouped these variants based on their predicted effects on interactions with the SARS-CoV-2 spike protein.
"We were excited to take a look at some of these polymorphisms and ask questions at the molecular level," explained co-author Natalia Jura, a cardiovascular, cellular and molecular pharmacology researcher at UCSF. "We could look at our polymorphisms and categorize them into two groups: the ones that we are quite confident to say would enhance the interaction between the virus and the host and the ones that are predicted to disrupt."
The ACE2 variation variable
The team's results, posted as a preprint in BioRxiv earlier this month, proposed a handful of ACE2 variants suspected of boosting SARS-CoV-2 binding and, potentially, host susceptibility, along with several variants predicted to dial down ACE2 interactions with the viral spike protein that may be protective.
"What we can conclude is that this new virus has evolved new modality to interact with the ACE2 receptor," Jura noted. "Unfortunately, it seems like there are polymorphisms in the human population that will make some individuals more susceptible to binding this virus because these mutations are enhancing this unique part of the interface."
Seshagiri noted that such insights might make it possible to design potential therapeutic versions of ACE2 that are particularly adept at binding coronavirus spike proteins, thereby preventing the viruses from interacting with an individual's own ACE2 receptors, for example.
In a recent Cell preprint, a team from Sweden, Spain, Austria, and Canada proposed its own strategy for engineering soluble, clinical-grade forms of the human ACE2 protein that appeared to dial down early-stage infections by SARS-CoV-2 in otherwise susceptible cell types.
"We are not the first to come up with the idea of saying ACE2 could be a therapeutic," Seshagiri said, though he suggested that engineering soluble forms of the receptor protein that bind well to SARS-CoV-2 may serve as a strategy for "future proofing" against the emergence of these and other related viruses down the road.
The researchers plan to profile ACE2 polymorphisms in still more human samples for the final version of the study, which will likely be submitted for peer review in the coming weeks, Seshagiri said.
He and MedGenome CEO Rayman Mathoda noted that the diagnostic company, which is active in India and other emerging markets, is also a founding member of the GenomeAsia 100K project.
"We've made a very intentional effort to build on a data-focused set of efforts, where we take our proprietary data as we grow, but build in other data source," Mathoda said.
The investigators are not alone in attempting to establish a baseline understanding of ACE2 variation across and within populations.
At the University of Siena in central Italy, Alessandra Renieri and her colleagues have been delving into ACE2 genetic variation using available exome sequences for some 7,000 healthy participants in the Network of Italian Genomes project. As they reported in a preprint posted to MedRxiv in early April, the investigators saw significant variation in ACE2 in that retrospective dataset, including both common and rare missense variants predicted to influence the protein's stability and its interactions with the coronavirus spike protein.
"There is pretty wide genetic variability," Renieri said. "There are both polymorphisms, so variants found in a percentage of the population, and there are also rare variants, a lot of rare variants."
It may be possible for the individual centers participating in the Network of Italian Genomes to recontact individuals in the future to try to find out who became infected with SARS-CoV-2 and to assess ACE2 variation alongside clinical outcomes, Renieri noted, though she cautioned that "ACE2 is just one of the many genes that could be involved."
For the reCOVID project, for example, members of the team are seeking funding through the European Commission's Innovative Medicines Initiative IMI2 call for proposals to do functional analyses on ACE2 and other genes, in the hopes of developing candidate therapeutics.
Renieri is also part of a team that has been working since mid-March to prospectively collect samples from 2,000 COVID-19 patients at no fewer than 21 different hospitals in Italy for the GEN-COVID study, part of the COVID-19 Host Genetics Initiative.
For that project, researchers in Italy will use whole-exome sequencing to assess patient samples collected in conjunction with very detailed clinical information, she explained, while collaborators in Finland will genotype the samples for a related genome-wide association study.
From expression to candidate cell types
Along with variants falling within protein-coding portions of the gene, there is interest in identifying non-coding variants that impact expression of ACE2 and related genes in various human tissue types, along with potential differences in expression between men and women. Conversely, ACE2 expression in specific cell types may offer a guide to tissues that may be infiltrated by SARS-CoV-2.
In a 2005 study published in the Journal of Virology, for example, researchers from the University of Iowa and Harvard University drew a link between SARS-CoV infection of cultured primary human airway epithelial cells and ACE2 expression, which appeared to notch up in more differentiated forms of the cells.
A Shanghai Jiao Tong University-led team considered some 1,700 ACE2 variants from genome databases for a correspondence article submitted to Cell Discovery in late February, uncovering apparent allele frequency differences in individuals from Han Chinese, African, European, and other populations.
In an effort to get a more refined view of ACE2 expression in specific cell types from more than two dozen human tissues, researchers at the software company Nference and Janssen Research and Development brought together single-cell gene expression data from thousands of published and publicly available single-cell RNA sequencing datasets in a freely available portal called NferX.
As noted in a preprint posted to BioRxiv at the end of March, the team highlighted several specific cell types that express ACE2, including enhanced ACE2 expression in human or rat model cells from the nasal olfactory epithelium, tongue keratinocytes, and mature enterocytes in the colon and small intestine.
The results are intriguing, explained Venky Soundararajan, founder and chief scientific officer at Nference, though he added that more work is needed to understand whether SARS-CoV-2 actually infects these cell types.
"Just because ACE2 is present does not mean the virus infects the cell," he explained.
Even so, Soundararajan noted that possible SARS-CoV-2 infiltration and infection of such nasal, tongue, and intestinal cell types could potentially account for some of the less common COVID-19 symptoms that have been proposed, from individuals who described losing their sense of smell or taste to cases involving diarrhea.
The latter findings are particularly intriguing since some investigators have expressed concern over possible fecal-to-oral transmission of SARS-CoV-2, while other teams have been testing wastewater samples in specific cities to get a glimpse at the extent of COVID-19 cases that exist there.
Soundararajan cautioned that there are still gaps in the single-cell RNA-seq data used to build NferX, particularly when it comes to having high-quality single-cell expression data across all of the cell types and samples from a broader range of healthy individuals or individuals with COVID-19.
"I don't think we have enough single-cell sequencing data from the public domain from a volume standpoint," Soundararajan suggested. "We certainly also have many tissues that have not gone through single-cell sequencing yet, and that is another gap that needs to be filled, even for normal tissue."
He and his collaborators are currently working on pairing single-cell transcriptomic clues with electronic health record and clinical data to better understand the basis of different SARS-CoV-2 infection susceptibility and outcome patterns. The Nference team is also interested in providing investigators with the tools and systems biology approaches to better understand other SARS-CoV-2 targets in human cells.
"ACE2 is only the beginning of the story, because ACE2 is the very first interaction between the virus at the point of entry," Soundararajan said. "There's so much we don't know about what happens beyond that point."