Researchers have developed a machine learning model that uses the assortment of microbes in a wastewater sample to estimate the number of people represented in that sample, according to a press release by Washington University in St. Louis.

Published in the journal *PLOS Computational Biology*, the method may be able to strengthen the ties between wastewater surveillance and individual-level public health data.

Knowing how many people a wastewater sample represents is important, for example, when trying to correlate the amount of SARS-CoV-2 in the sample with the number of people infected with COVID-19.

“Usually when you design your experiment, you design your sample size, you know how many people you’re measuring,” says Fangqiong Ling, an assistant professor at Washington University in St. Louis and co-author of the paper. “If you just take one scoop of wastewater, you don’t know how many people you’re measuring.”

Initially, Ling thought that machine learning might be able to uncover a straightforward relationship between the diversity of microbes in a sample and the number of people it represented, but the simulations, done with an off-the-shelf machine learning model, didn’t pan out.

The key to teasing out how many people are represented in a sample is that the more people contribute to it, the more likely it is to resemble the mean, or average. Individuals, however, tend not to be exactly average. So if a sample looks like an average sample of microbiota, it is likely to be made up of many people. The farther a sample is from the average, the more likely it is to represent a single individual.
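The averaging intuition can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' algorithm: each hypothetical person gets a random microbial profile (normalized gamma draws, an arbitrary modeling choice), a pooled sample is the plain average of its contributors, and the population average is taken to be a uniform profile. All function names and parameters are invented for illustration.

```python
import math
import random


def simulate_individual(rng, n_taxa):
    """Draw one hypothetical person's profile as relative abundances (sums to 1)."""
    # Assumption: normalized gamma draws, i.e. a symmetric Dirichlet profile.
    weights = [rng.gammavariate(0.5, 1.0) for _ in range(n_taxa)]
    total = sum(weights)
    return [w / total for w in weights]


def pooled_sample(rng, n_people, n_taxa):
    """Average the profiles of n_people simulated individuals into one sample."""
    pooled = [0.0] * n_taxa
    for _ in range(n_people):
        for i, abundance in enumerate(simulate_individual(rng, n_taxa)):
            pooled[i] += abundance / n_people
    return pooled


def distance_from_mean(profile):
    """Euclidean distance from the population-average profile (uniform here)."""
    expected = 1.0 / len(profile)
    return math.sqrt(sum((x - expected) ** 2 for x in profile))


def mean_distance(n_people, n_taxa=20, trials=200, seed=0):
    """Average distance from the mean over many simulated pooled samples."""
    rng = random.Random(seed)
    total = sum(
        distance_from_mean(pooled_sample(rng, n_people, n_taxa))
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n:>3} people: mean distance from average = {mean_distance(n):.4f}")
```

In this toy setting the distance from the average shrinks steadily as more people contribute, which is the signal the article describes. Real microbiome data are high-dimensional and far less tidy, which is why the researchers needed a more tailored method.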

“But now we are dealing with high-dimensional data, right?” says Likai Chen, an assistant professor of mathematics and statistics and co-author. “There are a near-endless number of ways that you can group these different microbes to form a sample. So that means we have to find out, how do we aggregate that information across different locations?”

Using this basic intuition — and a lot of math — Chen worked with Ling to develop a more tailored machine learning algorithm that could, if trained on real samples of microbiota from more than 1,100 people, determine how many people were represented in any given wastewater sample.

“It’s much faster and it can be trained on a laptop,” Ling says.

The algorithm is not limited to the microbiome. With sufficient training data, it could also use viruses from the human virome or metabolic chemicals to link individuals to wastewater samples.

“This method was used to test our ability to measure population size,” Ling says. But it goes much further. “Now we are developing a framework to allow validation across studies.”
