In 2017, when mathematicians Leland McInnes and John Healy walked into their employer’s annual “Big Dig,” a sort of classified hackathon for Canada’s version of the National Security Agency, they were not thinking about biology at all. They wanted to find a way to quickly spot the differences between computer viruses.
They ended up creating a tool that simplifies datasets and visualizes the data points in them: an algorithm they named Uniform Manifold Approximation and Projection, or UMAP. They published a paper on it in 2018. To their great surprise, in less than five years it has become one of the most widely used tools in modern biology research. UMAP has now been applied to everything from forecasting rain in the Alps to identifying the many-hued pigments in a Gauguin artwork to modeling how Covid-19 tweets are disseminated. And, of course, scientists have applied UMAP to studying the virus itself. The technique is now the method of choice for most computational biologists who want to see what, exactly, is going on in a dataset.