Big Data and Machine Learning Take on HIV

Ryan Black
FEBRUARY 06, 2018

(Graphic courtesy Hong Kong University of Science and Technology)

The human immunodeficiency virus (HIV) continues to vex medicine. A vaccine still isn’t in sight, though researchers have made considerable progress in controlling the disease by harnessing broadly neutralizing antibodies (bnAbs), which can neutralize many of the virus’s variants.

But HIV often endures, wriggling past bnAbs by mutating. To build a better bnAb, or create the right combination of them, researchers must find ways to counter those mutations. For that, a team from Hong Kong University of Science and Technology (HKUST), alongside collaborators from the Massachusetts Institute of Technology (MIT), has turned to big data.

Using 20,000 sequences drawn from nearly 2,000 HIV-positive patients, they set out to map the virus’s “spike,” the protein protrusions on the surface of the viral particle that bnAbs are designed to target. The researchers sought “an accurate representation of viral fitness as a function of its protein sequences (a fitness landscape), with explicit accounting of the effects of coupling between mutations.”

Matthew McKay, a computer and biological engineer at HKUST, called the project multi-disciplinary: a computer science study with biological implications. The team built machine learning algorithms to predict the fitness landscape of the envelope protein (gp160, commonly called Env) that constitutes the spike. By understanding the spike’s fitness, scientists may be able to target it and compromise its ability to mutate successfully.

To predict the fitness landscape, however, the team needed to estimate well over 4 million parameters.
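
A rough back-of-the-envelope count shows how quickly such a model grows. The sketch below assumes a pairwise model in which every position in the sequence gets one parameter per non-reference state, and every pair of positions gets one parameter per combination of non-reference states. The article does not spell out the study's exact parameterization, so the function name and the example inputs are hypothetical and for illustration only.

```python
def pairwise_model_parameter_count(num_sites: int, states_per_site: int) -> int:
    """Count free parameters in a pairwise model over a sequence.

    Assumption (not from the study): each site has `states_per_site`
    possible states, one of which is a reference state (for example,
    the most common residue) that carries no parameters of its own.
    """
    mutant_states = states_per_site - 1

    # One term per site per non-reference state.
    single_site_terms = num_sites * mutant_states

    # One term per unordered pair of sites, per combination of
    # non-reference states at the two sites.
    site_pairs = num_sites * (num_sites - 1) // 2
    coupling_terms = site_pairs * mutant_states ** 2

    return single_site_terms + coupling_terms


if __name__ == "__main__":
    # Hypothetical inputs for illustration; not values from the study.
    for sites, states in [(100, 2), (800, 2), (800, 4)]:
        total = pairwise_model_parameter_count(sites, states)
        print(f"{sites} sites, {states} states per site: {total:,} parameters")
```

With a protein several hundred positions long, the pairwise terms dominate the total, and allowing even a handful of states per position pushes the count into the millions.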

"Without big data machine learning methods, it is simply impossible to make such a prediction," Raymond Louie of HKUST's Department of Electronic & Computer Engineering said.

According to the study, the predictions produced by the computational method “compare very well with published experimental data, including intrinsic fitness measurements, protein contacts in the SOSIP trimer, and known escape mutations that arise to evade antibodies.”

"The findings can assist biologists in proposing new immunogens and vaccination protocols that seek to force the virus to mutate to unfit states in order to evade immune responses, which is likely to thwart or limit viral infection," McKay said, describing the machine learning methodology as “fast and accurate.”

Although it was aimed specifically at gp160, the methods developed in the study are general enough to be applied to other entropy inference problems, according to the authors. Their report, “Fitness landscape of the human immunodeficiency virus envelope protein that is targeted by antibodies,” was published recently in the Proceedings of the National Academy of Sciences.
