May 6 – 9, 2025
Abbaye de Royaumont, Asnières-sur-Oise, France
Europe/Paris timezone

GENETIC CLUSTERING BY COMMUNITY DETECTION: CAN WE RESOLVE HIV-1 TRANSMISSION RISK STRUCTURE WITHIN GIANT COMPONENTS?

Not scheduled
20m
Abbaye de Royaumont, 95270 Asnières-sur-Oise, France
Poster Transmission dynamics & clusters Virtual posters

Speaker

Paula Magbor (Western University)

Description

Clustering infections by genetic similarity is a common method of characterizing the risk structure of a population. A graph is constructed by connecting pairs of infections whose genetic distance falls below a threshold; connected components with two or more nodes are then extracted as "clusters". Studies of HIV molecular epidemiology often use logistic regression to find associations between potential risk factors and the binary outcome of appearing in any cluster. This binary outcome is a crude proxy for transmission rate and cannot resolve risk structure within large components. Our objective is to adapt community detection (CD) methods to broaden the standard definition of clusters.
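The standard clustering definition described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' pipeline; the function name `threshold_clusters`, the sequence labels and the toy distances are all hypothetical.

```python
from collections import defaultdict


def threshold_clusters(distances, cutoff):
    """Return connected components (size >= 2) of the threshold graph.

    distances: dict mapping (id_a, id_b) -> pairwise genetic distance.
    An edge is drawn when the distance is strictly below the cutoff.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        root_a, root_b = find(a), find(b)  # registers both nodes
        if d < cutoff and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Singletons are not counted as clusters.
    return [g for g in groups.values() if len(g) >= 2]


# Hypothetical toy distances between six sequences.
toy = {
    ("A", "B"): 0.010, ("B", "C"): 0.020, ("A", "C"): 0.040,
    ("C", "D"): 0.080, ("D", "E"): 0.005, ("A", "F"): 0.200,
}
clusters = threshold_clusters(toy, cutoff=0.03)

# Binary outcome used in the regression framing: is a sequence in any cluster?
clustered = set().union(*clusters)
in_any_cluster = {s: int(s in clustered) for s in "ABCDEF"}
```

Note that A and C end up in the same cluster through B even though their own distance exceeds the cutoff, which is how single-linkage components can grow large.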

We retrieved 12,560 HIV-1 pol sequences sampled in China, with corresponding metadata (collection date, sex, age and risk factor), from GenBank. Sequences were aligned pairwise against the HXB2 reference using MAFFT, and the alignments were manually refined in AliView. We used FastTree to reconstruct a phylogeny and FigTree to extract four major clades (subtypes and CRFs) for separate analyses. Pairwise distances were calculated using tn93, and networks were constructed at varying distance cutoffs. Finally, we used the Python modules NetworkX and CDlib to extract connected components and apply different CD algorithms.
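The last step, extracting components and then partitioning them by community detection, might look roughly like the sketch below. It assumes NetworkX 2.7 or later and uses its built-in `louvain_communities` rather than CDlib, so it covers only Louvain and is not the authors' code; the toy graph and the helper `mean_size` are hypothetical.

```python
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import louvain_communities


def mean_size(groups):
    """Mean number of nodes per group."""
    groups = list(groups)
    return sum(len(g) for g in groups) / len(groups)


# Hypothetical toy network: two dense groups of five sequences
# joined by a single edge, forming one connected component.
G = nx.Graph()
G.add_edges_from(combinations(range(0, 5), 2))
G.add_edges_from(combinations(range(5, 10), 2))
G.add_edge(4, 5)

# Standard definition: clusters are connected components of size >= 2.
components = [c for c in nx.connected_components(G) if len(c) >= 2]

# Community detection applied within each component separately.
communities = []
for comp in components:
    communities.extend(louvain_communities(G.subgraph(comp), seed=0))

print(mean_size(components), mean_size(communities))
```

Running the detection per component keeps every community nested inside a single standard cluster, so the two definitions remain directly comparable.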

Different CD algorithms partitioned the components obtained at a given threshold to varying extents. For example, Louvain, SLPA and ANGEL reduced mean CRF07 cluster sizes (TN93 < 0.03) from 113.6 to 69.9, 91.2 and 20.7 sequences per cluster, respectively. Statistical associations were better resolved when large components were partitioned by CD into smaller clusters. For example, sex was not significantly associated with cluster ID among the five largest components in subtype B ($\chi^2$-test, $P$ = 0.056), but was significantly associated with their Louvain partition ($P$ < 10$^{-12}$). These preliminary findings underscore the utility of CD in characterizing risk structures by partitioning components into smaller communities.
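The association test mentioned above compares a categorical variable (here, sex) against a grouping label (component or community). The sketch below computes the Pearson $\chi^2$ statistic from a contingency table using only the standard library. The function name and the counts are hypothetical, and whether the authors applied a continuity correction is not stated, so none is applied here. A library routine such as `scipy.stats.chi2_contingency` could supply the corresponding P-value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom.

    table: list of rows of observed counts, e.g. rows = sex,
    columns = cluster labels. All marginal totals must be nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: sex (rows) by group label (columns).
# Coarse grouping: the sex ratio is identical in both groups.
coarse = [[20, 20],
          [10, 10]]
# Finer grouping: the sex ratio differs sharply between groups.
fine = [[30, 10],
        [10, 30]]

coarse_stat, coarse_dof = chi_square_statistic(coarse)
fine_stat, fine_dof = chi_square_statistic(fine)
```

In this toy case the coarse grouping gives a statistic of zero while the finer grouping gives a large one, which mirrors the qualitative pattern reported for components versus their Louvain partition.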

Primary authors

Paula Magbor (Western University) Art Poon (Western University)
