Customer segmentation by mapping networks for understanding and applications that generate value

Published February 2, 2019 by Martin Rosvall

Ever since Aristotle, organization and classification have been cornerstones of science for understanding the world. In network science, where we conduct most of our research, categorization of nodes into modules with so-called community-detection algorithms has proven indispensable to comprehending the structure of large interconnected systems.

With understanding come powerful applications. For example, geographical maps both depict what we know about the world in the clearest way and aid navigation. Life in unfamiliar cities is an entirely different thing with Google Maps. That is why our vision is to build Google Maps for networks. Our approach is to develop and combine the best clustering algorithms and visualization tools.

Applied to e-commerce transaction data, where the purchasing network consists of customers and products, a good segmentation provides valuable understanding by predicting which products customers are most likely to buy next, and when. Those predictions can be exploited in automated workflows and personalized communication through email, Facebook, and so forth. Like navigating unfamiliar cities with good maps, turning data into value with powerful tools leads to radically improved efficiency.

One of the clustering approaches that we use is based on information theory, and hence we have named it Infomap. We have designed and developed both the underlying mathematics and the algorithm that solves the resulting clustering problem for a given set of relational data.

The underlying mathematics of Infomap identifies modules by compressing the modular description of a complete walk across the network. Applied to e-commerce transaction data, the walk corresponds to a succession of random steps between customers and products: From a customer, the random walk continues to one of her purchased products, selected at random, and then to a randomly selected customer who purchased the same product, and so on to infinity, like an e-commerce analyst exploring the transaction network. If the transaction network has clusters of tightly interconnected customers and products, the walk will spend relatively long periods within those clusters. Using fundamental information theory, Infomap is designed to capitalize on these structures such that the description length is minimized when the clusters capture most of the structure in the underlying network.
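To make the walk concrete, here is a minimal Python sketch of the customer-to-product-to-customer random walk on a made-up toy data set. The customer and product names, the purchase lists, and the cluster labels are all hypothetical and chosen only for illustration. The sketch is not Infomap itself; it only simulates the walk and counts how often a step stays inside a cluster.

```python
import random

# Hypothetical toy purchase data (an assumption for illustration):
# each customer maps to the products she bought.
purchases = {
    "c1": ["p1", "p2"],
    "c2": ["p1", "p2"],
    "c3": ["p1", "p2", "p3"],  # the only purchase that bridges the two groups
    "c4": ["p3", "p4"],
    "c5": ["p3", "p4"],
    "c6": ["p3", "p4"],
}

# Assumed cluster labels, used only to measure how long the walk stays put.
cluster = {
    "c1": "A", "c2": "A", "c3": "A", "p1": "A", "p2": "A",
    "c4": "B", "c5": "B", "c6": "B", "p3": "B", "p4": "B",
}

# Invert the purchase lists: each product maps to the customers who bought it.
buyers = {}
for customer, products in purchases.items():
    for product in products:
        buyers.setdefault(product, []).append(customer)


def walk(start, steps, rng):
    """Alternate between customers and products, picking each step uniformly at random."""
    path = [start]
    node = start
    for _ in range(steps):
        options = purchases[node] if node in purchases else buyers[node]
        node = rng.choice(options)
        path.append(node)
    return path


def fraction_within_cluster(path):
    """Share of steps whose start and end nodes carry the same cluster label."""
    stays = sum(cluster[a] == cluster[b] for a, b in zip(path, path[1:]))
    return stays / (len(path) - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
path = walk("c1", 20_000, rng)
within = fraction_within_cluster(path)
print(f"steps that stay inside a cluster: {within:.1%}")
```

In this toy network only one of the 13 purchases connects the two groups, so a long walk should cross between clusters on roughly 1 step in 13 and stay within a cluster the rest of the time. That persistence is the regularity a modular description can exploit.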

Infomap's algorithm for solving the clustering problem consists of three components: the core algorithm, sub-module movements, and single-node movements. The core algorithm is a proven method for quickly reaching an approximate solution: Neighboring nodes are joined into clusters, which are subsequently joined into superclusters, and so on. To improve the clustering accuracy, repeated and recursive runs of sub-module movements and single-node movements break the clusters into smaller components to enable fine-tuning at different scales.
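The first step of the core algorithm, joining neighboring nodes into clusters, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Infomap implementation: the edge list is hypothetical toy data, the network is treated as undirected and unweighted (so the walker's visit rate for a node is its degree divided by twice the number of edges), and `codelength` is our compact reading of a two-level description length. The sketch leaves out the joining into superclusters as well as the sub-module and single-node movements, and it recomputes the description length from scratch for every trial move, which a practical implementation would avoid.

```python
import math
from collections import defaultdict

# Hypothetical toy purchase network (customer, product), for illustration only.
edges = [
    ("c1", "p1"), ("c1", "p2"), ("c2", "p1"), ("c2", "p2"),
    ("c3", "p1"), ("c3", "p2"), ("c3", "p3"),
    ("c4", "p3"), ("c4", "p4"), ("c5", "p3"), ("c5", "p4"),
    ("c6", "p3"), ("c6", "p4"),
]


def plogp(p):
    return p * math.log2(p) if p > 0 else 0.0


def codelength(edges, module_of):
    """Two-level description length in bits per step (undirected, unweighted)."""
    total = 2 * len(edges)
    degree = defaultdict(int)
    exit_rate = defaultdict(float)  # rate at which the walker leaves each module
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if module_of[u] != module_of[v]:
            exit_rate[module_of[u]] += 1 / total
            exit_rate[module_of[v]] += 1 / total
    inside = defaultdict(list)  # module -> visit rates of its nodes
    for node, d in degree.items():
        inside[module_of[node]].append(d / total)
    # Cost of naming the module the walker enters.
    q = sum(exit_rate.values())
    length = -q * sum(plogp(e / q) for e in exit_rate.values()) if q > 0 else 0.0
    # Cost of naming nodes (and the exit) within each module.
    for m, rates in inside.items():
        p_m = exit_rate[m] + sum(rates)
        length -= p_m * (plogp(exit_rate[m] / p_m) + sum(plogp(r / p_m) for r in rates))
    return length


def greedy_local_moves(edges):
    """Start with every node alone; move nodes to neighboring modules while that shortens the description."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    module_of = {node: node for node in neighbors}
    best = codelength(edges, module_of)
    improved = True
    while improved:
        improved = False
        for node in sorted(neighbors):
            original = module_of[node]
            best_module = original
            for candidate in sorted({module_of[n] for n in neighbors[node]}):
                if candidate == original:
                    continue
                module_of[node] = candidate  # trial move
                length = codelength(edges, module_of)
                if length < best - 1e-12:
                    best, best_module = length, candidate
            module_of[node] = best_module  # keep the best move, or revert
            if best_module != original:
                improved = True
    return module_of, best


modules, best_length = greedy_local_moves(edges)
for label in sorted(set(modules.values())):
    print(label, sorted(n for n, m in modules.items() if m == label))
print(f"description length: {best_length:.3f} bits per step")
```

The result of such a greedy pass depends on the order in which nodes are visited, which is one reason an approximate solution like this benefits from the later fine-tuning steps described above.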

Investing in a powerful framework that enables straightforward mathematical generalizations, and in a continuously refined clustering algorithm, has paid off. Our approach has been widely heralded as one of the best among the many dozens of network-clustering algorithms, and it has been used in thousands of scientific studies.

Academic glory is merely a means to an end. At Infobaleen, we want to leverage the power of Infomap and other clustering algorithms to help companies turn their data into understanding and applications that generate value.
