Discovering stories in complex data requires innovative visualizations


Published March 25, 2019 by Martin Rosvall

Rich relational data about who did what, and when, are a goldmine for understanding a system with many connected parts. However, it takes multiple steps of processing, refining, and massaging the data to reach that understanding. Even the best clustering algorithms, which can identify significant structure, do not take you all the way. I learned this as a postdoctoral researcher about ten years ago, when I was analyzing the output of a network clustering algorithm that I was developing.

The input was a network with more than six million citations between about six thousand scientific journals, and the output was a list of 90 clusters, each representing a scientific area. Did the results make sense? I searched up and down the list for journals that I assumed should be clustered together. It took hours, and I did not learn anything new, since I could only confirm what I already knew. Moreover, the cluster list lacked essential information about how the clusters were related to each other. I needed a visualization that highlighted scientific areas and their relationships, the way road maps depict cities connected by highways.

Because there was no tool available for creating such a map, I started with whiteboard sketches and simple scripts for testing different ideas. Step by step, and with feedback from collaborators, I made my first map of science (see figure above). It depicts scientific areas with circles and their relationships with bidirectional arrows. Their sizes indicate importance. However, when we showed the maps to colleagues, they complained: "Something is wrong with the algorithm, chemistry is too small." Yes, indeed, they were chemists. In any case, I went back and checked the clustering algorithm and the map script. No change. Then I went one step further and replaced the input with citation data from ten years earlier. Bingo! We showed the alternative map to the chemists, and they were happy: "Now it looks good! What was wrong in the algorithm?" The algorithm was not wrong; their perception of science was. It was outdated.
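To make the step from a cluster list to a map concrete, here is a minimal Python sketch of how a journal-level network could be collapsed into circles and arrows. It is not the original map script: the function name aggregate_to_map, the toy journals, and the choice to measure importance as citations received are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_to_map(citations, cluster_of):
    """Collapse a journal-level citation network into a cluster-level map.

    citations:  dict mapping (citing_journal, cited_journal) -> citation count
    cluster_of: dict mapping journal -> cluster label
    Returns (size, links): the citations received by each cluster and the
    directed citation counts between distinct clusters.
    """
    size = defaultdict(int)   # circle size (assumption: citations received)
    links = defaultdict(int)  # arrow weight between two different clusters
    for (citing, cited), count in citations.items():
        source, target = cluster_of[citing], cluster_of[cited]
        size[target] += count
        if source != target:
            links[(source, target)] += count
    return dict(size), dict(links)


# Toy data, invented for this example only.
citations = {
    ("Chem A", "Chem B"): 5,
    ("Chem B", "Phys A"): 2,
    ("Phys A", "Chem A"): 1,
    ("Phys A", "Phys B"): 3,
}
cluster_of = {
    "Chem A": "chemistry",
    "Chem B": "chemistry",
    "Phys A": "physics",
    "Phys B": "physics",
}

size, links = aggregate_to_map(citations, cluster_of)
for cluster, total in sorted(size.items()):
    print(cluster, total)
for (source, target), count in sorted(links.items()):
    print(source, "->", target, count)
```

Rerunning the same aggregation on citation data from a different period is what makes a shrinking or growing area visible in the map.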

This experience taught me two things. First, complex data require powerful visualizations to comprehend and communicate the results. Second, rich data are dynamic and change over time. We needed an efficient visualization to capture that change. With none available, we set out to create one. We call the result alluvial diagrams because they look like alluvial fans, the deposits built up by streams. With the alluvial diagrams, we have discovered, for example, how neuroscience emerged as a standalone scientific area mainly from cell biology, neurology, and psychology (see figure below) and how lending patterns changed dramatically after the Federal Reserve began paying interest on reserve balances during the financial crisis in 2008.

Alluvial diagram
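The data behind such a diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the tools on mapequation.org: the function name alluvial_flows and the four journals are made up, and each stream is simply a count of journals that move from a cluster in the earlier period to a cluster in the later one.

```python
from collections import Counter


def alluvial_flows(before, after):
    """Count items moving from each earlier cluster to each later cluster.

    before, after: dicts mapping item -> cluster label in two time periods.
    Items missing from the later period are skipped.
    """
    flows = Counter()
    for item, old_cluster in before.items():
        new_cluster = after.get(item)
        if new_cluster is not None:
            flows[(old_cluster, new_cluster)] += 1
    return flows


# Toy data, invented for this example only (not real clustering results).
before = {
    "Journal 1": "cell biology",
    "Journal 2": "neurology",
    "Journal 3": "psychology",
    "Journal 4": "psychology",
}
after = {
    "Journal 1": "neuroscience",
    "Journal 2": "neuroscience",
    "Journal 3": "neuroscience",
    "Journal 4": "psychology",
}

flows = alluvial_flows(before, after)
for (old_cluster, new_cluster), count in sorted(flows.items()):
    print(f"{old_cluster} -> {new_cluster}: {count}")
```

Each pair with a nonzero count corresponds to one stream in the diagram, and the count sets its width. A real diagram would more likely weight the streams by importance rather than by a plain count of journals.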

Because we, as well as other researchers, needed these visualizations over and over again to discover stories in complex data, we built interactive tools to transform days of scripting into minutes of customization. They are available on mapequation.org for anyone to use.

At Infobaleen, our customers have the same desire as researchers to discover stories in their data. However, because they all work with some type of transaction data, we can further streamline the visualization tools and eliminate the need for customization. As an example with open data, this interactive map of movies listed on Wikipedia makes it easy to explore and discover movies related to your favorite ones by zooming and dragging. In this case, the transactions come from editors editing movie pages, but with customer transactions, you can find groups of customers with similar and nonoverlapping interests and use them in targeted campaigns.

Whether you are a researcher or a sales manager, we want to empower you with the best algorithms and visualizations for discovering stories in your data.
