CHum Dendrogram Viewer

The CHum Dendro Visualization grew out of earlier attempts to find clusters of CHum articles algorithmically. It became a visualization thanks to design work by John Montague. The experiment's data set consists of five hundred articles chosen from the CHum journal for their relevance to the conversation surrounding tool design and usage. This is the same data set used in the CHum word visualization.

This visualization required more extensive preprocessing than the other visualization. The data was first processed by Mallet to create a topic model of each paper. These models were then used to calculate a 'distance' between each pair of articles. R then used the resulting distance matrix to create a dendrogram of the paper clusters.
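The step from topic models to a distance matrix can be sketched as follows. This is only an illustration, not the project's actual code: the paper names and topic proportions are invented, and Euclidean distance is an assumption, since the description above does not say which distance measure was used.

```python
import math

# Hypothetical topic proportions for four papers over three topics.
# In the real pipeline these would come from the topic model's output.
doc_topics = {
    "paper_a": [0.80, 0.10, 0.10],
    "paper_b": [0.70, 0.20, 0.10],
    "paper_c": [0.10, 0.10, 0.80],
    "paper_d": [0.20, 0.10, 0.70],
}


def euclidean(p, q):
    """Euclidean distance between two topic vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(vectors):
    """Return (names, matrix); matrix[i][j] is the distance between papers i and j."""
    names = sorted(vectors)
    matrix = [[euclidean(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


names, matrix = distance_matrix(doc_topics)
```

Papers with similar topic mixes (here paper_a and paper_b) end up with a small distance, while papers dominated by different topics (paper_a and paper_c) end up far apart. A matrix of this shape is what a hierarchical clustering routine takes as input.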

On the bottom row of the visualization there are exactly 500 one-pixel-wide columns, each representing a paper; the order of these papers has no significance. The Y axis represents the relative distance between papers. Articles that are close together merge near the bottom of the dendrogram, while articles that are far apart merge nearer the top. The most distant articles come together only at the root node at the top of the visualization.
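How merge height relates to distance can be seen in a small agglomerative clustering sketch. The distances below are invented, and complete linkage is an assumption; the project's actual linkage method is not stated here.

```python
def cluster(dist, linkage=max):
    """Naive agglomerative clustering over a symmetric distance matrix.

    linkage=max gives complete linkage, linkage=min gives single linkage.
    Returns a list of (members_a, members_b, height) merges in merge order.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                height = linkage(
                    dist[a][b] for a in clusters[i] for b in clusters[j]
                )
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical distances: papers 0 and 1 are similar, as are papers 2 and 3.
dist = [
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 2.0],
    [7.0, 6.0, 2.0, 0.0],
]
merges = cluster(dist)
```

Each merge height is the Y position of a join in the dendrogram. In this toy example the two similar pairs join low (heights 1.0 and 2.0), and the two groups meet only at the final merge, which plays the role of the root node.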

In addition to clustering, the visualization can represent categorical data. The histogram at the bottom shows the distribution of papers over time, while the various colors represent categorical data assigned to each article.

Project Lead: Geoffrey Rockwell

Design: John Montague

Programming: Ryan Chartier