Zoomology

BACKGROUND INFORMATION

One rapidly developing area of biology is the classification of species. Traditionally, species were grouped by anatomical similarity. Today, access to molecular-level data has given rise to new models based on proteins and DNA. New computer tools will be needed to help scientists create and contrast these new classification structures.

Although optimized for comparison of classification datasets, the solution presented here might apply to other large hierarchies with a moderate number of levels and relatively small areas of change.

Dataset Selection

Zoomology was created in response to the InfoVis 2003 Contest for Visualization and Pairwise Comparison of Trees. The contest posed three problems: comparing small phylogenetic trees by structure alone, comparing file system logs of about 70,000 nodes containing many variables, and comparing classification trees of approximately 200,000 nodes with three variables.

The classification trees problem is the most challenging of the three to visualize because it requires representing both structure and attributes. We also wanted to gain firsthand experience in creating interactive visualizations of large structures, a skill that is likely to become increasingly important as the volume of available information grows.

The classification datasets compared here represent about 15% of the more than one million known species of living organisms. These are traditional anatomical phylogenies rather than molecular phylogenies [6]. Classification trees follow a hierarchy of ranks. In order of increasing specificity, the seven major ranks are Kingdom, Phylum, Class, Order, Family, Genus, and Species. Each major rank may also have sub-, infra-, and super- levels (for example, suborder or superfamily); in all, we found twenty distinct ranks in the datasets. Walking the path of nodes from the root (Kingdom Animalia) to a leaf yields the complete formal classification of a particular species.

Each node contains up to three variables, of which the first two are always present. The first is its rank. The second is its Latin name, for example, Ctenophora, which uniquely identifies the node within a tree. The same Latin name refers to a corresponding pair of nodes, one in each of the two datasets, though their children and the surrounding tree topology may differ. The third, when included, is the common name, such as jellyfish or treefrog. One common name may encompass numerous species. Conversely, a node might be known by different common names in different representations [2].
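The node description above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the data structure Zoomology itself uses: the class name TaxonNode and its methods are hypothetical, and the only values taken from the text are the rank and Latin name examples.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class TaxonNode:
    """One node of a classification tree (hypothetical structure)."""
    rank: str                           # always present, e.g. "Phylum"
    latin_name: str                     # always present, e.g. "Ctenophora"
    common_name: Optional[str] = None   # present only on some nodes
    parent: Optional["TaxonNode"] = field(default=None, repr=False)
    children: List["TaxonNode"] = field(default_factory=list, repr=False)

    def add_child(self, child: "TaxonNode") -> "TaxonNode":
        """Attach a child node and return it for chaining."""
        child.parent = self
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Return 'Rank Name' entries from the root down to this node."""
        path = []
        node = self
        while node is not None:
            path.append(f"{node.rank} {node.latin_name}")
            node = node.parent
        return list(reversed(path))


# Tiny example using only names that appear in the text.
root = TaxonNode("Kingdom", "Animalia")
phylum = root.add_child(TaxonNode("Phylum", "Ctenophora"))
print(phylum.lineage())
```

Calling lineage() on a leaf walks the parent links back to the root, which corresponds to reading off the formal classification of that species.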

Domain Background

One of the central tasks in comparing these trees involves uncovering and analyzing the differences in their hierarchies. Differences occur due to variation in the way animals are ranked, and can be subtle. For example, an infraorder in one tree might be ranked as an order in another. The addition of a branch node changes the hierarchy for all of its descendants. As Cyndy Sims Parr, one of the contributors to the two trees involved in the contest, explains on her web page:

“Classification is a human undertaking. Most systematists agree that classification (how organisms are named and grouped into things like Families and Orders and Classes) ought to reflect what we know about how organisms are related to each other. Yet, what we know is constantly changing, hopefully for the better. And like all human undertakings, there are controversies over what exactly we do know. There is no single ‘correct’ classification, just classifications that are currently accepted by most systematists.” [9]
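The kind of difference described above, where a group is an infraorder in one tree and an order in another, can be detected by matching nodes on their Latin names. The following is a minimal sketch under stated assumptions rather than the comparison method used in Zoomology: the flat dictionary representation, the function name diff_trees, and the taxon names TaxonA, TaxonB, and TaxonC are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed representation: latin_name -> (rank, parent latin_name or None).
Tree = Dict[str, Tuple[str, Optional[str]]]


def diff_trees(a: Tree, b: Tree) -> List[str]:
    """List differences between two trees, matching nodes by Latin name."""
    changes = []
    for name in sorted(a.keys() | b.keys()):
        if name not in b:
            changes.append(f"{name}: only in first tree")
        elif name not in a:
            changes.append(f"{name}: only in second tree")
        else:
            (rank_a, parent_a), (rank_b, parent_b) = a[name], b[name]
            if rank_a != rank_b:
                changes.append(f"{name}: rank {rank_a} -> {rank_b}")
            if parent_a != parent_b:
                changes.append(f"{name}: parent {parent_a} -> {parent_b}")
    return changes


# Hypothetical data: TaxonB is an infraorder under TaxonA in the first
# tree, but a full order directly under the root in the second.
tree_a: Tree = {
    "Animalia": ("Kingdom", None),
    "TaxonA": ("Order", "Animalia"),
    "TaxonB": ("Infraorder", "TaxonA"),
    "TaxonC": ("Family", "TaxonB"),
}
tree_b: Tree = {
    "Animalia": ("Kingdom", None),
    "TaxonA": ("Order", "Animalia"),
    "TaxonB": ("Order", "Animalia"),
    "TaxonC": ("Family", "TaxonB"),
}

for line in diff_trees(tree_a, tree_b):
    print(line)
```

Only TaxonB is reported, yet the full classification of its descendant TaxonC differs between the two trees as well. A purely local comparison therefore understates how far a single re-ranking propagates.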

Previous Work

Our framework is similar to the "Pad" metaphor [10] and its extension, Pad++ [1]. In these systems, the information space is treated as an infinite 2D plane that can be magnified by orders of magnitude at any point to reveal detail. Pad++ has mostly been explored as a highly interactive, zoomable alternative to traditional window-and-icon interfaces and in applications such as navigable web interfaces.
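The idea of magnifying a plane around an arbitrary point can be sketched as a simple view transform. This is an illustration of the general technique only, not the Pad++ API or the Zoomology implementation; the class View and its method names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class View:
    """Maps points on an unbounded 2D plane to screen coordinates."""
    scale: float = 1.0      # screen pixels per world unit
    offset_x: float = 0.0   # screen x position of the world origin
    offset_y: float = 0.0   # screen y position of the world origin

    def to_screen(self, x: float, y: float) -> Tuple[float, float]:
        """Convert a world point to screen coordinates."""
        return (x * self.scale + self.offset_x,
                y * self.scale + self.offset_y)

    def zoom_at(self, sx: float, sy: float, factor: float) -> None:
        """Scale by `factor` while keeping screen point (sx, sy) fixed."""
        self.offset_x = sx - (sx - self.offset_x) * factor
        self.offset_y = sy - (sy - self.offset_y) * factor
        self.scale *= factor


view = View()
view.zoom_at(100.0, 50.0, 1000.0)    # magnify by three orders of magnitude
print(view.to_screen(100.0, 50.0))   # the focus point stays where it was
print(view.to_screen(101.0, 50.0))   # a neighbor one unit away moves far off
```

Because the offset is adjusted together with the scale, the point under the cursor stays put while everything else moves away from it, which is what lets a user dive into one region of a very large tree.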

Our visualization exploits zooming techniques employed in GVis, a tool for visualizing genome data [5].
