Data Wrangler is a free web-based tool for interactive data cleaning and transformation. It takes raw data and transforms it into data tables for further analysis and cleaning. The results can be exported in a variety of data table formats, including ...
Stanford Vis Group: Data Wrangler
Stanford Vis Group: Protovis
Protovis is a free, open source tool for composing custom views of data from simple elements such as bars or dots. Marks are defined through dynamic properties to allow inheritance, scales and layouts for simplified visualization construction. This ...
TagCrowd
TagCrowd is a tool for generating a frequency-based word cloud from a source text, with a free browser version available through the TagCrowd website. A commercial version may also be purchased, subject to a Creative Commons license.
TextArc
TextArc is a free visualization tool that represents an entire text on a single page. It has elements of an index, concordance and summary all in one place, encouraging the viewer to use its juxtapositions to uncover meaning. The web-based applet is ...
University of Maryland HCI Group: FeatureLens
FeatureLens is a free tool for visualizing and exploring patterns in text collections. It integrates the results of text-mining algorithms and can assist in finding frequent words or n-grams, enabling the discovery of fuzzy repetition patterns. ...
Versioning Machine
The Versioning Machine, now in version 4.0, is a framework and an interface for displaying multiple versions of a text encoded according to the TEI Guidelines. It incorporates features found in both critical editions and electronic publication, ...
WordSmith
WordSmith Tools is a commercial integrated suite of programs designed to analyze word behaviour in a text. It can be used to generate a list of all words or word clusters, build concordances, find keywords and more. This tool is recommended for publishers, language ...
Apache OpenNLP
Apache OpenNLP is a free, open source toolkit for processing natural language text, based on machine learning. It includes common natural language processing functions such as tokenization, sentence segmentation, part-of-speech tagging, named entity ...
Apache UIMA
Apache UIMA (Unstructured Information Management Architecture) is a software system for analyzing large amounts of unstructured data, such as plain text documents, and identifying entities, such as persons, places, organizations, or relations between ...
Cytoscape
Cytoscape is an open source software platform for visualizing data networks and pathways. Though designed for bioinformatic systems, it has been generalized to complex network analysis and has applications extending to the semantic web. Its core distribution ...
DocuBurst
DocuBurst is a free, open source visualization tool for displaying the contents of a document, using the human-created structure of lexical databases. It produces a radial layout which may be zoomed, filtered, or have a details-on-demand technique applied, ...
EURAC: Comparison Arcs
Comparison Arcs is a free proof-of-concept tool demonstrating a method for comparing the linguistic properties of multiple documents in graphical displays. The texts may be searched in parallel for words, lemmas and parts of speech. This project is ...
EURAC: Corpus Clouds
Corpus Clouds is a free Java demo of a novel interface to a corpus query engine. It aims to aid the user in exploratory search by providing visual information about the frequency and distribution of search results in combination with the standard KWIC (keyword in context) elements. ...
EURAC: Double Tree
Double Tree is a free, open source Java application providing a visualization component for supporting exploratory corpus analysis. It focuses particularly on analyzing concordances, and can also represent a KWIC for a single word by collapsing the ...
EURAC: End to End
End to End is a visualization application for exploratory corpus analysis focused on collocations. This tool starts with two words and constructs a visual network of all collocations of those words within the corpus, while its interface enables interactive ...