Last Updated: Oct 30, 2013

Corpus Summary is a tool that provides a simple, textual overview of the current corpus. It reports the number of words, the number of unique words, the longest documents, the documents with the highest vocabulary density, the most frequent words, notable peaks in frequency, and distinctive words.

Created: Jun 02, 2011
Background processing: Not applicable
Ease of use: Very easy
Tool family: Voyant
Type of analysis: Statistical
Type of license: Free
Usage: New
Web usable: Run in browser
February 08, 2012 07:26 AM

The Voyant Corpus Summary Tool is a good starting point for exploring a text corpus. The simple web-based interface quickly processes even large corpora and returns the total number of documents, words, and unique words across the corpus, indicating which texts are longest and which are most lexically dense. You’ll probably want to apply a stop word list to remove noisy function words (e.g. and, the, is) and let the more lexically meaningful terms shine through. TAPoR offers English and French stop word lists, as well as lists for several other languages.
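For readers curious about what these figures mean, here is a minimal Python sketch of the same kind of summary. It is not Voyant's implementation. It assumes that vocabulary density is the number of unique words divided by the total number of words, that a "word" is a lowercase run of ASCII letters (with an optional apostrophe part), and that stop words are removed only from the frequency list. The names tokenize, summarize, and STOP_WORDS are hypothetical, chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption); real lists are much longer.
STOP_WORDS = frozenset({"and", "the", "is", "in", "a", "of", "to"})


def tokenize(text):
    """Lowercase the text and split it into simple ASCII word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def summarize(documents, stop_words=frozenset(), top_n=3):
    """Summarize a corpus given as a dict of {title: text}."""
    per_doc = {}
    all_types = set()
    frequencies = Counter()
    total_words = 0

    for title, text in documents.items():
        tokens = tokenize(text)
        types = set(tokens)
        total_words += len(tokens)
        all_types |= types
        per_doc[title] = {
            "words": len(tokens),
            "unique_words": len(types),
            # Assumed definition: unique words / total words.
            "density": len(types) / len(tokens) if tokens else 0.0,
        }
        # Stop words are dropped only from the frequency list.
        frequencies.update(t for t in tokens if t not in stop_words)

    return {
        "documents": len(documents),
        "words": total_words,
        "unique_words": len(all_types),
        # Titles ordered from longest to shortest.
        "longest": sorted(per_doc, key=lambda t: per_doc[t]["words"], reverse=True),
        # Titles ordered from most to least lexically dense.
        "densest": sorted(per_doc, key=lambda t: per_doc[t]["density"], reverse=True),
        "most_frequent": frequencies.most_common(top_n),
        "per_document": per_doc,
    }


corpus = {
    "short": "The cat and the hat",
    "long": "The cat is the cat and the cat is in the hat",
}
summary = summarize(corpus, stop_words=STOP_WORDS)
print(summary["most_frequent"])
```

Without the stop word list, "the" would top the frequency list in this toy corpus, which is the noise the stop words are meant to remove.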

Unlike the Corpus Grid tool, the Summary tool also lists the most frequently used words, both corpus-wide and within each text. You can click on any of these words to see that term within the full set of Voyant tools. A word of warning: if you’ve applied stop words in the Corpus Grid tool, you’ll have to reapply them in the full Voyant interface, as this setting does not carry over. As with all of the Voyant tools, web pages or locally stored texts can be loaded via the tool’s home page. You don’t need to be a text analysis expert to operate this user-friendly tool.

