Discover Research Tools for Textual Study
Corpus Summary provides a simple, textual overview of the current corpus. It reports the number of words, the number of unique words, the longest documents, the documents with the highest vocabulary density, the most frequent words, notable peaks in frequency, and distinctive words.
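To make these measures concrete, here is a minimal Python sketch of how such summary figures could be computed by hand. It is not Voyant's implementation: the `tokenize` and `summarize` names, the regex-based tokenizer, and the definition of vocabulary density as unique words divided by total words are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and extract word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(corpus):
    """Summarize a corpus given as a dict mapping document title to text."""
    tokens_by_doc = {title: tokenize(text) for title, text in corpus.items()}
    all_tokens = [tok for toks in tokens_by_doc.values() for tok in toks]

    # Vocabulary density is assumed here to be unique words / total words.
    density = {
        title: (len(set(toks)) / len(toks) if toks else 0.0)
        for title, toks in tokens_by_doc.items()
    }

    return {
        "documents": len(corpus),
        "total_words": len(all_tokens),
        "unique_words": len(set(all_tokens)),
        # Titles ordered from longest to shortest document.
        "longest": sorted(
            tokens_by_doc, key=lambda t: len(tokens_by_doc[t]), reverse=True
        ),
        # Titles ordered from most to least lexically dense.
        "densest": sorted(density, key=density.get, reverse=True),
        "density": density,
    }


corpus = {
    "a": "the cat sat on the mat",
    "b": "the dog barked",
}
summary = summarize(corpus)
print(summary["total_words"], summary["unique_words"])
```

Short documents tend to score higher on a density measure like this one simply because they have less room to repeat words, which is worth keeping in mind when comparing texts of very different lengths.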
The Voyant Corpus Summary tool is a good starting point for exploring a text corpus. The simple web-based interface quickly processes even large corpora, returning the total number of documents, words, and unique words across the corpus and indicating which texts are longest and most lexically dense. You’ll probably want to apply a stop word list to remove noisy function words (e.g. and, the, is) and let the more lexically meaningful terms shine through. TAPoR offers English and French stop word lists, as well as lists for several other languages.
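The effect of a stop word list on a frequency ranking can be illustrated with a short Python sketch. The three-word `STOP_WORDS` set, the `top_terms` helper, and the sample sentence are hypothetical and stand in for the much longer lists a real tool would apply.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; real lists are far longer.
STOP_WORDS = {"and", "the", "is"}


def top_terms(text, stop_words=frozenset(), n=2):
    """Return the n most frequent tokens, skipping any listed stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tok for tok in tokens if tok not in stop_words)
    return counts.most_common(n)


text = "The whale is the whale and the sea is the sea and the whale"

unfiltered = top_terms(text)
filtered = top_terms(text, STOP_WORDS)
print(unfiltered)  # function words dominate the ranking
print(filtered)    # content words surface once stop words are removed
```

Without the list, "the" heads the ranking; with it, the content words "whale" and "sea" come to the top.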
Unlike the Corpus Grid tool, the Summary tool also provides information about the most frequently used words, both corpus-wide and within each specific text. You can click on any of these words to see that term within the full set of Voyant tools. A word of warning: if you’ve applied stop words in the Corpus Grid tool, you’ll have to reapply them in the full Voyant interface, as this setting does not carry over. As with all of the Voyant tools, web pages or locally stored texts can be loaded via the tool’s home page. You don’t need to be a text analysis expert to operate this user-friendly tool.
TAPoR v.2.5 | Copyright © 2014 TAPoR Team, University of Alberta.