TAPoR 2.0

Discover Research Tools for Textual Study

  • Browse Tools by Type or Tag
  • Search and Use Tools
  • Read and Create Tool Reviews
  • Contribute and Advertise Tools

TAPoR 2.5 is scheduled for decommissioning.
Please visit TAPoR 3

Popular Tools
User Recommended Tools
Random Tools

etcML

etcML (Easy Text Classification with Machine Learning) is a free text analysis tool from Stanford University that uses machine learning to identify positive and negative sentiments in texts. Users can analyze their own dataset, use a dataset provided ...

Voyant ScatterPlot

ScatterPlot creates a scatter plot graph of terms, spaced by their variation from one another. Once you arrive at ScatterPlot, insert or upload your content and let the tool perform its analysis. You may hover over the dots and click on them for ...

SUSS (Sunderland University SENSEVAL System)

SUSS (Sunderland University SENSEVAL System) was an algorithm for word sense disambiguation developed for the inaugural SENSEVAL event in 1998.

Voyant Corpus Term Frequencies

Corpus Term Frequencies shows overall word frequencies for the entire corpus as well as information about how word frequencies are spread out over documents within the corpus. Hover over column headers and buttons for more information.

Voyant Mandala

Mandala is a visualization tool that imports “textual” files to analyze the frequency and linkage of words. For example, you may import a play and find the linkage between a word and its speaker, along with the word's frequency.

Domeo Annotation Toolkit

The Domeo Annotation Toolkit is an extensible web application for creating and sharing ontology-based stand-off annotations on HTML or XML documents. Users can add annotations manually, or via the tool's full or partial automation options. It also includes ...

TUSTEP

TUSTEP (Tübingen System of Text Processing Tools) is a free, open source, widely used toolbox for text processing. It is aimed at scholarly audiences, can work with texts in both Latin and non-Latin scripts, and is primarily designed for humanities applications. ...

SimpleTCT

SimpleTCT (Simple Text Comparison Tool) is a free Java-based text comparison tool offered by Open Digital Arts & Humanities Tools (OpenDAHT). It offers a simplified management environment that enables users to display .rtf files, in which they may ...

Stanford NLP Group: Stanford Word Segmenter

Stanford Word Segmenter is a free, open source Java-based tokenization tool for Chinese and Arabic text that integrates token pre-processing, or segmentation. For Arabic, the tool processes text according to the Penn Arabic Treebank 3 standard. For ...

Visual Browser

Visual Browser is a Java application for visualizing RDF data using the Jena framework. The resultant graphs are animated and permit users to expand and hide nodes and switch the view of edges, allowing them to focus on a small part of the network. The ...

Voyant Bubbles

Bubbles reads the words in a document (or corpus) and displays the highest-frequency words within proportionately large bubbles. Once you arrive at Bubbles, insert or upload your content and let the tool perform its analysis.

TABARI (Text Analysis By Augmented Replacement Instructions)

TABARI (Text Analysis By Augmented Replacement Instructions) is a legacy open-source successor to the KEDS program, written in C++ and maintained online to the present in an OS X edition. It runs in the Terminal command prompt, and is designed for analyzing ...

BookLamp

BookLamp, part of the Book Genome Project, is a tool and a resource for finding books. It offers an alternative to social recommendation engines reliant on author popularity by treating its books as equal regardless of number of copies sold. BookLamp's ...

Voyant Knots

Knots is a visualization tool that helps users understand patterns of word relevance in one or more documents. Each term is represented as a twisted line; where lines overlap, the terms are related or linked.

CONCORD

CONCORD was a concordance program developed in 1968 to identify and sort collocating words or phrases. Notably, it allowed words ending in one of several suffixes (such as -er or -est) to be grouped together when sorting.

Mallet

Mallet is a Java-based library and command-line framework that provides statistical and machine learning tools for use with natural language processing.

PANVS

PANVS was a system for syntactic analysis of Arabic text developed in the 1990s. It was written in TURBO PROLOG for IBM PS/2 computers, and included lexical and morphological analyzers, a parser, and a module for automatic spelling correction.

LIWC (Linguistic Inquiry and Word Count)

LIWC (Linguistic Inquiry and Word Count) is a text analysis program available for purchase. It calculates the degree to which various categories of words are used in a text, and can process texts ranging from e-mails to speeches, poems and transcribed ...

Gephi

Gephi is a free, open source interactive visualization and data exploration tool. Users can manipulate the display to uncover new facets of the data, enabling intuitive exploration.

DISCAN

DISCAN was a tool for content and discourse analysis originally available for mainframe computers, and adapted for PCs with DOS 3.0 in 1991. It accepted ASCII files, could create KWIC and co-occurrence listings, and had modules for both creating content ...
View tools by tag:
1960s 1970s 1980s 1990s 2000s 2010s American Annotation Canadian Comparator English English (language) French (language) German Historic Java Metadata Multilingual Natural language processing Social media