TAPoR 2.0

Discover Research Tools for Textual Study

  • Browse Tools by Type or Tag
  • Search and Use Tools
  • Read and Create Tool Reviews
  • Contribute and Advertise Tools

Stanford Vis Group: Data Wrangler

Data Wrangler is a free web-based tool for interactive data cleaning and transformation. It takes raw data and transforms it into data tables for further analysis and cleaning. The results can be exported in a variety of data table formats, including ...

Stanford Vis Group: Protovis

Protovis is a free, open source tool for composing custom views of data from simple elements such as bars or dots. Marks are defined through dynamic properties to allow inheritance, scales and layouts for simplified visualization construction. This ...

TagCrowd

TagCrowd is a tool for generating a frequency-based word cloud from a source text. A free browser version is available through the TagCrowd website. A commercial version may also be purchased, subject to a Creative Commons license.
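
As a rough illustration of what "frequency-based" means here, the sketch below counts words in a text and scales a font size to each count. It is not TagCrowd's code: the function names, the stop-word list, and the size range are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; not TagCrowd's own.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def word_frequencies(text, max_words=50):
    """Return the most frequent non-stop words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(max_words)


def font_sizes(frequencies, smallest=10, largest=40):
    """Map each word to a font size scaled linearly between the two bounds."""
    if not frequencies:
        return {}
    top = max(count for _, count in frequencies)
    low = min(count for _, count in frequencies)
    span = (top - low) or 1  # avoid division by zero when all counts are equal
    return {
        word: smallest + (count - low) * (largest - smallest) // span
        for word, count in frequencies
    }


if __name__ == "__main__":
    sample = "the cat and the hat and the cat"
    freqs = word_frequencies(sample)
    print(freqs)
    print(font_sizes(freqs))
```

A word cloud renderer would then draw each word at its computed size; the layout step is left out of this sketch.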

TextArc

TextArc is a free visualization tool that represents an entire text on a single page. It has elements of an index, concordance and summary all in one place, encouraging the viewer to use its juxtapositions to uncover meaning. The web-based applet is ...

University of Maryland HCI Group: FeatureLens

FeatureLens is a free tool for visualizing and exploring patterns in text collections. This tool integrates the results of text-mining algorithms, and can assist in finding frequent words or ngrams, enabling the discovery of fuzzy repetition patterns. ...

Versioning Machine

The Versioning Machine, now in version 4.0, is a framework and an interface for displaying multiple versions of a text encoded according to the TEI guidelines. It incorporates features found in both critical editions and electronic publication, ...

WordSmith

WordSmith Tools is a commercial integrated suite of programs designed to analyze word behaviour in a text. It can be used to generate lists of all words or word clusters, build concordances, find keywords and more. This tool is recommended for publishers, language ...

Apache OpenNLP

Apache OpenNLP is a free, open source toolkit for processing natural language text, based on machine learning. It includes common natural language processing functions such as tokenization, sentence segmentation, part-of-speech tagging, named entity ...

Apache UIMA

Apache UIMA (Unstructured Information Management Architecture) is a framework for building software systems that analyze large amounts of unstructured data, such as plain text documents, and identify entities, such as persons, places, organizations, or relations between ...

Cytoscape

Cytoscape is an open source software platform for visualizing data networks and pathways. Though designed for bioinformatic systems, it has been generalized to complex network analysis and has applications extending to the semantic web. Its core distribution ...

DocuBurst

DocuBurst is a free, open source visualization tool for displaying the contents of a document, utilizing human-created structure in lexical databases. It produces a radial layout which may be zoomed, filtered, or have a details-on-demand technique applied, ...

EURAC: Comparison Arcs

Comparison Arcs is a free proof of concept tool demonstrating a method for comparing the linguistic properties of multiple documents in graphical displays. Both texts may be searched in parallel for words, lemmas and parts of speech. This project is ...

EURAC: Corpus Clouds

Corpus Clouds is a free Java demo of a novel interface to a corpus query engine. It aims to aid the user in exploratory search by providing visual information about frequency and distribution of search results in combination with the standard KWIC elements. ...

EURAC: Double Tree

Double Tree is a free, open source Java application providing a visualization component for supporting exploratory corpus analysis. It focuses particularly on analyzing concordances, and can also represent a KWIC for a single word by collapsing the ...

EURAC: End to End

End to End is a visualization application for exploratory corpus analysis focused on collocations. This tool starts with two words and constructs a visual network of all collocations of those words within the corpus, while its interface enables interactive ...