TAPoR 2.0

Discover Research Tools for Textual Study


Extract Text - HTML (TAPoRware)

This tool extracts text found within specific tags in an HTML document. It is part of the TAPoRware toolset; an XML version is also available.
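
The listing does not say how the tool is implemented. As a rough illustration of the idea only, the following Python sketch collects the text inside every occurrence of one tag using the standard library's html.parser module. The class name TagTextExtractor and the function extract_tag_text are hypothetical names chosen for this example and are not part of TAPoRware.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag name.

    Hypothetical helper for illustration; not TAPoRware code.
    """

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.depth = 0      # number of matching tags currently open
        self.buffer = []    # text pieces of the element being read
        self.results = []   # one string per outermost matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # The outermost matching element just closed: store its text.
                self.results.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        # Keep text only while we are inside a matching element.
        if self.depth > 0:
            self.buffer.append(data)


def extract_tag_text(html, tag):
    """Return a list with the text content of each <tag> element in html."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>First <b>bold</b> para.</p><p>Second.</p></body></html>"
    for text in extract_tag_text(page, "p"):
        print(text)
```

Text from nested inline elements (such as the b element above) is kept as part of the enclosing match. The sketch assumes that matching tags are explicitly closed; an unclosed tag would leave its text uncollected.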

Extract Text - XML (TAPoRware)

This tool extracts text found within specific tags in an XML document. It is part of the TAPoRware toolset; an HTML version is also available.

Get TEI Meta Data - Beta (TAPoRware)

This tool extracts metadata from TEI-compatible XML documents and displays it in name/value format. It is only available for XML.
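
To make "name/value format" concrete, here is a minimal Python sketch, assuming the metadata of interest sits in the document's teiHeader element and that each element with its own text becomes one pair. This is an assumption made for illustration, not a description of how the TAPoRware tool works, and the function names local_name and tei_header_pairs are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def tei_header_pairs(xml_text):
    """Return (element name, text) pairs for elements inside teiHeader."""
    root = ET.fromstring(xml_text)
    # Find the header whether or not the document declares the TEI namespace.
    header = next(
        (el for el in root.iter() if local_name(el.tag) == "teiHeader"), None
    )
    if header is None:
        return []
    pairs = []
    for el in header.iter():
        text = (el.text or "").strip()
        if text:  # skip pure container elements
            pairs.append((local_name(el.tag), text))
    return pairs


if __name__ == "__main__":
    sample = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>'
        "<titleStmt><title>Sample</title><author>Anon</author></titleStmt>"
        "</fileDesc></teiHeader><text><body><p>Hi</p></body></text></TEI>"
    )
    for name, value in tei_header_pairs(sample):
        print(f"{name}: {value}")
```

Attributes and text that follows a child element are ignored here to keep the sketch short.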

Extract Text From HTML - Beta (TAPoRware)

This tool extracts text from user-specified HTML tags, elements and attributes. There is no XML counterpart at present.

Web Page Cleaner - Beta (TAPoRware)

This tool removes all HTML formatting from a web page or an uploaded HTML file, leaving the text for further processing. It is particularly useful for preparing text-intensive web pages for analysis as plain text.
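
As an illustration of what removing HTML formatting can involve, the Python sketch below drops all tags, skips the contents of script and style elements, and collapses whitespace. It is a simplified stand-in built on the standard library, not the tool's actual code; the names TextOnlyParser and clean_page are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Keep visible text; ignore tags and script/style contents.

    Hypothetical helper for illustration; not TAPoRware code.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a script or style element
        self.parts = []       # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def clean_page(html):
    """Return the plain text of an HTML string with whitespace collapsed."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    # Join fragments with a space, then squeeze runs of whitespace.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


if __name__ == "__main__":
    page = (
        "<html><head><style>p{color:red}</style></head><body>"
        "<h1>Title</h1><p>Some <b>bold</b> text.</p>"
        "<script>var x=1;</script></body></html>"
    )
    print(clean_page(page))
```

Joining fragments with a space keeps block-level elements from running together, at the cost of inserting a space wherever a tag splits a word.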

MorphAdorner

MorphAdorner is a Java command-line program for the adornment of words in a text. At present, available adornments include standard spellings, parts of speech and lemmata, in addition to tokenization, the recognition of sentence boundaries and extracting ...

Stanford HCI Group: Gigapixel

Gigapixel is a free tool to facilitate experiments in collaborative workspaces, enabling printed visualizations to be augmented with projectors and mobile devices. It has been succeeded by PaperToolKit but continues to be available as a reference ...

FromThePage

FromThePage is free software for manuscript transcription that allows volunteers to transcribe document pages online. Transcriptions can then be marked up and annotated in a wiki-like environment, with the resulting text displayed on the public web. ...

VARD 2

VARD 2 is a free, Creative Commons-licensed tool for preprocessing historical corpora. Built in Java, it enables researchers to match historical spelling variants with their modern equivalents. Though optimized for Early Modern English, other languages can ...

URICA! II

URICA! II (User Response Interactive Collation Assistant) was an interactive collation program for the Macintosh. It semi-automated text collation, and assisted text comparison by 'tagging' variants or automatically reconciling small differences. ...

R

R is an open source programming language designed for statistical analysis and parallel computing. R began its life as a research project at the University of Auckland, but has since expanded to become a collaboratively maintained open source project run by ...