TAPoR 2.0

Discover Research Tools for Textual Study

  • Browse Tools by Type or Tag
  • Search and Use Tools
  • Read and Create Tool Reviews
  • Contribute and Advertise Tools

Extract Text - HTML (TAPoRware)

This tool extracts text found within specific tags in an HTML document. It is part of the TAPoRware toolset; an XML version is also available.
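The kind of extraction described above can be sketched with Python's standard `html.parser` module. This is only an illustration of the general technique, not TAPoRware's own implementation; the names `TagTextExtractor` and `extract_text` are hypothetical, and the sketch assumes the chosen tags are properly closed in the input.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.depth = 0      # how many matching tags are currently open
        self.buffer = []    # text pieces for the element being read
        self.results = []   # one string per outermost matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # The outermost matching element just closed: store its text.
                self.results.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        # Keep text only while inside a matching element.
        if self.depth > 0:
            self.buffer.append(data)


def extract_text(html, tag):
    """Return the text of each element named `tag`, in document order."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results
```

For example, `extract_text(page, "p")` would return one string per paragraph, with any nested inline markup removed.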

Extract Text - XML (TAPoRware)

This tool extracts text found within specific tags in an XML document. It is part of the TAPoRware toolset; an HTML version is also available.

Get TEI Meta Data - Beta (TAPoRware)

This tool extracts metadata from TEI-compatible XML documents and displays it in name/value format. It is only available for XML.

Extract Text From HTML - Beta (TAPoRware)

This tool extracts text from user-specified HTML tags, elements and attributes. There is no XML counterpart at present.

Web Page Cleaner - Beta (TAPoRware)

This tool removes all HTML formatting from a web page or an uploaded HTML file, leaving the text for further processing. It is particularly good for preparing text-intensive web pages for analysis as plain text.
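A minimal sketch of this kind of cleaning, again using Python's standard `html.parser`, is shown here. It is not the TAPoRware implementation: the names `PlainTextCleaner` and `clean_page` are hypothetical, and the choices to drop `script` and `style` content and to treat a short list of tags as word boundaries are assumptions made for the example.

```python
from html.parser import HTMLParser


class PlainTextCleaner(HTMLParser):
    """Strip markup from an HTML page, keeping only the readable text."""

    SKIP = {"script", "style"}  # content that is never readable text
    BLOCK = {"p", "div", "br", "li", "tr", "td", "th", "title",
             "h1", "h2", "h3", "h4", "h5", "h6"}  # assumed word boundaries

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside script/style
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.pieces.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.pieces.append(" ")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.pieces.append(data)


def clean_page(html):
    """Return the page text with tags removed and whitespace collapsed."""
    cleaner = PlainTextCleaner()
    cleaner.feed(html)
    cleaner.close()
    return " ".join("".join(cleaner.pieces).split())
```

The result is a single whitespace-normalized string, which suits word counts or concordances but discards paragraph structure.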

MorphAdorner

MorphAdorner is a Java command-line program for the adornment of words in a text. At present, available adornments include standard spellings, parts of speech and lemmata, in addition to tokenization, the recognition of sentence boundaries and extracting ...

Stanford HCI Group: Gigapixel

Gigapixel is a free tool to facilitate experiments in collaborative workspaces, enabling printed visualizations to be augmented with projectors and mobile devices. This tool has been succeeded by PaperToolKit, and continues to be available as a reference ...

FromThePage

FromThePage is free software for manuscript transcription, allowing volunteers to transcribe document pages online. Transcriptions can then be marked up and annotated in a wiki-like environment, with the resultant text displayed on the public web. ...

VARD 2

VARD 2 is a free, Creative Commons tool for preprocessing historical corpora. Built in Java, it enables researchers to easily match up historical variant spellings with modern conventions. Though optimized for Early Modern English, other languages can ...

URICA! II

URICA! II (User Response Interactive Collation Assistant) was an interactive collation program for the Macintosh. It semi-automated text collation, and assisted text comparison by 'tagging' variants or automatically reconciling small differences. ...

R

R is an open source programming language designed for statistical analysis and parallel computing. R began its life as a research project at the University of Auckland, but has since expanded to become a collaborative open source project run by ...