Last Updated: Nov 27, 2011

This tool lists the words in an HTML document, either uploaded by the user or fetched from a web address. List Words works with relatively small texts, under a megabyte in size. It is part of the TAPoRware collection of tools; XML and plain text versions are available as well.

Created: May 12, 2011
Ease of use: Easy
Tool family: TAPoRware
Type of analysis: Statistical
Type of license: Free
Warning: Still in development
Web usable: Run in browser
February 08, 2012 12:17 AM

List Words (HTML) is a free, web-based tool that runs in a browser window. It is simple to use: it counts and lists all the words in a document, whether hosted at a web address or uploaded from the user's files.

Users may apply the provided Glasgow stop list, upload their own, or work with the full, unfiltered word list. The tool also generates basic statistics, including the total number of words, the number of unique words, and the number of appearances of particular words. Results can be restricted to words matching a specified pattern or to words within specified tags.
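
The counts described above can be approximated offline. What follows is a minimal Python sketch, not the tool's actual implementation: the function name list_words, the tokenizing rule, and the three-word stop list are all assumptions made for illustration (the Glasgow stop list itself is not reproduced here).

```python
import re
from collections import Counter

# Hypothetical stand-in for a stop list; a real list would be much longer.
STOP_WORDS = {"the", "a", "of"}

def list_words(text, stop_words=frozenset(), pattern=None):
    """Return (total_words, unique_words, counts) for the given text.

    Tokenizing rule (an assumption): a word is a run of letters that may
    contain internal apostrophes or hyphens.
    """
    tokens = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text.lower())
    # Drop stop words, if a stop list was supplied.
    tokens = [t for t in tokens if t not in stop_words]
    # Optionally keep only words that match the whole pattern.
    if pattern is not None:
        tokens = [t for t in tokens if re.fullmatch(pattern, t)]
    counts = Counter(tokens)
    return len(tokens), len(counts), counts

sample = "The wrath of Achilles, the son of Peleus, and the wine-dark sea."
total, unique, counts = list_words(sample, stop_words=STOP_WORDS)
print(total, unique)
print(counts.most_common(3))
```

Passing pattern=r"w.*" would keep only the words beginning with "w", which mirrors the pattern option in spirit. Restricting the count to particular HTML tags would need an HTML parser and is left out of this sketch.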

Other features include preset sort options, an inflectional stemmer, and several output formats, including XML tree and tab-delimited text. Most notably, the HTML output option includes a small distribution graph for the most frequent words in the document. The tool can handle novel-sized texts, but applying the inflectional stemmer slows it down in proportion to the size of the text.
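
Tab-delimited output is convenient for further processing. The sketch below shows one way to load such a file in Python and print a rough text version of a frequency distribution. The two-column "word, count" layout, the header row, and the function name read_word_list are assumptions; the columns the tool actually emits may differ.

```python
import csv
import io

def read_word_list(tsv_text):
    """Parse tab-delimited 'word<TAB>count' rows into (word, count) pairs.

    The two-column layout is an assumption; adjust the indexes to match
    the columns in the real output.
    """
    # QUOTE_NONE keeps stray quotation marks in words from confusing the parser.
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                      quoting=csv.QUOTE_NONE)
    pairs = []
    for row in rows:
        # Skip blank lines and rows whose second column is not a number
        # (for example, a header row).
        if len(row) < 2 or not row[1].strip().isdigit():
            continue
        pairs.append((row[0].strip(), int(row[1])))
    # Most frequent first; ties are broken alphabetically.
    return sorted(pairs, key=lambda p: (-p[1], p[0]))

example = "word\tcount\nsea\t4\nwrath\t2\nships\t4\n"
for word, count in read_word_list(example):
    # One '#' per occurrence gives a crude distribution graph.
    print(f"{word:<10}{'#' * count} {count}")
```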

The tool's tab-delimited output has problems with some characters: for example, "cæsar" becomes "c¾sar," and punctuation such as quotation marks is often replaced with its Unicode equivalent. More generally, words joined by hyphens (such as "wine-dark-sea") are counted as one word, and opening quotation marks are attached to the word they precede. Users are advised to watch for these cases and adjust the results accordingly.
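
Adjustments of this kind can be scripted once the word list has been exported. The following Python sketch is one possible approach, not a feature of the tool. The helper names are hypothetical, and the encoding repair rests on an assumption: that the text was written as Mac OS Roman and read as Latin-1, which is one plausible way for "æ" to turn into "¾". It should be applied only to words known to be garbled, since it would damage correctly encoded accented characters.

```python
import re
from collections import Counter

# Opening quotation marks that may be stuck to the front of a word.
LEADING_QUOTES = "\"'\u201c\u2018\u00ab"

def repair_mojibake(word):
    """Undo a Mac OS Roman / Latin-1 mix-up (an assumed cause).

    In Mac OS Roman, byte 0xBE is "æ"; read as Latin-1 it becomes "¾".
    """
    try:
        return word.encode("latin-1").decode("mac_roman")
    except UnicodeError:
        return word  # not representable in Latin-1; leave untouched

def clean_counts(pairs, split_hyphens=False):
    """Merge counts after stripping leading quotes (and optionally hyphens)."""
    merged = Counter()
    for word, count in pairs:
        word = word.lstrip(LEADING_QUOTES)
        # Optionally count each part of a hyphenated compound separately.
        parts = re.split(r"-+", word) if split_hyphens else [word]
        for part in parts:
            if part:
                merged[part] += count
    return merged

raw = [("\u201cwine-dark-sea", 1), ("sea", 3), ("c\u00besar", 2)]
print(clean_counts(raw, split_hyphens=True))
print(repair_mojibake("c\u00besar"))
```

Whether hyphenated compounds should be split is a judgment call that depends on the text, which is why split_hyphens is off by default here.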

Despite these limitations, List Words (HTML) is an effective way to get a quick breakdown of a text. Versions are also available for XML and plain text documents.

