This tool removes all HTML formatting from a web page or an uploaded HTML file, leaving the text for further processing. It is particularly good for preparing text-intensive web pages for analysis as plain text.
Documentation: http://tada.mcmaster.ca/Main/TAPoRwareWebPageCleaner
Author(s): Geoffrey Rockwell et al.
2000s, English (language)
Web Page Cleaner (Beta) is a free, simple-to-use, web-based tool that runs in a browser window. It processes an HTML document from a user-specified web address or from the user's own files and removes all HTML tagging.
This tool is basic, offering only two options: users may either strip tags from their document, or convert it to plain text.
The tool has a few problems. It replaces some punctuation, other non-alphabetic characters, and accented letters with Unicode equivalents. When the strip tag option is selected, the tool also runs previously tagged text together, so words fuse wherever the source had no space before or after a tagged chunk.
Despite these limitations, Web Page Cleaner (Beta) is an effective way to quickly convert an HTML document to untagged text or plain text.
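To make the run-together problem concrete, here is a minimal Python sketch of tag stripping built on the standard library's `html.parser`. It is not the tool's own code, and the names `TagStripper`, `strip_tags`, and the `separator` parameter are hypothetical, introduced only for illustration. With an empty separator it shows the fused words described above; with a space as separator it shows one possible workaround.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects text content and discards tags (illustrative only)."""

    def __init__(self, separator=""):
        super().__init__(convert_charrefs=True)
        self.separator = separator  # written wherever a tag used to be
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.chunks.append(self.separator)

    def handle_endtag(self, tag):
        self.chunks.append(self.separator)

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so added separators do not pile up.
        return " ".join("".join(self.chunks).split())


def strip_tags(html, separator=""):
    """Return the text of `html` with every tag replaced by `separator`."""
    parser = TagStripper(separator)
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<h1>Title</h1><p>First paragraph.</p><p>Second.</p>"
print(strip_tags(sample))       # TitleFirst paragraph.Second.
print(strip_tags(sample, " "))  # Title First paragraph. Second.
```

Replacing every tag with a space is a blunt fix: it also separates text around inline tags, so `<b>bold</b>,` becomes `bold ,`. A more careful version would probably add the separator only for block-level elements, and would skip the contents of `script` and `style` elements, which this sketch passes through as ordinary text.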

February 28, 2012 12:50 AM