This tool splits an XML document into 'tokens' (words, lines, sentences, paragraphs, or characters) at specified points. The user can specify the characters, patterns, or tags on which to separate tokens, and can choose whether each separator is removed from the results, kept before the split, or kept after the split. Versions for HTML and plain text are also available within the TAPoR toolset.
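The separator options described above can be illustrated with a minimal Python sketch. This is not the tool's own code: the function names `document_text`, `split_on_pattern`, and `split_on_tag`, the mode names `"remove"`, `"before"`, and `"after"`, and the sample document are all hypothetical, and the reading of "before" (the separator stays at the end of the preceding token) and "after" (the separator starts the following token) is an assumption.

```python
import re
import xml.etree.ElementTree as ET


def document_text(xml_string):
    """Return the concatenated text content of an XML document."""
    return "".join(ET.fromstring(xml_string).itertext())


def split_on_pattern(text, pattern, separator="remove"):
    """Split text at each regex match (hypothetical helper, not the tool's API).

    separator="remove": the matched separator is dropped.
    separator="before": the separator ends the preceding token.
    separator="after":  the separator starts the following token.
    """
    if separator not in ("remove", "before", "after"):
        raise ValueError("separator must be 'remove', 'before', or 'after'")
    tokens = []
    start = 0
    for match in re.finditer(pattern, text):
        if match.start() == match.end():
            continue  # ignore empty matches so every split consumes text
        if separator == "before":
            tokens.append(text[start:match.end()])
            start = match.end()
        else:
            tokens.append(text[start:match.start()])
            # "remove" skips past the separator; "after" keeps it for the next token
            start = match.end() if separator == "remove" else match.start()
    tokens.append(text[start:])
    return [token for token in tokens if token]  # drop empty tokens


def split_on_tag(xml_string, tag):
    """Return the text of each element with the given tag as one token."""
    root = ET.fromstring(xml_string)
    return ["".join(element.itertext()) for element in root.iter(tag)]


if __name__ == "__main__":
    sample = "<doc><p>One. Two.</p> <p>Three.</p></doc>"  # made-up input
    text = document_text(sample)
    for mode in ("remove", "before", "after"):
        print(mode, split_on_pattern(text, r"\.\s*", separator=mode))
    print("tag", split_on_tag(sample, "p"))
```

Here a pattern match plays the role of the separator, and splitting on a tag treats each matching element as one token; the actual tool may define these options differently.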
| Documentation | Attributes | User Supplied Tags |
|---|---|---|
| Documentation: http://tada.mcmaster.ca/Main/TAPoRwareXMLTokenize; Author(s): Geoffrey Rockwell et al. | | 2000s, English (language) |
