TAPoR 2.0

Discover Research Tools for Textual Study

Last Updated: Dec 10, 2014

R is an open source programming language designed for statistical computing and graphics. R began its life as a research project at the University of Auckland, and has since grown into a collaboratively developed open source project that is part of the GNU Project. R has libraries for data import, regular-expression-based data splitting, and visualization of data. R can be run either as a script or as an interactive working environment.

Author(s): GNU
Created: Feb 11, 2013
Last Updated: Dec 10, 2014
Background processing: Not applicable
Ease of use: Difficult
Historic tool (developed before 2005): Development sustained to present
Type of analysis: Programming language, Statistical, Text cleaning, Visualization
Type of license: Free, Open source
Usage: Widely used
Web usable: Software you download and install
June 07, 2013 08:30 PM

More information on R and an introduction to the statistics it employs is available in:

Baayen, R. H. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press, 2008.

June 07, 2013 08:30 PM

Overview and Setup

R is a free, open source programming language and statistical environment distributed as part of the GNU Project. R has libraries for parallel computing and is especially well suited to working with large data sets. The R website distributes the files needed to install R on all major operating systems. The R executable itself runs in a command-line environment; however, an optional, separate program, RStudio, provides a graphical development environment.


R was not originally designed for textual analysis, and its abilities go beyond the direct needs of the average text analysis researcher. However, R's vector data type, its ability to work on several data sources simultaneously, and its built-in visualization tools make graphing data exceptionally easy. With some coaxing, R provides an ideal environment for large-scale data processing and analysis. R provides all the tools necessary for splitting and combining texts, but prior knowledge of regular expressions is needed to get the most out of them. R's vector data structure scales easily between small and large data sets. R also includes one-command graph generation for all common graph and data types.
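The splitting, counting, and graphing described above can be sketched in a few lines of base R. This is only a minimal illustration: the sample sentence and the variable names are made up for the example, and a real project would read its text from files instead.

```r
# A short, made-up sample text (an assumption for illustration only).
text <- "The quick brown fox jumps over the lazy dog. The dog sleeps."

# Lower-case the text, then split on runs of non-letter characters.
# strsplit() returns a list with one element per input string,
# so [[1]] extracts the word vector for our single string.
words <- strsplit(tolower(text), "[^a-z]+")[[1]]

# Drop any empty strings that a leading separator could produce.
words <- words[words != ""]

# table() counts each distinct word; sort() orders the counts.
freq <- sort(table(words), decreasing = TRUE)
print(freq)

# A single command draws a bar chart of the word counts.
# When run non-interactively, R writes the plot to a file instead of a window.
barplot(freq, las = 2, main = "Word frequencies")
```

The same code works unchanged whether `words` holds a dozen tokens or several million, which is the scaling property of the vector data structure mentioned above.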


R is a programming language, yet it does several things differently than most conventional scripting languages. R does not rely heavily on control structures such as explicit loops, which makes it easier to learn for someone not familiar with conventional programming. Anyone with a solid understanding of data flows and tokenization should be able to pick up the programming style easily. However, R is a tool far bigger than text analysis; its features may cause some confusion for self-learners, especially those not already familiar with statistical concepts. R is an excellent tool for large and long-term research projects as well as for toying around with smaller data sets. An excellent introduction to R targeted at humanist scholars can be found in the book 'Quantitative Corpus Linguistics with R,' by Stefan Th. Gries, while more experienced programmers might want to look at the official R tutorial on the R website.
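To illustrate the point about control structures, the sketch below computes the same result twice: once with an explicit loop, as in many scripting languages, and once in R's vectorised style. The word list and variable names are hypothetical and chosen only for the example.

```r
# Made-up word vector for illustration.
words <- c("statistics", "text", "analysis", "r", "corpus")

# Loop style: visit each element and store its character count.
lengths_loop <- integer(length(words))
for (i in seq_along(words)) {
  lengths_loop[i] <- nchar(words[i])
}

# Vectorised style: nchar() is applied to every element in one call.
lengths_vec <- nchar(words)

# Both approaches give the same integer vector.
print(identical(lengths_loop, lengths_vec))

# Logical indexing replaces an explicit "if" inside a loop:
# keep only the words longer than four characters.
long_words <- words[lengths_vec > 4]
print(long_words)
```

The vectorised form is usually the more idiomatic choice in R, and it tends to be both shorter and faster than the equivalent loop.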


