Discover Research Tools for Textual Study
R is an open source programming language designed for statistical computing and graphics. R began as a research project at the University of Auckland and has since grown into a collaboratively developed open source project that is part of the GNU Project. R has libraries for data import, splitting data with regular expressions, and data visualization. R can be run either as a script or as an interactive working environment.
More information on R and an introduction to the statistics it employs is available in:
Baayen, R. H. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press, 2008.
Overview and Setup
R is a free, open source programming language and statistical environment that is part of the GNU Project. R offers libraries for parallel computing and is well suited to computing on large data sets. The R website maintains and distributes the files needed to install R on all major operating systems. R itself runs in a command line environment; however, a separate and optional program, RStudio, provides a graphical development environment.
R was not originally designed for textual analysis, and its abilities go beyond the direct needs of the average text analysis researcher. However, R’s vector data type, its ability to work on several data sources simultaneously, and its built-in visualization tools make graphing data exceptionally easy. With some coaxing, R provides an ideal environment for large-scale data processing and analysis. R provides all the tools necessary for splitting and combining texts, yet prior knowledge of regular expressions is needed to get the most out of these options. R’s vector data structure scales easily between small and large data sets. R also includes one-command graph generation for all common graph and data types.
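As a rough illustration of the points above, the following sketch splits a text with a regular expression, counts words in a vector, and draws a chart with a single command. The sample sentence and the variable names are invented for this example; only base R functions are used.

```r
# A short sample text (invented for illustration); in practice the text
# would more likely be read from a file, for example with readLines().
text <- "The quick brown fox jumps over the lazy dog. The dog sleeps; the fox runs."

# Lower-case the text, then split it on runs of non-letter characters.
# The regular expression "[^a-z]+" means "one or more characters that are not a-z".
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[words != ""]  # drop any empty strings produced by the split

# table() counts each distinct word; sort() puts the most frequent first.
freq <- sort(table(words), decreasing = TRUE)

# One command draws a bar chart of the counts.
barplot(freq, las = 2, main = "Word frequencies")
```

The same three steps (split, count, plot) apply unchanged to a much longer text, since the words are held in a single vector regardless of its length.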
R is a programming language, yet it does several things differently than most conventional scripting languages. R does not rely heavily on control structures, which makes it easier to learn for someone not familiar with recursive programming. Anyone with a solid understanding of flows and tokenization should be able to pick up the programming style easily. However, R is a tool far bigger than text analysis; its features may cause some confusion for self-learners, especially those not already familiar with statistical concepts. R is an excellent tool for large and long-term research projects as well as for toying around with smaller data sets. An excellent introduction to R targeted at humanist scholars can be found in the book ‘Quantitative Corpus Linguistics with R,’ by Stefan Th. Gries. More experienced programmers might want to look at the official R tutorial on the R website.
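To make the point about control structures concrete, here is a minimal sketch comparing a vectorized call with an explicit loop. The token list and variable names are hypothetical and chosen only for illustration.

```r
# A character vector of tokens (hypothetical sample data).
tokens <- c("analysis", "of", "textual", "data", "in", "R")

# Vectorized call: nchar() is applied to every element at once, no loop needed.
lengths_vec <- nchar(tokens)

# The same result written with an explicit loop, for comparison.
lengths_loop <- integer(length(tokens))
for (i in seq_along(tokens)) {
  lengths_loop[i] <- nchar(tokens[i])
}

# Logical indexing selects elements by a condition, again without a loop.
long_tokens <- tokens[lengths_vec > 4]

# Summary statistics also operate on the whole vector in one call.
mean_length <- mean(lengths_vec)
```

Both approaches give the same lengths, but the vectorized form is the one most R code favors, which is why loops and conditionals appear less often than in other scripting languages.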
TAPoR v.2.5 | Copyright © 2014 TAPoR Team, University of Alberta.