Jeromy Anglim's Blog: Psychology and Statistics


Monday, November 29, 2010

Sweave Tutorial 2: Batch Individual Personality Reports using R, Sweave, and LaTeX

This post documents an example of using Sweave to generate individualised personality reports based on responses to a personality test. Each report provides information on both the responses of the general sample and the responses of the specific respondent. All source code is provided, and selected aspects are discussed, including makefiles, use of \Sexpr, figures, and LaTeX tables using Sweave.

Overview

All source code is available on GitHub:

Three examples of compiled PDF reports can be viewed as follows: ID1, ID2, and ID4.

The resulting report is a simple proof-of-concept example.

Discussion of Source Code

makefile

outputDir = .output
backupDir = .backup

# Recipe lines below must be indented with a tab character, not spaces.
test:
	-mkdir $(outputDir)
	Rscript --verbose run1test.R

test5:
	-mkdir $(outputDir)
	Rscript --verbose run5test.R

runall:
	-mkdir $(outputDir)
	Rscript --verbose runAll.R

clean:
	-rm $(outputDir)/*

backup:
	-mkdir $(backupDir)
	cp $(outputDir)/Report_Template_ID*[0123456789].pdf --target-directory=$(backupDir)
  • outputDir stores the name of the folder used to store derived files (e.g., tex files, images, and compiled document PDFs)
  • backupDir stores the name of the folder where document PDFs are to be stored
  • test: is the default goal. Running make in the project directory will run run1test.R which will build one report.
  • The --verbose option shows the progress of R when run as a script. It's useful for seeing progress and debugging.
  • test5: compiles five reports.
  • runall: compiles all reports. Further information on each of the run scripts (run1test.R, run5test.R, and runAll.R) can be obtained by inspecting these files. In general they source Run.R and specify which ids to run reports on.
  • clean: removes all the files from the output directory (i.e., all the derived files)
  • backup: copies reports to the backup folder; i.e., it separates the finished documents from all the other derived files.
  • To run test5, runall, etc., type make test5 or make runall, etc.
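
As a rough illustration of what such a run script might contain, here is a minimal sketch. It is not the content of run5test.R: exportReport is replaced by a stub so that the example runs on its own (in the project it comes from Run.R), and the ids are made up.

```r
# Hypothetical sketch of a batch script in the spirit of run5test.R.
# exportReport() is a stub here; the real function is defined in Run.R.
exportReport <- function(x) {
    message("Building report for ID ", x)
    paste("Report_Template_ID", x, ".pdf", sep = "")
}

# Made-up ids; a real script would take them from the ipip data.
ids <- c(1, 2, 4, 7, 10)

# Build one report per id and collect the resulting file names.
built <- character(0)
for (i in ids) {
    built <- c(built, exportReport(i))
}
print(built)
```

Keeping the list of ids in a small script of its own means the makefile targets only differ in which script they pass to Rscript.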

main.R

main.R loads external functions and packages, imports data, imports metadata and processes the data.

# Import Data
ipip <- read.delim("data/ipip.tsv")
ipipmeta <- read.delim("meta/ipipmeta.tsv")
ipipscales <- read.delim("meta/ipipscales.tsv")
  • When importing the data, I have adopted the useful convention (which I observed from John Myles White's ProjectTemplate Package) of giving each object the same name as its data file. The file extension also clearly indicates the file format (i.e., tab-separated values).
  • I often have separate data and meta folders. Importing metadata often makes for more manageable code than hard coding the metadata into the R script.
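
The naming convention also lends itself to automation. The following is a minimal sketch, assuming a folder of .tsv files; the folder, the example data, and the names dataDir and imported are made up for illustration and are not part of the project.

```r
# Write two tiny example files to a temporary folder (made-up data).
dataDir <- file.path(tempdir(), "data_example")
dir.create(dataDir, showWarnings = FALSE)
write.table(data.frame(id = 1:3, item1 = c(4, 2, 5)),
        file.path(dataDir, "ipip.tsv"),
        sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(variable = "item1", scale = "E"),
        file.path(dataDir, "ipipmeta.tsv"),
        sep = "\t", row.names = FALSE, quote = FALSE)

# Read every .tsv file into a list element named after the file.
imported <- list()
for (f in list.files(dataDir, pattern = "\\.tsv$", full.names = TRUE)) {
    objectName <- tools::file_path_sans_ext(basename(f))
    imported[[objectName]] <- read.delim(f)
}
print(names(imported))
```

The list keeps the imported objects together; list2env(imported, envir = globalenv()) would turn them into separate objects if that is preferred.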

Test scores are calculated using the function score.items.

ipipstats <- psych::score.items(ipipmeta[,ipipscales$scale], 
        ipip[,ipipmeta[,"variable"]],
        min = 1, max = 5)
  • The psych package has a number of useful functions for psychological research. score.items is particularly good. It enables the creation of means and totals for multiple scales. It handles item reversal. It also returns information related to the reliability of the scales.
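
To make the idea of a keys matrix and item reversal concrete, here is a minimal base R sketch. It is not the psych implementation and returns no reliability information; the function scoreScales and the data are made up for illustration.

```r
# Made-up responses of three people to four items on a 1-5 scale.
responses <- data.frame(
    i1 = c(5, 2, 4),
    i2 = c(1, 4, 2),   # reverse-keyed item
    i3 = c(3, 3, 5),
    i4 = c(2, 5, 1))   # reverse-keyed item

# Keys matrix: one row per item, one column per scale.
# 1 = item scored as is, -1 = item reversed, 0 = item not in the scale.
keys <- cbind(E = c(1, -1, 0, 0),
              A = c(0, 0, 1, -1))
rownames(keys) <- names(responses)

# Hypothetical helper: mean score per person on each scale.
scoreScales <- function(keys, items, min, max) {
    items <- as.matrix(items)
    sapply(colnames(keys), function(s) {
        used <- keys[, s] != 0
        x <- items[, used, drop = FALSE]
        # Reverse negatively keyed items: 1 becomes 5, 2 becomes 4, ...
        reversed <- keys[used, s] < 0
        x[, reversed] <- (max + min) - x[, reversed]
        rowMeans(x)
    })
}

scores <- scoreScales(keys, responses, min = 1, max = 5)
print(scores)
```

The min and max arguments play the same role as in the call above: they define the response range used when reversing items.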

Run.R

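source("main.R", echo = TRUE)
# id is used by the code chunks in Report_Template.Rnw,
# so it is defined outside the function.
id <- NULL
exportReport <- function(x) {
    id <<- x  # update the id defined above
    fileStem <- "Report_Template"
    # Copy the template and the style file into the output folder
    file.copy("Report_Template.Rnw",
            paste(".output/", fileStem, "_ID", id, ".Rnw", sep =""),
            overwrite = TRUE)
    file.copy("Sweave.sty", ".output/Sweave.sty", overwrite = TRUE)
    # Work inside .output so that derived files stay away from source files
    setwd(".output")
    # Rnw -> tex
    Sweave(paste(fileStem, "_ID", id, ".Rnw", sep =""))
    # tex -> pdf
    tools::texi2dvi(paste(fileStem, "_ID", id, ".tex", sep =""), pdf = TRUE)
    setwd("..")
}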
  • The above code provides the function that runs Sweave on each individualised report.
  • The code is a little bit messy, contains a few hacks, and is not especially robust.
  • The exportReport function takes an id value as its argument x. Note the use of the alternative assignment operator <<-, which assigns x to the id object defined outside the function so that the code chunks in the Rnw file can refer to it. (See ?assignOps)
  • The code is designed to keep derived files away from source files by copying files into the .output folder and even changing the working directory to that directory.
  • The code creates an individualised copy of the Rnw file, runs Sweave on it to produce a tex file, and then runs texi2dvi with pdf=TRUE to produce the final PDF.
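
The behaviour of the alternative assignment operator can be seen in isolation with a small sketch. The second half shows one possible way of making the change of working directory more robust using on.exit. The function names setIdLocal, setIdGlobal, and runInDir are made up for illustration and are not part of the project.

```r
id <- NULL

setIdLocal <- function(x) {
    id <- x      # creates a local id that is discarded on return
    invisible(NULL)
}
setIdGlobal <- function(x) {
    id <<- x     # assigns to the id defined outside the function
}

setIdLocal(3)
print(is.null(id))   # TRUE: the outer id is untouched
setIdGlobal(7)
print(id)            # 7

# Hypothetical helper: run f() inside dir, then restore the working
# directory even if f() stops with an error.
runInDir <- function(dir, f) {
    old <- setwd(dir)
    on.exit(setwd(old))
    f()
}

before <- getwd()
runInDir(tempdir(), function() basename(getwd()))
print(identical(getwd(), before))
```

With plain setwd(".output") followed by setwd(".."), an error in between leaves R sitting in the output folder; on.exit avoids that.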

Report_Template.Rnw

  • The Rnw file contains interspersed chunks of LaTeX and R code.
  • Because the Rnw file is called from within R, the R objects do not need to be created, and the data processing code does not need to be run, at the start of the Rnw file. This approach is one way of reducing the time it takes to run a set of Sweave reports all based on a common data source.
  • The \Sexpr{} command is used to incorporate the results of R expressions into in-line text, e.g., (... sample of \Sexpr{nrow(ipip)} students ...). In this example, it prints the number of cases in the ipip data.frame (i.e., the sample size).
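
To get a feel for what this substitution amounts to, here is a toy sketch in R. It is not Sweave's own code and only handles simple expressions without braces; fillSexpr and ipipExample are made-up names.

```r
# Toy substitution of \Sexpr{...} in one line of text (not Sweave's code).
fillSexpr <- function(line, envir = parent.frame()) {
    pattern <- "\\\\Sexpr\\{([^}]*)\\}"
    repeat {
        # m[1] is the whole \Sexpr{...} command, m[2] the R expression
        m <- regmatches(line, regexec(pattern, line))[[1]]
        if (length(m) == 0) break
        value <- as.character(eval(parse(text = m[2]), envir = envir))
        line <- sub(m[1], value, line, fixed = TRUE)
    }
    line
}

ipipExample <- data.frame(id = 1:4)   # made-up stand-in for ipip
filled <- fillSexpr("... sample of \\Sexpr{nrow(ipipExample)} students ...")
print(filled)   # "... sample of 4 students ..."
```

The point is simply that the expression is evaluated, converted to text, and pasted into the sentence, so any object available in the R session can be reported in-line.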

Incorporating a figure using Sweave

\begin{figure}
<<plot_scale_distributions, fig=true>>=
plotScale <- function(ipipscale) {
    ggplot(ipip, aes_string(x=ipipscale["scale"])) + 
        scale_x_continuous(limits=c(1, 5),
            name = ipipscale["name"]) +
        scale_y_continuous(name = "", labels ="",   breaks = 0) +
        geom_density(fill="green", alpha = .5) +
        geom_vline(xintercept = ipip[ipip$id %in% id, ipipscale["scale"]],
                size=1) 
}


scaleplots <-   apply(ipipscales, 1, function(X) plotScale(X))

arrange(scaleplots[[1]], 
        scaleplots[[2]],
        scaleplots[[3]],
        scaleplots[[4]],
        scaleplots[[5]],
        ncol=3)
@
\caption{Figures show distributions of scores of each personality factor
in the norm sample.
Higher scores mean greater levels of the factor. 
The black vertical line indicates your score.}
\end{figure}
  • The first R code chunk produces a figure using ggplot2.
  • The code above takes a while to run (perhaps around 10 seconds on my machine). But the resulting plot is more attractive than what I could easily get with base graphics.
  • <<plot_scale_distributions, fig=true>>= indicates the start of an R code chunk. fig=true lets Sweave know that it has to produce code to include a figure.
  • The R code chunk is substituted with \includegraphics{Report_Template_ID10-plot_scale_distributions} in the tex file and the pdf and eps figures are created. Thus, if you want a float with captions and labels, you have to add them around the R code chunk.
  • The plotScale function is used to generate a ggplot2 figure of the distribution of scores on each personality scale along with a marking of the respondent's score on each scale.
  • The arrange function is used to lay out multiple ggplot2 figures on a single plot. The source code is in lib/vp.layout.R and was taken from a post by Stephen Turner (http://gettinggeneticsdone.blogspot.com/2010/03/arrange-multiple-ggplot2-plots-in-same.html).
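
For readers without ggplot2 installed, the same idea (a density of the norm sample with a vertical line at the respondent's score, several panels on one page) can be sketched with base graphics. This swaps in base graphics for ggplot2 and is not the code used in the report; the data and the names scores, respondent, and plotScaleBase are made up.

```r
set.seed(1)
# Made-up scale scores for a norm sample of 200 people.
scores <- data.frame(id = 1:200,
    extraversion = pmin(5, pmax(1, rnorm(200, 3.2, 0.7))),
    agreeableness = pmin(5, pmax(1, rnorm(200, 3.8, 0.6))))
respondent <- 10
scales <- c("extraversion", "agreeableness")

plotScaleBase <- function(scale) {
    d <- density(scores[[scale]], from = 1, to = 5)
    plot(d, main = "", xlab = scale, ylab = "", yaxt = "n")
    polygon(c(1, d$x, 5), c(0, d$y, 0), col = "lightgreen")
    # Vertical line at the respondent's own score
    abline(v = scores[scores$id == respondent, scale], lwd = 2)
    invisible(d)
}

pdf(NULL)                            # draw without writing a file
par(mfrow = c(1, length(scales)))    # one panel per scale, side by side
densities <- lapply(scales, plotScaleBase)
dev.off()
```

Here par(mfrow = ...) plays the role that the arrange function plays for the ggplot2 figures.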

Preparing a formatted table in R for LaTeX

<<prepare_table>>=
ipiptable <- list()
ipiptable$colnames <- c("item", "scaleF", "text", "meanF",
        "sdF", "is1F", "is2F", "is3F", "is4F", "is5F")
ipiptable$cells <- ipipsummary[,ipiptable$colnames ]
ipiptable$cells$item <- paste(ipiptable$cells$item, ".", sep="")

# assign actual responses to table
ipiptable$cells[,c("is1F", "is2F", "is3F", "is4F", "is5F")] <-
        sapply(1:5, function(X)
        ifelse(as.numeric(ipip[ipip$id %in% id, ipipmeta$variable]) == X, 
                paste("*", ipiptable$cells[[paste("is", X, "F", sep ="")]], sep =""),
                ipiptable$cells[[paste("is", X, "F", sep ="")]]))


ipiptable$cellsF <- as.matrix(ipiptable$cells) 

ipiptable$cellsF <- ipiptable$cellsF[order(ipiptable$cellsF[, "scaleF"]), ]

ipiptable$row1 <- c("", "Scale", "Item Text", 
        "M", "SD", "VI\\%", "MI\\%", "N\\%", "MA\\%", "VA\\%")

ipiptable$table <- rbind(ipiptable$row1, ipiptable$cellsF)
ipiptable$tex <- paste(
        apply(ipiptable$table, 1, function(X) paste(X, collapse = " & ")), 
        "\\\\")
for(i in c(41, 31, 21, 11, 1)) {
    ipiptable$tex <- append(ipiptable$tex, "\\midrule", after=i)
}
ipiptable$tex1 <- ipiptable$tex[c(1:34)]
ipiptable$tex2 <- ipiptable$tex[c(1,35:56)]
  • I often find it useful to split R code chunks for table preparation and table presentation. In general this allows any text that appears before the table to include \Sexpr{} commands incorporating figures from the analyses which generate the table. In the present case, it was useful because the table was split over two pages.
  • The code shows some of the general logic I use for customised table creation. In hindsight, I could probably refactor it into a function so that I don't always have to type ipiptable, which would make things a little more concise.
  • The general process of table creation involves: (a) extracting information on cells, with cells often grouped into types that will receive common formatting treatment; (b) formatting cells (e.g., rounding, decimals, and so on); (c) assembling the cells, typically using a combination of the functions rbind and cbind; and (d) inserting tex column and end-of-row separators with something like: paste(apply(x, 1, function(X) paste(X, collapse = " & ")), "\\\\") where x is the matrix of table cells.
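
The four steps can be seen end to end on a very small table. This is a minimal sketch with made-up item statistics, not the table from the report.

```r
# (a) Extract cells: made-up item statistics.
cells <- data.frame(
    item = 1:3,
    text = c("Item A", "Item B", "Item C"),
    mean = c(2.8712, 2.1049, 3.6),
    sd = c(1.2, 0.98, 1.051),
    stringsAsFactors = FALSE)

# (b) Format cells as character strings.
cellsF <- cbind(
    paste(cells$item, ".", sep = ""),
    cells$text,
    formatC(cells$mean, format = "f", digits = 2),
    formatC(cells$sd, format = "f", digits = 2))

# (c) Assemble the header row and the body.
header <- c("", "Item Text", "M", "SD")
tab <- rbind(header, cellsF)

# (d) Add column separators and end-of-row markers.
tex <- paste(apply(tab, 1, function(X) paste(X, collapse = " & ")), "\\\\")

# Insert a rule after the header row.
tex <- append(tex, "\\midrule", after = 1)
cat(tex, sep = "\n")
```

Each element of tex is one line of the tabular body, which is why the vector can later be split into parts and written out with cat.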

Don't Repeat Yourself Principle using R and \Sexpr{}

ipiptable$caption <-
        "Response options were 
1 = (V)ery (I)naccurate, 
2 = (M)oderately (I)naccurate,
3 = (N)either Inaccurate nor Accurate, 
4 = (M)oderately (A)ccurate,
5 = (V)ery (A)ccurate.
Thus, VI\\\\% indicates the percentage of the norm sample 
        giving a response indicating that the item is a Very Inaccurate
description of themselves.
Your response is indicated with an asterisk (*)."
  • This text was used in both tables, so it is stored once in R and called in each caption using \Sexpr{ipiptable[["caption"]]}. This follows the DRY principle (Don't Repeat Yourself): if the caption needs to be modified, it only needs to be modified in one place.

Incorporating the tex formatted table using R Code chunks

\begin{table}
\begin{adjustwidth}{-1cm}{-1cm}
\caption{Table of results for (A)greeableness, (C)onscientiousness
and (E)motional (S)tability items.
\Sexpr{ipiptable[["caption"]]}} 
\begin{center}
\begin{tabular}{rrp{4cm}rrrrrrr}
\toprule
<<table_part1, results=tex>>=
cat(ipiptable$tex1, sep="\n") 
@
\bottomrule
\end{tabular}
\end{center}
\end{adjustwidth}
\end{table}
  • The tables are then incorporated into the tex file.
  • The R code only generated some of the required tex for the table. Thus all the other desired elements such as the table environment and captions are written either side of the R code chunk.
  • The R code chunk uses the option results=tex in order to enter the output from the cat function verbatim into the resulting tex file.
  • cat(ipiptable$tex1, sep="\n") writes out the vector of tex lines, with the newline separator simply making the resulting tex more readable.

Additional Resources