R packages by Hadley Wickham


Documentation

Documentation is one of the most important aspects of a good package. Without it, users won’t know how to use your package, and are unlikely to do so. Documentation is also useful for future-you (so you remember what your functions were supposed to do), and for developers extending your package.

There are multiple forms of documentation. In this chapter, you’ll learn about function documentation, as accessed by ? or help(). Function documentation is a type of reference documentation. It works like a dictionary: a dictionary is helpful if you want to know what a word means, but doesn’t help find the right word for a new situation. Similarly, function documentation is helpful if you already know the name of the function, but doesn’t help you find the function you need to solve a given problem. That’s the job of vignettes, which you’ll learn about in the next chapter.

R provides a standard way of documenting packages: you write .Rd files in the man/ directory. These files use a custom syntax that’s loosely based on LaTeX. Instead of writing these files directly, we’re going to use roxygen2, which turns specially formatted comments into .Rd files. The goal of roxygen2 is to make documenting your code as easy as possible, so it has a number of advantages over writing .Rd files by hand:

  • Code and documentation are intermingled, so when you modify your code, you’re reminded to also update your documentation.

  • Roxygen2 dynamically inspects the objects that it documents, so you can skip some boilerplate that you’d need to write by hand.

  • It abstracts over the differences in documenting different types of objects, so you need to learn fewer details.

As well as generating .Rd files, roxygen2 can also manage your NAMESPACE and the Collate field in DESCRIPTION. This chapter discusses .Rd files and the Collate field. The NAMESPACE chapter describes how you can use roxygen2 to manage your NAMESPACE, and why you should care.

The documentation workflow

I’ll first show you a rough outline of the complete documentation workflow, and then we’ll dive into the individual steps. The documentation workflow is made up of four basic steps:

  1. Add roxygen comments to your .R files.

  2. Run devtools::document() (or press Cmd + Shift + D in RStudio) to convert roxygen comments to .Rd files.

  3. Preview documentation with ?.

  4. Rinse and repeat until the documentation looks the way you want.

The process starts when you add roxygen comments to your source file: roxygen comments start with #' to distinguish them from regular comments. Here’s some documentation for a simple function:

#' Add together two numbers.
#' 
#' @param x A number.
#' @param y A number.
#' @return The sum of \code{x} and \code{y}.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
  x + y
}

Pressing Cmd + Shift + D (or running devtools::document()) will generate a man/add.Rd that looks like:

% Generated by roxygen2 (4.0.0): do not edit by hand
\name{add}
\alias{add}
\title{Add together two numbers}
\usage{
add(x, y)
}
\arguments{
  \item{x}{A number}

  \item{y}{A number}
}
\value{
The sum of \code{x} and \code{y}
}
\description{
Add together two numbers
}
\examples{
add(1, 1)
add(10, 1)
}

If you’ve used LaTeX, the .Rd format will look recognisable, since it is loosely based on LaTeX. You can read more about the Rd format in the Writing R Extensions manual. Note the comment at the top of the file: the file was generated by code and shouldn’t be modified by hand. Indeed, if you use roxygen2, you will rarely look at these files.

When you use ?add, help("add"), or example("add"), R looks for an .Rd file containing \alias{add}. It then parses the file, converts it into HTML and displays it. In RStudio, the rendered page appears in the Help pane.

(Note that this works because devtools overrides the usual help functions to teach them how to work with source packages. If the documentation doesn’t appear, make sure that you’re using devtools and you’ve loaded the package with load_all().)
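To see the parsing step outside the help system, you can hand a generated file to R’s own tools package. The sketch below writes the add.Rd shown above to a temporary file (the file name is an assumption of the sketch), parses it with tools::parse_Rd(), lists the Rd commands the parser found, and renders the result as plain text with tools::Rd2txt(). The help browser uses a sibling HTML renderer, tools::Rd2HTML(). You never need to do this in normal package development; it just makes the "parse, then render" step visible.

```r
# The add.Rd file generated by roxygen2 above, line by line
rd_lines <- c(
  "% Generated by roxygen2 (4.0.0): do not edit by hand",
  "\\name{add}",
  "\\alias{add}",
  "\\title{Add together two numbers}",
  "\\usage{",
  "add(x, y)",
  "}",
  "\\arguments{",
  "  \\item{x}{A number}",
  "",
  "  \\item{y}{A number}",
  "}",
  "\\value{",
  "The sum of \\code{x} and \\code{y}",
  "}",
  "\\description{",
  "Add together two numbers",
  "}",
  "\\examples{",
  "add(1, 1)",
  "add(10, 1)",
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# Parse the file into R's internal Rd representation
rd <- tools::parse_Rd(rd_file)

# Each top-level element records the Rd command it came from
tags <- vapply(rd, function(x) attr(x, "Rd_tag"), character(1))
print(tags[tags != "TEXT"])

# Render to plain text, much as the help system renders to HTML
tools::Rd2txt(rd)
```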

Roxygen comments

Roxygen comments start with #' and come before a function. All the roxygen lines preceding a function are called a block. Each line should be wrapped in the same way as your code, normally at 80 characters.

Blocks are broken up into tags, which look like @tag details. The content of a tag extends from the end of the tag name to the start of the next tag (or the end of the block). Because @ has a special meaning in roxygen, you need to write @@ if you want to add a literal @ to the documentation (this is mostly important for email addresses and for accessing slots of S4 objects).

Each block includes some text before the first tag. This is called the introduction, and is parsed specially:

  • The first sentence becomes the title of the documentation. That’s what you see when you look at help(package = mypackage) and is shown at the top of each help file. It should fit on one line, be written in sentence case, and end in a full stop.

  • The second paragraph is the description: this comes first in the documentation and should briefly describe what the function does.

  • The third and subsequent paragraphs go into the details: this is an (often long) section that comes after the argument descriptions and should go into detail about how the function works.

All objects must have a title and description. Details are optional.
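To make these rules concrete, here is a toy function, split_intro(). It is a hypothetical helper written for illustration, not part of roxygen2 and much simpler than roxygen2’s real parser. It splits the introduction on blank lines and treats each paragraph as a unit, so the one-sentence title must sit in its own paragraph. It also mimics one behaviour you can see in the add() example above: when the introduction has only one paragraph, the title doubles as the description.

```r
# Toy illustration only: roxygen2's real parser is more involved.
# Splits the lines before the first tag into title, description and details.
split_intro <- function(lines) {
  # Strip the leading "#'" and an optional space from each line
  text <- sub("^#' ?", "", lines)
  # Each blank line starts a new paragraph; number the paragraphs
  para_id <- cumsum(text == "")
  keep <- text != ""
  # Join the lines of each paragraph with single spaces
  paras <- vapply(split(text[keep], para_id[keep]), paste, character(1),
    collapse = " ")
  paras <- unname(paras)
  list(
    title = paras[1],
    # With a single paragraph, the title doubles as the description
    description = if (length(paras) >= 2) paras[2] else paras[1],
    # Everything after the second paragraph (possibly nothing)
    details = paras[-(1:2)]
  )
}

intro <- c(
  "#' Sum of vector elements.",
  "#'",
  "#' \\code{sum} returns the sum of all the values present in its arguments.",
  "#'",
  "#' This is a generic function: methods can be defined for it directly or via the",
  "#' \\code{\\link{Summary}} group generic."
)
str(split_intro(intro))
```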

Here’s an example showing what the introduction for sum() might look like if it had been written with roxygen:

#' Sum of vector elements.
#' 
#' \code{sum} returns the sum of all the values present in its arguments.
#' 
#' This is a generic function: methods can be defined for it directly or via the
#' \code{\link{Summary}} group generic. For this to work properly, the arguments
#' \code{...} should be unnamed, and dispatch is on the first argument.
sum <- function(..., na.rm = FALSE) {}

\code{} and \link{} are formatting commands that you’ll learn about in the text formatting reference sheet at the end of this chapter. Note that I’ve been careful to wrap the roxygen block so that it’s less than 80 characters wide. You can do that automatically in RStudio with Cmd + Shift + / (Code | Re-flow Comment).
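If you aren’t using RStudio, base R’s strwrap() can do a similar job. The sketch below uses a hypothetical helper, reflow_roxygen(), that wraps a single paragraph of text to 77 columns and then prepends the three-character "#' " prefix, so no line exceeds 80 characters. It knows nothing about tags or multi-paragraph blocks, so treat it as a convenience for prose only.

```r
# Hypothetical helper: wrap prose into roxygen comment lines of <= 80 chars
reflow_roxygen <- function(text, width = 80) {
  prefix <- "#' "
  # Wrap the text to leave room for the prefix, then add the prefix to
  # every output line
  strwrap(text, width = width - nchar(prefix), prefix = prefix)
}

long_text <- paste(
  "This is a generic function: methods can be defined for it directly or via",
  "the \\code{\\link{Summary}} group generic. For this to work properly, the",
  "arguments \\code{...} should be unnamed, and dispatch is on the first",
  "argument."
)
writeLines(reflow_roxygen(long_text))
```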

You can add arbitrary sections to the documentation for any object with the @section tag. This is a useful way of breaking a long details section into multiple chunks with useful headings. Section titles should be in sentence case and must be followed by a colon. Titles may only take one line.

#' @section Warning:
#' Do not operate heavy machinery within 8 hours of using this function.

There are two tags that make it easier for people to navigate around your documentation:

  • @seealso allows you to point to other useful resources, either on the web, \url{http://www.r-project.org}, or to other documentation in your package \code{\link{functionname}}.

  • If you have a family of related functions where every function should link to every other function in the family, use @family. The value of @family should be plural.

For sum, these components might look like:

#' @family aggregate functions
#' @seealso \code{\link{prod}} for products, \code{\link{cumsum}} for cumulative
#'   sums, and \code{\link{colSums}}/\code{\link{rowSums}} for marginal sums
#'   over high-dimensional arrays.

Three other tags make it easier for the user to find documentation:

  • @aliases space separated aliases adds additional aliases to the topic. An alias is another name of the topic that can be used with ?.

  • @concept adds extra keywords that will be found with help.search().

  • @keywords keyword1 keyword2 ... adds standardised keywords. Keywords are optional, but if present, must be taken from a predefined list found in file.path(R.home("doc"), "KEYWORDS").

    Generally, keywords are not very useful except for @keywords internal. Using the internal keyword removes all functions in the associated .Rd file from the documentation index and disables some of their automated tests. A common use case is to both export a function (using @export) and mark it as internal. That way, advanced users can access a function that new users would be confused about if they were to see it in the index.
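Here’s a sketch of these tags on a hypothetical helper, check_weights(), which is exported for advanced users but hidden from the help index with @keywords internal. The alias validate_weights and the concept weighting are invented for illustration. The last two lines peek at R’s list of standard keywords, guarded in case your R installation doesn’t ship the KEYWORDS file.

```r
#' Validate a vector of weights
#'
#' Checks that \code{w} is numeric, non-negative and sums to one. Exported
#' for advanced users, but hidden from the help index.
#'
#' @param w A numeric vector of weights.
#' @param tol A length-one numeric tolerance used when checking the sum.
#' @return \code{w}, invisibly, if all checks pass; otherwise an error.
#' @aliases validate_weights
#' @concept weighting
#' @keywords internal
#' @export
check_weights <- function(w, tol = sqrt(.Machine$double.eps)) {
  stopifnot(is.numeric(w), all(w >= 0), abs(sum(w) - 1) < tol)
  invisible(w)
}

# The list of standard keywords ships with R (if present)
kw_file <- file.path(R.home("doc"), "KEYWORDS")
if (file.exists(kw_file)) head(readLines(kw_file))
```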

Other tags are situational: they vary based on the type of object that you’re documenting. The following sections describe the most commonly used tags for functions and the various methods, generics and objects used by R’s three OO systems. You’ll learn how to document datasets in the section on documenting data.

Documenting functions

Functions are the most commonly documented objects. As well as the introduction block, most functions have three tags:

  • @param name description describes the inputs to the function. The description should provide a succinct summary of the type of the parameter (e.g. a string, a numeric vector), and if not obvious from the name, what the parameter does.

    The description should start with a capital letter and end with a full stop. It can span multiple lines (or even paragraphs) if necessary. All parameters must be documented.

    You can document multiple arguments in one place by separating the names with commas (no spaces). For example, to document both x and y, you can write @param x,y Numeric vectors.

  • @examples provides executable R code showing how to use the function in practice. This is a very important part of the documentation because many people look at the examples first. Example code must work without errors as it is run automatically as part of R CMD check.

    For the purpose of illustration, it’s often useful to include code that causes an error. \dontrun{} allows you to include code in the example that is never run. There are two other special commands. \dontshow{} is run, but not shown in the help page: this can be useful for informal tests. \donttest{} is run by example(), but not run automatically in R CMD check. This is useful if you have examples that take a long time to run. The options are summarised below.

    Command        example()   help page   R CMD check
    \dontrun{}     not run     shown       not run
    \dontshow{}    run         hidden      run
    \donttest{}    run         shown       not run

    Instead of including examples directly in the documentation, you can put them in separate files and use @example path/relative/to/package/root to insert them into the documentation.

  • @return description describes the output from the function. This is not always necessary, but is a good idea if you return different types of outputs depending on the input, or you’re returning an S3, S4 or RC object.
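To see the three example wrappers side by side, here is a sketch for a hypothetical clamp() function. It also documents two arguments at once with @param lower,upper. The comments inside each wrapper restate the behaviour summarised above; the \donttest{} block only pretends to be slow.

```r
#' Restrict values to a range
#'
#' @param x A numeric vector.
#' @param lower,upper Length-one numeric vectors giving the bounds.
#' @return A numeric vector the same length as \code{x}.
#' @examples
#' clamp(c(-5, 0.5, 5), 0, 1)
#'
#' \dontshow{
#' # Informal test: run by example() and R CMD check, hidden from help
#' stopifnot(identical(clamp(2, 0, 1), 1))
#' }
#'
#' \donttest{
#' # Pretend this is slow: shown and run by example(), skipped by check
#' clamp(runif(1e6, -1, 2), 0, 1)
#' }
#'
#' \dontrun{
#' # Shown in the help page but never run: this errors on purpose
#' clamp("a", 0, 1)
#' }
clamp <- function(x, lower, upper) {
  stopifnot(is.numeric(x), lower <= upper)
  # Raise values below lower, then cap values above upper
  pmin(pmax(x, lower), upper)
}
```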

We could use these new tags to improve our documentation of sum() as follows:

#' Sum of vector elements.
#'
#' \code{sum} returns the sum of all the values present in its arguments.
#'
#' This is a generic function: methods can be defined for it directly
#' or via the \code{\link{Summary}} group generic. For this to work properly,
#' the arguments \code{...} should be unnamed, and dispatch is on the
#' first argument.
#'
#' @param ... Numeric, complex, or logical vectors.
#' @param na.rm A logical scalar. Should missing values (including NaN)
#'   be removed?
#' @return If all inputs are integer or logical, then the output
#'   will be an integer. If integer overflow
#'   \url{http://en.wikipedia.org/wiki/Integer_overflow} occurs, the output
#'   will be NA with a warning. Otherwise it will be a length-one numeric or
#'   complex vector.
#'
#'   Zero-length vectors have sum 0 by definition. See
#'   \url{http://en.wikipedia.org/wiki/Empty_sum} for more details.
#' @examples
#' sum(1:10)
#' sum(1:5, 6:10)
#' sum(F, F, F, T, T)
#'
#' sum(.Machine$integer.max, 1L)
#' sum(.Machine$integer.max, 1)
#'
#' \dontrun{
#' sum("a")
#' }
sum <- function(..., na.rm = FALSE) {}
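Because this block makes concrete promises about the return value, it’s worth checking them against the real function. The sketch below calls base::sum() explicitly, so it still works if the dummy sum() definition above has been run in your session and masks the base version. It confirms each claim in @return and @examples, including that missing values propagate unless na.rm = TRUE.

```r
# Integer or logical inputs give an integer result
stopifnot(identical(base::sum(1:10), 55L))
stopifnot(identical(base::sum(FALSE, FALSE, FALSE, TRUE, TRUE), 2L))
# The empty sum is 0 by definition
stopifnot(identical(base::sum(integer(0)), 0L))
# Complex inputs give a complex result
stopifnot(identical(base::sum(1 + 2i, 3), 4 + 2i))

# Integer overflow: the result is NA and a warning is raised
warned <- FALSE
overflow <- withCallingHandlers(
  base::sum(.Machine$integer.max, 1L),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
stopifnot(is.na(overflow), warned)

# With a double in the mix, the result is a double and nothing overflows
stopifnot(identical(base::sum(.Machine$integer.max, 1), 2147483648))

# na.rm defaults to FALSE, so missing values propagate
stopifnot(is.na(base::sum(c(1, NA))), base::sum(c(1, NA), na.rm = TRUE) == 1)
```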

Indent the second and subsequent lines of a tag so that when scanning the documentation it’s easy to see where one tag ends and the next begins. Tags that always span multiple lines (like @examples) should start on a new line and don’t need to be indented.

Documenting classes, generics and methods

Documenting classes, generics and methods is relatively straightforward, but the details vary based on the object system you’re using. The following sections give the details for the S3, S4 and RC object systems.

S3

S3 generics are regular functions, so document them as such. S3 classes have no formal definition, so document the constructor function. It is your choice whether or not to document S3 methods. You don’t need to document methods for simple generics like print(). But if your method is more complicated, you should document it so people know what the parameters do. In base R, you can see examples of documentation for more complex methods like predict.lm(), predict.glm(), and anova.glm().

Older versions of roxygen required explicit @method generic class tags for all S3 methods. From 3.0.0 this is no longer needed, as roxygen2 figures it out automatically. If you are upgrading, make sure to remove these old tags. Automatic method detection only fails if the generic and class are ambiguous. For example, is all.equal.data.frame() the equal.data.frame method for all(), or the data.frame method for all.equal()? If this happens, you can disambiguate with (e.g.) @method all.equal data.frame.
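Here’s a sketch of both cases using a hypothetical S3 class, my_units. The constructor is documented like any other function. The method name all.equal.my_units could be read either as the my_units method for all.equal() or as the equal.my_units method for all(), so it carries an explicit @method tag.

```r
#' Create a measurement with units
#'
#' @param x A numeric vector.
#' @param units A string giving the units, e.g. "m".
#' @return An object of S3 class \code{my_units}.
#' @export
my_units <- function(x, units) {
  stopifnot(is.numeric(x), is.character(units), length(units) == 1)
  structure(list(value = x, units = units), class = "my_units")
}

#' Compare measurements, taking units into account
#'
#' @param target,current Objects of class \code{my_units}.
#' @param ... Passed on to \code{all.equal} for the values.
#' @method all.equal my_units
#' @export
all.equal.my_units <- function(target, current, ...) {
  if (!identical(target$units, current$units)) {
    return("Units differ")
  }
  # Fall back to the usual comparison of the underlying numbers
  all.equal(target$value, current$value, ...)
}

a <- my_units(1:3, "m")
all.equal(a, my_units(1:3, "m"))
all.equal(a, my_units(1:3, "cm"))
```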

S4

Document S4 classes by adding a roxygen block before setClass(). Use @slot to document the slots of the class in the same way you use @param to describe the parameters of a function. Here’s a simple example:

#' An S4 class to represent a bank account.
#'
#' @slot balance A length-one numeric vector
Account <- setClass("Account",
  slots = list(balance = "numeric")
)

S4 generics are also functions, so document them as such. S4 methods are a little more complicated. Unlike S3, all S4 methods must be documented. You document them like a regular function, but you probably don’t want each method to have its own documentation page. Instead, put the method documentation in one of three places:

  • In the class. Most appropriate if the corresponding generic uses single dispatch and you created the class.

  • In the generic. Most appropriate if the generic uses multiple dispatch and you have written both the generic and the method.

  • In its own file. Most appropriate if the method is complex, or if you’ve written the method but not the class or generic.

Use either @rdname or @describeIn to control where method documentation goes. See documenting multiple objects in one file for details.
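As a sketch, here is a generic and one method for the Account class from earlier. deposit() is a hypothetical generic, and @describeIn deposit places the method’s documentation in the generic’s page, under a "Methods (by class)" section. That is the simplest arrangement to show here; for a single-dispatch generic on a class you own, the list above suggests putting the method with the class instead.

```r
library(methods)

#' An S4 class to represent a bank account.
#'
#' @slot balance A length-one numeric vector.
Account <- setClass("Account", slots = list(balance = "numeric"))

#' Deposit money into an account.
#'
#' @param acc An account object.
#' @param amount A length-one numeric vector.
setGeneric("deposit", function(acc, amount) standardGeneric("deposit"))

#' @describeIn deposit Add \code{amount} to the balance of an Account.
setMethod("deposit", "Account", function(acc, amount) {
  acc@balance <- acc@balance + amount
  acc
})

acc <- deposit(Account(balance = 100), 50)
acc@balance
```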

A final consideration for S4 is that S4 code needs to run in a certain order. For example, to define the method setMethod("foo", c("bar", "baz"), ...) you must already have created the generic and the two classes. By default, R loads files in alphabetical order, but not every alphabet puts letters in the same order, so you can’t rely on alphabetical ordering if you need one file loaded before another. The order in which files are loaded doesn’t matter for most packages. But if you’re using S4, you’ll need to make sure that classes are loaded before subclasses and generics are defined before methods.

Rather than relying on alphabetic ordering, roxygen2 provides an explicit way of saying that one file must be loaded before another: @include. The @include tag gives a space separated list of file names that should be loaded before the current file:

#' @include class-a.r
setClass("B", contains = "A")

If any @include tags are present in the package, roxygen2 will set the Collate field in the DESCRIPTION, which ensures that files are always loaded in the same order. A simpler alternative to @include is to define all classes in aaa-classes.R and all generics in aaa-generics.R, and rely on these coming first in alphabetical order. The main disadvantage is that you can’t organise components into files as naturally as you might want.
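The sketch below simulates what a Collate field buys you, without building a package. It writes two hypothetical source files to a temporary R/ directory, where class-b.R uses @include to declare that it needs class-a.R, and then sources them in the order roxygen2 would record. The Collate layout in the comment is approximate; check the DESCRIPTION that roxygen2 actually writes.

```r
library(methods)

# Hypothetical package sources: class-b.R needs class A from class-a.R
r_dir <- file.path(tempfile("pkg"), "R")
dir.create(r_dir, recursive = TRUE)
writeLines(
  'setClass("A", slots = list(x = "numeric"))',
  file.path(r_dir, "class-a.R")
)
writeLines(
  c("#' @include class-a.R", 'setClass("B", contains = "A")'),
  file.path(r_dir, "class-b.R")
)

# roxygen2 would turn the @include into a Collate field, roughly:
#   Collate:
#       'class-a.R'
#       'class-b.R'
# Here we simply source the files in that order by hand.
collate <- c("class-a.R", "class-b.R")
for (f in collate) source(file.path(r_dir, f))

extends("B", "A")
```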

Older versions of roxygen2 required explicit @usage, @aliases and @docType tags to correctly document S4 objects, but as of version 3.0.0 it generates correct metadata automatically. If you’re upgrading from an old version of roxygen2, make sure to remove these tags.

RC

Reference classes are different to S3 and S4 because methods are associated with classes, not generics. RC also has a special convention for documenting methods: the docstring. This makes documenting RC simpler than S4 because you only need one roxygen block per class.

#' A Reference Class to represent a bank account.
#'
#' @field balance A length-one numeric vector.
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    withdraw = function(x) {
      "Withdraw money from account. Allows overdrafts"
      balance <<- balance - x
    }
  )
)

Methods with doc strings will be included in the “Methods” section of the class documentation. Each documented method will be listed with an automatically generated usage statement and its doc string. Also note the use of @field instead of @slot.
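Here’s the Account reference class from above in use. Calling a method modifies the object in place, and the generator’s $help() method prints a method’s usage together with its docstring at the console, which is the same text that ends up in the "Methods" section. The amounts are arbitrary.

```r
library(methods)

#' A Reference Class to represent a bank account.
#'
#' @field balance A length-one numeric vector.
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    withdraw = function(x) {
      "Withdraw money from account. Allows overdrafts"
      balance <<- balance - x
    }
  )
)

acc <- Account$new(balance = 100)
acc$withdraw(30)   # modifies acc in place
acc$balance

# Show the docstring for a method from the generator object
Account$help("withdraw")
```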

Do repeat yourself

There is a tension between the DRY (do not repeat yourself) principle of programming and the need for documentation to be self-contained. It’s frustrating to have to navigate through multiple help files in order to pull together all the pieces you need. Roxygen2 provides two ways to avoid repeating yourself in the source, while still assembling everything into one documentation file:

  • Reuse parameter documentation with @inheritParams.
  • Document multiple functions in the same place with @describeIn or @rdname

Inheriting parameters from other functions

You can inherit parameter descriptions from other functions using @inheritParams source_function. This tag will bring in all documentation for parameters that are undocumented in the current function, but documented in the source function. The source can be a function in the current package, @inheritParams function, or another package using @inheritParams package::function. For example the following documentation:

#' @param a This is the first argument
foo <- function(a) a + 10

#' @param b This is the second argument
#' @inheritParams foo
bar <- function(a, b) {
  foo(a) * 10
}

is equivalent to

#' @param a This is the first argument
#' @param b This is the second argument
bar <- function(a, b) {
  foo(a) * 10
}

Note, however, that inheritance does not chain. In other words, the source_function must always be the function that defines the parameter using @param.

Documenting multiple functions in the same file

You can document multiple functions in the same file by using either the @rdname or the @describeIn tag. It’s a technique best used with caution: documenting too many functions in one place leads to confusing documentation. It’s best used when all functions have the same (or very similar) arguments.

@describeIn is designed for the most common cases:

  • documenting methods in a generic
  • documenting methods in a class
  • documenting functions with the same (or similar) arguments

It generates a new section, named either “Methods (by class)”, “Methods (by generic)” or “Functions”. The section contains a bulleted list describing each function, labelled so that you know what function or method it’s talking about. Here’s an example, documenting an imaginary new generic:

#' Foo bar generic
#'
#' @param x Object to foo.
foobar <- function(x) UseMethod("foobar")

#' @describeIn foobar Difference between the mean and the median.
foobar.numeric <- function(x) abs(mean(x) - median(x))

#' @describeIn foobar First and last values pasted together in a string.
foobar.character <- function(x) paste0(x[1], "-", x[length(x)])

An alternative to @describeIn is @rdname. It overrides the default file name generated by roxygen and merges documentation for multiple objects into one file. This gives you complete freedom to combine documentation however you see fit. There are two ways to use @rdname. You can add documentation to an existing function:

#' Basic arithmetic
#'
#' @param x,y Numeric vectors.
add <- function(x, y) x + y

#' @rdname add
times <- function(x, y) x * y

Or, you can create a dummy documentation file by documenting NULL and setting an informative @name.

#' Basic arithmetic
#'
#' @param x,y Numeric vectors.
#' @name arith
NULL

#' @rdname arith
add <- function(x, y) x + y

#' @rdname arith
times <- function(x, y) x * y

Text formatting reference sheet

Within roxygen tags, you use .Rd syntax to format text. This section shows you examples of the most important commands. The full details are described in Writing R Extensions.

Note that \ and % are special characters. To insert literals, escape with a backslash: \\, \%.
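If you generate documentation text programmatically, these escapes are easy to forget. The hypothetical escape_roxygen_text() below backslash-escapes \ and % in a single regex pass, so nothing gets escaped twice, and doubles @ as described in the section on roxygen comments. It assumes the text goes into ordinary prose; inside commands such as \code{} the rules differ, so treat it as a starting point.

```r
# Hypothetical helper: make plain text safe to paste into a roxygen block
escape_roxygen_text <- function(x) {
  # Put a backslash in front of every \ and % (one pass, no double-escaping)
  x <- gsub("([\\\\%])", "\\\\\\1", x)
  # Double every @ so roxygen2 does not read it as the start of a tag
  gsub("@", "@@", x, fixed = TRUE)
}

writeLines(escape_roxygen_text("50% of C:\\temp"))
writeLines(escape_roxygen_text("hadley@rstudio.com"))
```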

Character formatting

  • \emph{italics}

  • \strong{bold}

  • \code{r_function_call(with = "arguments")}, \code{NULL}, \code{TRUE}

  • \pkg{package_name}

Links

To other documentation:

  • \code{\link{function}}: function in this package

  • \code{\link[package]{function}}: function in another package

  • \link[=dest]{name}: link to dest, but show name

  • \linkS4class{abc}: link to an S4 class

To the web:

  • \url{http://rstudio.com}

  • \href{http://rstudio.com}{RStudio}

  • \email{hadley@@rstudio.com} (note the doubled @)
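A quick way to check that markup like this is well-formed is to parse it as an Rd fragment with tools::parse_Rd(). The sketch writes a few of the commands above to a temporary file and lists the Rd tags the parser recognised. Note that the email address has a single @ here: the doubled @@ is only needed in the roxygen comment, and roxygen2 turns it back into one @ in the .Rd file.

```r
# Rd markup as it would appear in the generated .Rd file
rd_text <- c(
  "\\emph{italics}, \\strong{bold}, \\code{NULL}, \\pkg{roxygen2}",
  "\\code{\\link{sum}} and \\link[=sum]{adding things up}",
  "\\url{http://rstudio.com} and \\href{http://rstudio.com}{RStudio}",
  "\\email{hadley@rstudio.com}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_text, rd_file)

# fragment = TRUE: this is a snippet, not a complete .Rd file
rd <- tools::parse_Rd(rd_file, fragment = TRUE)
tags <- vapply(rd, function(x) attr(x, "Rd_tag"), character(1))
unique(tags)
```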

Lists

  • Ordered (numbered) lists:

    #' \enumerate{
    #'   \item First item
    #'   \item Second item
    #' }
  • Unordered (bulleted) lists

    #' \itemize{
    #'   \item First item
    #'   \item Second item
    #' }
  • Definition (named) lists

    #' \describe{
    #'   \item{One}{First item}
    #'   \item{Two}{Second item}
    #' }

Mathematics

Standard LaTeX (with no extensions):

  • \eqn{a + b}: inline equation

  • \deqn{a + b}: display (block) equation
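As an illustration, here is a hypothetical rmse() function whose formula is written with \deqn{} in the description, with \eqn{} used inline for the symbols. Because roxygen comments aren’t R strings, the LaTeX backslashes are written only once. \eqn{} and \deqn{} also accept an optional second argument giving a plain-text version for non-LaTeX output, which this sketch omits.

```r
#' Root mean squared error
#'
#' Computes the root mean squared error between observed values \eqn{y}
#' and predictions \eqn{\hat{y}}:
#' \deqn{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}}
#'
#' @param y,y_hat Numeric vectors of the same length.
#' @return A length-one numeric vector.
#' @examples
#' rmse(c(1, 2, 3), c(1, 2, 4))
rmse <- function(y, y_hat) {
  stopifnot(is.numeric(y), is.numeric(y_hat), length(y) == length(y_hat))
  # Square root of the mean squared difference, as in the formula above
  sqrt(mean((y - y_hat)^2))
}
```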

Tables

Tables are created with \tabular{}. It has two arguments:

  1. Column alignment, specified by letter for each column (l = left, r = right, c = centre.)

  2. Table contents, with columns separated by \tab and rows by \cr.

The following function turns an R data frame into the correct format. It ignores column and row names, but should get you started.

tabular <- function(df, ...) {
  stopifnot(is.data.frame(df))

  align <- function(x) if (is.numeric(x)) "r" else "l"
  col_align <- vapply(df, align, character(1))

  cols <- lapply(df, format, ...)
  contents <- do.call("paste",
    c(cols, list(sep = " \\tab ", collapse = "\\cr\n  ")))

  paste("\\tabular{", paste(col_align, collapse = ""), "}{\n  ",
    contents, "\n}\n", sep = "")
}

cat(tabular(mtcars[1:5, 1:5]))
#> \tabular{rrrrr}{
#>   21.0 \tab 6 \tab 160 \tab 110 \tab 3.90\cr
#>   21.0 \tab 6 \tab 160 \tab 110 \tab 3.90\cr
#>   22.8 \tab 4 \tab 108 \tab  93 \tab 3.85\cr
#>   21.4 \tab 6 \tab 258 \tab 110 \tab 3.08\cr
#>   18.7 \tab 8 \tab 360 \tab 175 \tab 3.15
#> }
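One way to start handling column names is sketched below: a hypothetical tabular_with_names() that adds a bold header row built from names(df). It still ignores row names, and it doesn’t escape special characters such as % in the names, so treat it as a starting point rather than a finished tool.

```r
tabular_with_names <- function(df, ...) {
  stopifnot(is.data.frame(df))

  # Right-align numeric columns, left-align everything else
  align <- function(x) if (is.numeric(x)) "r" else "l"
  col_align <- vapply(df, align, character(1))

  # Header row of bold column names, then one row per data frame row
  header <- paste0("\\strong{", names(df), "}", collapse = " \\tab ")
  cols <- lapply(df, format, ...)
  body <- do.call("paste", c(cols, list(sep = " \\tab ")))
  rows <- c(header, body)

  paste0("\\tabular{", paste(col_align, collapse = ""), "}{\n  ",
    paste(rows, collapse = "\\cr\n  "), "\n}\n")
}

cat(tabular_with_names(mtcars[1:3, 1:3]))
```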