R packages by Hadley Wickham

Package metadata

Naming your package

When creating a package the first thing (and sometimes the most difficult) is to come up with a name for it. There’s only one formal requirement:

  • The package name can only consist of letters and numbers, and must start with a letter.

But I have a few additional recommendations:

  • Make the package name googleable, so that if you google the name you can easily find it. This makes it easy for potential users to find your package, and it’s also useful for you, because it makes it easier to find out who is using it.

  • Avoid using both upper and lower case letters: they make the package name hard to type and hard to remember. For example, I can never remember if it’s Rgtk2 or RGTK2 or RGtk2.

Some strategies I’ve used in the past to create packages names:

  • Find a name evocative of the problem and modify it so that it’s unique: plyr (generalisation of apply tools), lubridate (makes dates and times easier), mutatr (mutable objects), classifly (high-dimensional views of classification).

  • Use abbreviations: lvplot (letter value plots), meifly (models explored interactively).

  • Add an extra R: stringr (string processing), tourr (grand tours), httr (HTTP requests).

Once you have a name, create a directory with that name, and inside that create an R subdirectory and a DESCRIPTION file (note that there’s no extension, and the file name must be all upper case).

A minimal DESCRIPTION file

A minimal description file (this one is taken from an early version of plyr) looks like this:

Package: plyr
Title: Tools for splitting, applying and combining data
Version: 0.1
Author: Hadley Wickham <h.wickham@gmail.com>
Maintainer: Hadley Wickham <h.wickham@gmail.com>
License: MIT

This is the critical subset of package metadata: what it’s called (Package), what it does (Title, Description), who’s allowed to use and distribute it (License), who wrote it (Author), and who to contact if you have problems (Maintainer). Here I’ve left the Description blank to illustrate that if you haven’t decided what the correct value is yet, it’s ok to leave it blank.

Again, the six required elements are:

  • Package: name of the package. Should be the same as the directory name.

  • Title: a one line description of the package.

  • Description: a more detailed paragraph-length description.

  • Version: the version number, which should be of the the form major.minor.patchlevel. See ?package_version for more details on the package version formats. I recommended following the principles of semantic versioning.

  • Maintainer: a single name and email address for the person responsible for package maintenance.

  • License: a standard abbreviation for an open source license, like GPL-2 or BSD. A complete list of possibilities can be found by running file.show(file.path(R.home(), "share/licenses/license.db")). If you are using a non-standard license, put file LICENSE and then include the full text of the license in a LICENSE.

Other DESCRIPTION components

A more complete DESCRIPTION (this one from a more recent version of plyr) looks like this:

Package: plyr
Title: Tools for splitting, applying and combining data
Description: plyr is a set of tools that solves a common set of
    problems: you need to break a big problem down into manageable
    pieces, operate on each pieces and then put all the pieces back
    together.  For example, you might want to fit a model to each
    spatial location or time point in your study, summarise data by
    panels or collapse high-dimensional arrays to simpler summary
    statistics. The development of plyr has been generously supported
    by BD (Becton Dickinson).
URL: http://had.co.nz/plyr
Version: 1.3
Maintainer: Hadley Wickham <h.wickham@gmail.com>
Author: Hadley Wickham <h.wickham@gmail.com>
Depends: R (>= 2.11.0)
Suggests: abind, testthat (>= 0.2), tcltk, foreach
Imports: itertools, iterators
License: MIT

This DESCRIPTION includes other components that are optional, but still important:

  • Depends, Suggests, Imports and Enhances describe which packages this package needs. They are described in more detail in [[namespaces]].

  • URL: a url to the package website. Multiple urls can be separated with a comma or whitespace.

Instead of Maintainer and Author, you can Authors@R, which takes a vector of person() elements. Each person object specifies the name of the person and their role in creating the package:

  • aut: full authors who have contributed much to the package

  • ctb: people who have made smaller contributions, like patches.

  • cre: the package creator/maintainer, the person you should bother if you have problems

Other roles are listed in the help for person. Using Authors@R is useful when your package gets bigger and you have multiple contributors that you want to acknowledge appropriately. The equivalent Authors@R syntax for plyr would be:

  Authors@R: person("Hadley", "Wickham", role = c("aut", "cre"))

There are a number of other less commonly used fields like BugReports, KeepSource, OS_type and Language. A complete list of the DESCRIPTION fields that R understands can be found in the R extensions manual.

Collation order

R loads files in alphabetical order. Unfortunately not every alphabet puts letters in the same order, so you can’t rely on alphabetic ordering if you need one file loaded before another. The order in which files are loaded doesn’t matter for most packages. But if you’re using S4, you’ll need to make sure that classes are loaded before subclasses and generics are defined before methods.

Rather than relying on alphabetic ordering, roxygen2 provides an explicit way of saying that one file must be loaded before another: @include. The @include tag gives a space separated list of file names that should be loaded before the current file:

#' @include class-a.r
setClass("B", contains = "A")

If any @include tags are present in the package, roxygen2 will set the Collate field in the DESCRIPTION, which ensures that files are always loaded in the same order.