R packages by Hadley Wickham


Namespace

The package namespace (as recorded in the NAMESPACE file) is one of the more confusing parts of building a package. It’s a fairly advanced topic and, by and large, not that important if you are only developing packages for yourself. However, understanding namespaces is vital if you plan to submit your package to CRAN because they ensure that your package plays well with others.

When you first start using namespaces, you’ll wonder why you bother. It seems like a lot of extra work for little gain. However, a high quality namespace helps encapsulate your package and make it self-contained. A self-contained package is a good goal to strive for because it ensures that other packages don’t interfere with your code, and your code doesn’t interfere with other packages. A self-contained package works regardless of the environment in which it’s run.

Motivation

As the name suggests, namespaces provide “spaces” for “names”. They provide a context for looking up the value of an object associated with a name.

You’ve probably already used namespaces in your data analysis. Have you used :: before? It disambiguates functions with the same name. For example, both plyr and Hmisc provide a summarize() function. If you load plyr, then Hmisc, summarize() will refer to the Hmisc version; if you load the packages in the opposite order, then summarize() will refer to the plyr version. This is obviously confusing, so instead, you can be explicit and refer to Hmisc::summarize() or plyr::summarize(). Then the order in which the packages were loaded doesn’t matter.

In package development, namespaces help your packages be self-contained in two ways: the imports and the exports. The imports define how a function in one package finds a function in another. To see why this matters, consider what happens if someone changes the definition of a function that you rely on. For example, take the simple nrow() function in base R:

nrow
#> function (x) 
#> dim(x)[1L]
#> <bytecode: 0x2506bc8>
#> <environment: namespace:base>

It’s defined in terms of dim(), so what happens if we override dim() with our own definition? Does nrow() break?

dim <- function(x) c(1, 1)
dim(mtcars)
#> [1] 1 1
nrow(mtcars)
#> [1] 32

Surprisingly, it does not! That’s because when nrow() looks for an object called dim, it uses the package namespace, so it finds dim() in the base environment, not the dim() we created in the global environment.
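You can check where a function will start looking for names by asking for the environment it was defined in. The following is a small sketch using only base R functions; it is an illustration added here, not part of the original example.

```r
# Every function remembers the environment in which it was created.
# For a package function, that environment is the package namespace.
environmentName(environment(nrow))
#> [1] "base"
environmentName(environment(stats::sd))
#> [1] "stats"

# isNamespace() confirms that this environment is a namespace,
# not the global environment.
isNamespace(environment(nrow))
#> [1] TRUE
```

Because nrow() starts its lookup in the base namespace, a dim() defined in the global environment is never consulted.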

Namespaces also help your package avoid conflicts with other packages through the exports. The exports control which functions are available outside of your package. Internal functions are available only within your package - they can’t (easily) be used from another package. You export a minimal set of functions to be used by others; the fewer functions you export, the smaller the chance of a conflict with another package. (Conflicts aren’t the end of the world, because you can always use :: to disambiguate but they’re best avoided where possible because it makes the lives of your users easier.)
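To get a feel for the split between exported and internal objects, you can inspect the namespace of an installed package. The sketch below uses the stats package only because it ships with every R installation; the variable names are made up for illustration.

```r
# The namespace environment holds everything defined in the package.
ns <- asNamespace("stats")
all_objects <- ls(ns)

# The exports are the subset that other code is meant to use.
exported <- getNamespaceExports("stats")

# Anything in the namespace that is not exported is internal.
internal <- setdiff(all_objects, exported)

"sd" %in% exported
#> [1] TRUE
length(internal) > 0
#> [1] TRUE
```

The internal objects are still there, but they are only reachable from inside the package (or, with some effort, from outside), which is what keeps them from clashing with names in other packages.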

Search path

To understand why namespaces are important, you need a solid understanding of the search path. To call a function, R first has to find it. R first looks in the global environment, and if it doesn’t find it there, it looks on the search path, the list of all the packages you have attached. You can see that list by running search(). For example, here’s the search path for the code in this book:

search()
#>  [1] ".GlobalEnv"        "package:methods"   "package:bookdown" 
#>  [4] "package:rmarkdown" "package:stats"     "package:graphics" 
#>  [7] "package:grDevices" "package:utils"     "package:datasets" 
#> [10] "Autoloads"         "package:base"
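The search path is really a chain of environments: each one has a parent, and R walks from the global environment up to the base package. As a rough sketch (the variable names are illustrative), you can walk that chain yourself and compare it with search():

```r
# Start at the global environment and follow parent links
# until we reach the environment of the base package.
env <- globalenv()
chain <- character()
repeat {
  chain <- c(chain, environmentName(env))
  if (identical(env, baseenv())) break
  env <- parent.env(env)
}

# There is one environment in the chain for each entry in search().
length(chain) == length(search())
#> [1] TRUE
```

The names reported by environmentName() differ slightly from those printed by search(), but the entries correspond one to one.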

There’s an important difference between loading a package and attaching it. Normally you talk about loading a package with library(), but what you’re actually doing is attaching it:

  • Loading a package loads the code, data and any DLLs; registers S3 and S4 methods; and runs the .onLoad() function, if present. After loading, the package is in memory, and available, but you can’t access its components without using ::. A package is loaded by ::, or you can load it explicitly with requireNamespace() or loadNamespace().

  • Attaching a package makes it available on the search path. You can attach a package with library() or require(), which always first loads the package. Attaching a package adds it to the search path and runs the .onAttach() function, if present. Every attached package is on the search path and is listed by search().

When you use ::, it loads the package but does not attach it. Notice the difference between these two ways of running expect_equal() from the testthat package. If we use library(), testthat is attached to the search path. If we use ::, it’s not.

old <- search()
testthat::expect_equal(1, 1)
setdiff(search(), old)
#> character(0)
expect_true(TRUE)
#> Error in eval(expr, envir, enclos): could not find function "expect_true"
    
library(testthat)
expect_equal(1, 1)
setdiff(search(), old)
#> [1] "package:testthat"
expect_true(TRUE)

There are four functions that make a package available. They differ based on whether they load or attach, and what happens if the package is not found: does it throw an error, or does it return FALSE?

         Returns FALSE                            Throws error
Load     requireNamespace("x", quietly = TRUE)    loadNamespace("x")
Attach   require(x, quietly = TRUE)               library(x)

Of the four functions, you should only ever use two:

  • Use library(x) in data analysis scripts. It will throw an error if the package is not installed, and will terminate the script. You want to attach the package to save typing. Never use library() in a package.

  • Use requireNamespace("x", quietly = TRUE) inside a package if you want to do something different (e.g. throw an error) depending on whether or not a suggested package is installed.

You never need to use require() (requireNamespace() is almost always better), or loadNamespace() (which is only needed for internal R code). You should never use require() or library() in a package: use the Depends or Imports fields in the DESCRIPTION.
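Here is a minimal sketch of the requireNamespace() pattern inside a package function. The helper check_suggested() and the function file_stem() are hypothetical names invented for this example, and tools stands in for a suggested package only because it ships with R.

```r
# Hypothetical helper: stop with a clear message if a suggested
# package is not installed.
check_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this function to work. ",
         "Please install it.", call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical package function that relies on a suggested package.
# The package is loaded (not attached), so its functions are called with ::.
file_stem <- function(path) {
  check_suggested("tools")
  tools::file_path_sans_ext(basename(path))
}

file_stem("data/raw/survey.csv")
#> [1] "survey"
```

Because requireNamespace() returns FALSE rather than throwing an error, the function gets to decide what to do when the package is missing: here it stops with a helpful message, but it could just as well fall back to a simpler code path.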

Now’s a good time to come back to an important issue which we glossed over earlier: what’s the difference between Depends and Imports in the DESCRIPTION and when should you use each one? Listing a package in either Depends or Imports ensures that it’s installed when needed. The main difference is that Depends attaches the package, whereas Imports just loads it. There are no other differences, and the rest of the advice in this chapter applies whether or not the package is in Depends or Imports.

Unless there is a good reason otherwise, you should always list packages in Imports, not Depends. That’s because a good package is self-contained, and minimises changes to the global environment (including the search path). There are a few exceptions:

  • Your package is designed to be used in conjunction with another package. For example, the analogue package builds on top of vegan. It’s not useful without vegan, so it has vegan in Depends instead of Imports. ggplot2 should really Depend on scales, rather than Importing it.

  • The DBI package provides a common set of S4 classes and generics to be used by any package that talks to a database. A new database backend for DBI doesn’t actually implement any functions, it just creates new classes and provides methods. So if DBI is not attached, the backend is useless. That means packages that implement backends should have DBI in Depends and not in Imports.
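To make the fields concrete, the sketch below writes a small DESCRIPTION for a hypothetical package called mypkg to a temporary file and reads the dependency fields back with read.dcf(). The package name and the packages it lists are assumptions chosen purely for illustration.

```r
# A hypothetical DESCRIPTION: jsonlite and stats are loaded with the
# package (Imports); testthat is only needed for development (Suggests).
desc_lines <- c(
  "Package: mypkg",
  "Version: 0.0.1",
  "Depends: R (>= 3.1.0)",
  "Imports: jsonlite, stats",
  "Suggests: testthat"
)
path <- tempfile()
writeLines(desc_lines, path)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(path, fields = c("Depends", "Imports", "Suggests"))

# Split the comma-separated Imports field into package names.
imports <- strsplit(desc[1, "Imports"], ",\\s*")[[1]]
imports
#> [1] "jsonlite" "stats"
```

Note that Depends here only states a minimum R version; no package is attached on behalf of the user.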

Now that you understand the importance of the namespace, let’s dive into the nitty gritty details. The two sides of the package namespace, imports and exports, are both described by the NAMESPACE file. You’ll learn what this file looks like in the next section, and in the following section you’ll learn the details of exporting and importing functions and other objects.

The NAMESPACE

The following code is an excerpt of the NAMESPACE file from the testthat package.

# Generated by roxygen2 (4.0.2): do not edit by hand
S3method(as.character,expectation)
S3method(compare,character)
export(auto_test)
export(auto_test_package)
export(colourise)
export(context)
exportClasses(ListReporter)
exportClasses(MinimalReporter)
importFrom(methods,setRefClass)
useDynLib(testthat,duplicate_)
useDynLib(testthat,reassign_function)

You can see that the NAMESPACE file looks a bit like R code. Each line contains a directive: S3method(), export(), exportClasses(), and so on. Each directive describes an R object, and says whether it’s exported from this package to be used by others, or it’s imported from another package to be used locally.
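Since the file looks like R code, you can treat it as such for the purpose of exploration. The following sketch parses the directives from the excerpt above with base R's parse() and counts each kind; it is only a way of looking at the file, not how R itself processes a NAMESPACE.

```r
# The directives from the testthat excerpt, one per element.
ns_text <- c(
  "S3method(as.character,expectation)",
  "S3method(compare,character)",
  "export(auto_test)",
  "export(auto_test_package)",
  "export(colourise)",
  "export(context)",
  "exportClasses(ListReporter)",
  "exportClasses(MinimalReporter)",
  "importFrom(methods,setRefClass)",
  "useDynLib(testthat,duplicate_)",
  "useDynLib(testthat,reassign_function)"
)

# Each line parses as a function call; the name of the function
# being called is the directive.
directives <- parse(text = ns_text)
kinds <- vapply(directives, function(d) as.character(d[[1]]), character(1))

sum(kinds == "export")
#> [1] 4
sort(unique(kinds))
```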

The namespace directives fall into two groups. The first group describes exports:

  • export(): export functions (including S3 and S4 generics).
  • exportPattern(): export all functions that match a pattern.
  • exportClasses(), exportMethods(): export S4 classes and methods.
  • S3method(): export S3 methods.

And the second group describes imports:

  • import(): import all functions from a package.
  • importFrom(): import selected functions.
  • importClassesFrom(), importMethodsFrom(): import S4 classes and methods.
  • useDynLib(): import a function from C. This is described in more detail in compiled code.

I don’t recommend writing these directives by hand. Instead, in this chapter you’ll learn how to generate the NAMESPACE file with roxygen2. There are three main advantages to using roxygen2:

  • Namespace definitions live next to the function they belong to, so it’s easier to see what’s being imported and exported when you read the code.

  • Roxygen2 abstracts away some of the details of the NAMESPACE. For example, you only need to learn one tag, @export, which will automatically generate the right directive for functions, S3 methods, S4 methods and S4 classes.

  • Roxygen2 makes a tidy NAMESPACE for you. It only uses unique directives (so you can repeat them as needed), and it sorts them alphabetically. This is particularly important if you’re using git, as it minimises unimportant differences.

Workflow

Generating the namespace with roxygen2 is just like generating function documentation with roxygen2. You use roxygen2 blocks (starting with #') and tags (starting with @). The workflow is also similar:

  1. Add roxygen comments to your .R files.

  2. Run devtools::document() (or press Cmd + Shift + D in RStudio) to convert roxygen comments into NAMESPACE directives (and .Rd files).

  3. Look at NAMESPACE and run tests to check that the specification is correct.

  4. Rinse and repeat until the correct functions are exported.

Exports

For a function to be usable outside of your package, you must export it. When you create a new package with devtools::create(), it produces a temporary NAMESPACE that exports everything in your package that doesn’t start with . (a single period). If you’re just working locally, it’s fine to export everything in your package. However, if you’re planning on sharing your package with others, it’s a really good idea to only export the needed functions. This reduces the chances of a conflict with another package.

To export an object, put @export in its roxygen block. For example:

#' @export
foo <- function(x, y, z) {
  ...
}

This will then generate export(), exportMethods(), exportClasses() or S3method() depending on the type of the object.
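As a sketch of how one tag leads to different directives, here are two roxygen blocks for a hypothetical S3 class called celsius. The function names are invented for this example, and the directives mentioned in the comments are what you would expect roxygen2 to write based on the description above.

```r
#' Create a temperature in degrees Celsius (hypothetical constructor)
#'
#' @param x A numeric vector of temperatures.
#' @export
celsius <- function(x) {
  stopifnot(is.numeric(x))
  structure(list(degrees = x), class = "celsius")
}
# Expected directive for a regular function: export(celsius)

#' @export
format.celsius <- function(x, ...) {
  paste0(x$degrees, " C")
}
# Expected directive for an S3 method: S3method(format,celsius)

format(celsius(21.5))
#> [1] "21.5 C"
```

The roxygen comments are ordinary R comments, so the code runs unchanged whether or not roxygen2 has processed it.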

You export functions that you want other people to use. Exported functions must be documented, and you must be cautious when changing their interface — other people are using them! Generally, it’s better to export too little than too much. It’s easy to export things that you didn’t before; it’s hard to stop exporting a function because it might break existing code. Always err on the side of caution, and simplicity. It’s easier to give people more functionality than it is to take away stuff they’re used to.

I believe that packages that have a wide audience should strive to do one thing and do it well. All functions in a package should be related to a single problem (or closely related set of problems), and any functions not related to that purpose should not be exported. For example, most of my packages have a utils.R file that contains many small functions that are useful for me, but aren’t part of the core purpose of the package. I never export these functions.

# Defaults for NULL values
`%||%` <- function(a, b) if (is.null(a)) b else a

# Remove NULLs from a list
compact <- function(x) {
  x[!vapply(x, is.null, logical(1))]
}

That said, if you’re creating a package for yourself, it’s far less important to have a consistent theme. You know what’s in your package. It’s fine to have a local “misc” package that contains a passel of functions that you find useful, but I don’t think you should release such a package more widely.

The following sections describe what you should export if you’re using S3, S4 or RC.

S3

If you want others to be able to create instances of an S3 class, @export the constructor function. S3 generics are just regular R functions, so you can @export them like functions.

S3 methods are the most complicated because there are four basic scenarios:

  • A method for an exported generic: export every method.

  • A method for an internal generic: technically, you don’t need to export these methods. However, I recommend exporting every S3 method you write because it’s simpler and makes it less likely that you introduce hard to find bugs. You can use devtools::missing_s3() to list all S3 methods that you’ve forgotten to export.

  • A method for a generic in a required package. You’ll need to import the generic (see below), and export the method.

  • A method for a generic in a suggested package. Namespace directives must refer to available functions, so they cannot reference suggested packages. It’s possible to use package hooks and code to add this at run-time, but this is sufficiently complicated that I don’t currently recommend it. Instead, you’ll have to architect your package dependencies to avoid this scenario.

S4

S4 classes: if you want others to be able to extend your class, @export it. If you want others to create instances of your class, but not extend it, @export the constructor function, but not the class.

# Can extend and create
#' @export
setClass("A")

# Can extend, but constructor not exported
#' @export
B <- setClass("B")

# Can create, but not extend
#' @export C
C <- setClass("C")

# Can create and extend
#' @export D
#' @exportClass D
D <- setClass("D")

S4 generics: @export if you want the generic to be publicly usable.

S4 methods: you only need to @export methods for generics that you did not define. But @exporting every method is a good idea as it will not cause problems and prevents you from forgetting to export an important method.
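Putting the three pieces together, a minimal sketch of an exported S4 class, generic and method might look like this. The Account class and balance() generic are hypothetical, and the directives named in the comments are what you would expect roxygen2 to generate. The library(methods) call is there only so the sketch runs as a stand-alone script; inside a package you would import methods instead.

```r
library(methods)  # stand-alone script only; never use library() in a package

#' A hypothetical S4 class that others can extend
#' @export
setClass("Account", slots = c(balance = "numeric"))
# Expected directive: exportClasses(Account)

#' A hypothetical generic that others can call
#' @export
setGeneric("balance", function(account) standardGeneric("balance"))
# Expected directive: export(balance)

#' @export
setMethod("balance", "Account", function(account) account@balance)
# Expected directive: exportMethods(balance)

acct <- new("Account", balance = 100)
balance(acct)
#> [1] 100
```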

RC

The same principles apply as for S4 classes. Note that due to the way that RC is currently implemented, it’s typically impossible for your classes to be extended outside of your package.

Data

Package data doesn’t use the namespace mechanism and should never be exported.

Imports

The NAMESPACE also controls which external functions can be used by your package without qualification with ::.

It’s confusing that both DESCRIPTION (through the Imports field) and the NAMESPACE (through import directives) seem to be involved in imports. Unfortunately this is just a naming confusion. The Imports field really has nothing to do with imports: it just makes sure the package is installed and loaded when your package is. It doesn’t make any functions available without qualification. You need to import functions in exactly the same way regardless of whether or not the package is loaded.

Depends is just a user convenience: if your package is attached, it also attaches all packages listed in Depends. If your package is loaded, packages in Depends are loaded, but not attached, so you need to qualify function names with :: or specifically import them.

It is common for packages to be listed in Imports in the DESCRIPTION, but not in the NAMESPACE. In fact, this is the default I recommend: list the package in DESCRIPTION so that it’s installed, then always refer to it explicitly with pkg::fun(). Unless there is a strong reason otherwise, it’s better to be explicit. It’s a little more work to write, but a lot easier to read when you come back to the code in the future. The converse is not true. Every package mentioned in NAMESPACE must also be present in the Imports or Depends fields.

R functions

If you are using just a few functions from another package, the recommended option is to note the package name in the Imports: field of the DESCRIPTION file and call the function(s) explicitly using ::, e.g., pkg::fun().

If you are using functions repeatedly, you can avoid the :: by importing the function with @importFrom pkg fun. This also has a small performance benefit, because :: adds approximately 5 µs to function evaluation time.

Alternatively, if you are repeatedly using many functions from another package, you can import them in one command with @import package. This is the least recommended solution: it makes your code harder to read (because you can’t tell where a function is coming from), and if you @import many packages, the chance of conflicting function names increases.
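The first two options can be compared side by side. In this sketch, col_sd_explicit() and col_sd_imported() are hypothetical package functions that compute the standard deviation of each column of a data frame; stats is used as the imported package because it is always installed.

```r
# Option 1: list stats in Imports in DESCRIPTION and qualify the call.
# It is obvious at the point of use where sd() comes from.
col_sd_explicit <- function(df) {
  vapply(df, stats::sd, numeric(1))
}

# Option 2: import the function, then use it without qualification.
# In a package, the tag below would become importFrom(stats,sd).
#' @importFrom stats sd
col_sd_imported <- function(df) {
  vapply(df, sd, numeric(1))
}

df <- data.frame(a = c(1, 2, 3), b = c(2, 4, 6))
col_sd_explicit(df)
#> a b 
#> 1 2
identical(col_sd_explicit(df), col_sd_imported(df))
#> [1] TRUE
```

Both versions give the same result; the difference is only in how readable the dependency is and, marginally, in call overhead.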

Compiled functions

To make C/C++ functions available in R, see compiled code.

S3

S3 generics are just functions, so the same rules for functions apply. S3 methods always travel along with the generic, so as long as you can access the generic (either implicitly or explicitly), the methods will also be available.

S4

To use classes defined in another package: @importClassesFrom package ClassA ClassB .... Place this next to the classes that inherit from the imported classes, or the methods that implement a generic for the imported classes.

To use generics defined in another package: @importMethodsFrom package GenericA GenericB .... Place these next to the methods that use the imported generics.

Since S4 is implemented in the methods package, you also need to make sure that’s available. This is tricky because while the methods package is always available on the search path when you’re working interactively in R, it is not automatically loaded by Rscript, a tool for running R from the command line.

  • Pre R 3.2.0: Depends: methods in DESCRIPTION.
    Post R 3.2.0: Imports: methods in DESCRIPTION.

  • Since you’ll be using a lot of functions from methods, you’ll probably also want to import the complete package with:

    #' @import methods
    NULL

    Or you might just want to import the most commonly used functions:

    #' @importFrom methods setClass setGeneric setMethod setRefClass
    NULL

    It doesn’t matter where these import definitions go, but if you have package level docs, that’s a natural place to put them.