Developer Guide
This guide is for you if you need to extend mizer
to meet the needs of your research project.
You will already have read the mizer model description in
vignette("model_description")
and thus be familiar with what mizer can do out of the box.
You now want to implement the extension or modification of the
model required for your research, and for that you need to dive into
the internal workings of mizer. This guide is meant to make
that as easy as possible.
The first thing you should do, even before reading this guide, is to go to https://github.com/sizespectrum/mizer/issues and create a new "issue" to share your ideas and plans with the mizer community. You may get back valuable feedback and advice. Another way to get in touch with the mizer community is via the size-spectrum modelling Google group.
In this section we describe how to set up your working environment to allow you to easily work with the mizer code. Much of it you may already have in place, so feel free to skip ahead.
Mizer is compatible with R versions 3.1 and later.
If you still need to install R, simply install
the latest version. This guide was prepared with
r R.version.string.
This guide assumes that you will be using RStudio to work with R. There is really no reason not to use RStudio and it makes a lot of things much easier. RStudio develops rapidly and adds useful features all the time and so it pays to upgrade to the latest version frequently. This guide was written with version 1.2.1268.
Mizer is developed using the version control system Git and the code is hosted on GitHub. To contribute to the mizer code, you need to have the Git software installed on your system. On most Linux machines it will be installed already, but on other platforms you need to install it. You do not need to install any GUI for git because RStudio has built-in support for git. A good place to learn about using Git and GitHub is this chapter in the guide by Hadley on R package development.
To work with the code you will create your own git repository with a copy of the mizer code. Go to https://github.com/sizespectrum/mizer and fork it into your own repository by clicking the "Fork" button.
You will be prompted to log in to GitHub. If you do not yet have an account, you need to create one for yourself.
Once you have your own fork of the mizer repository, you create a local copy on whichever machine you work on. You can do this from within RStudio. For this you click on the "Project" drop-down and then select "New Project...".
This will bring up a dialog box where you select "Version Control".
Provided you have Git installed and RStudio was able to find it you can then choose "Git" on the next dialog box.
If the Git option is not showing, then you need to troubleshoot; https://happygitwithr.com/rstudio-see-git.html may help.
In the next dialog box you let RStudio know where to find your fork.
To find the correct repository URL you go back to GitHub to the front page of the repository that was created when you forked the official mizer repository. There you will find a "Clone or download" button which when clicked will reveal the repository URL. Make sure that you are on the page of your fork of the repository. The URL should contain your GitHub username.
You can copy that to the clipboard by pressing the button next to the URL and then paste it back into the RStudio dialog box. In that dialog box you can also change where RStudio stores the repository on your machine. Choose anywhere convenient. Then click "Create Project".
To work with R packages as a developer, you will need to install additional tools.
Once you have these tools in place, you should install the devtools and roxygen2 packages with
install.packages(c("devtools", "roxygen2"))
You are now all set to develop R packages, and RStudio makes this extra easy. There is even a cheat sheet "Package Development with devtools" accessible from the Help menu in RStudio.
To set things up, click on Build -> More -> Configure Build Tools.
In the resulting dialog box, tick the checkboxes "Use devtools package functions if available" and "Generate documentation with roxygen" and then click on "Configure".
This will open another dialog box where you tick "Install and Restart".
Hit "OK". While you are on the Project Options dialog box, click on "Code Editing" and then check "Insert spaces for tabs" and set "Tab width" to 4, because that is the convention the mizer code follows.
Many useful commands for working with packages and with GitHub are provided by the usethis package, which is automatically installed along with devtools. The package comes with a useful vignette with suggestions for how to set things up most conveniently.
You are now ready to install the mizer package using the development code from GitHub. First you should do this with the command
devtools::install_github("sizespectrum/mizer")
This will automatically also install all the other packages that mizer depends on. However this uses the version of the code in the official mizer repository on GitHub, not your local copy of the code. Once you have made changes to your local code, you will want to install mizer using that code. To do this go to the "Build" tab in RStudio and click on "Install and Restart" or alternatively use the keyboard shortcut Ctrl+Shift+B.
You can watch the progress in the "Build" tab. Once the build has completed, you will see that in the console RStudio automatically runs
library(mizer)
to load your freshly built mizer package. You will want to click "Install and Restart" whenever you have changed your local code.
You will be making your code changes in your fork of the mizer code (we are using the so-called Fork & Pull model). From time to time you will want to interact with the main mizer repository in two ways:
- You will want to contribute some of your code back to the mizer project, so that it benefits others and also so that it gets automatically included in future releases.
- You will want to be able to merge new developments made in mizer by others into your code base.
This interaction is made possible with git and GitHub.
It initially takes a bit of effort to get the hang of how this works. Therefore we have created a little tutorial "Working with git and GitHub" with an exercise that will take you through all the necessary steps. Unless you are already very familiar with git and GitHub, it will be worthwhile for you to work through that tutorial now.
We use testthat and shinytest.
The tests are in the directory tests/testthat.
Some tests compare the results of calculations to the results the code gave in
the past, using the testthat::expect_known_value() test. The past values are stored in
tests/testthat/values. If one of the tests gives a value that is different from
the stored value, then the test throws an error and overwrites the stored
value with the new result. The second time the test is run, it then no longer
fails. Luckily the original values will still be in the git repository. So after
you think you have fixed the error that led to the wrong result, you should
revert to the old stored values before re-running the test. Reverting to the old
stored values is easy: Just go to the Git tab in RStudio, select the changed
files in tests/testthat/values (select, not tick), then right-click and choose
Revert.
It may be that the change in the result of a calculation is intended, perhaps because your new code is more accurate than the old code. If you are 100% certain of this, but only then, should you commit the changed files in tests/testthat/values, so that these new values form the basis of future comparison tests.
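The overwrite-on-failure behaviour described above can be imitated in a few lines of base R. This is only a simplified sketch of the idea, not the code testthat actually uses; the helper `check_known_value()` is hypothetical, and recording the value silently on the first run is an assumption made for the sketch.

```r
# Toy imitation of a "known value" check (hypothetical helper, base R only).
check_known_value <- function(value, file) {
    if (!file.exists(file)) {
        saveRDS(value, file)          # first run: just record the value
        return(TRUE)
    }
    stored <- readRDS(file)
    same <- isTRUE(all.equal(value, stored))
    if (!same) {
        saveRDS(value, file)          # overwrite, so a second run will pass
    }
    same
}

ref_file <- tempfile(fileext = ".rds")
first_run  <- check_known_value(c(1, 2, 3), ref_file)    # records the value
broken_run <- check_known_value(c(1, 2, 3.5), ref_file)  # differs: fails and overwrites
rerun      <- check_known_value(c(1, 2, 3.5), ref_file)  # now compares against the new value
```

Because the failing run replaces the stored value, the re-run passes even though nothing was fixed, which is why the stored files need to be reverted with git before re-running the tests.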
Plots are tested with the vdiffr package. When a plot has changed, you should run vdiffr::manage_cases(), which will start a shiny gadget where you can view the changes in the plot.
This section is still in an early stage of development.
Mizer is organised in a modular fashion. It is separated into setup functions, simulation functions, and analysis and plotting functions.
There are several different functions for setting up a MizerParams object for
specifying various concrete models. These setup functions make various
simplifying assumptions about the model parameters to reduce the amount of
information that needs to be specified. This usually takes the form of
assuming allometric scaling laws.
The core of mizer is the project() function
that runs a simulation of the size-spectrum model. It takes a specification
of the model contained in an object of type MizerParams and returns the
results of the simulation in an object of type MizerSim.
There are many functions for analysing and plotting the results of a mizer
simulation contained in a MizerSim object.
The MizerParams and MizerSim objects are S4 objects,
meaning that their slots are rigorously defined and are accessed with the '@'
notation. You do not need to learn about S4 classes in order to understand the
mizer code, because the code avoids using S4 methods. In the presentation
below we assume that the MizerParams object is called params and the
MizerSim object is called sim.
Although MizerParams and MizerSim are S4 classes, mizer registers all
their methods as S3 methods rather than S4 methods. For example, the summary
plot for a MizerParams object is defined as plot.MizerParams() and
registered with S3method(plot, MizerParams) in the NAMESPACE, not via
setMethod("plot", "MizerParams", ...). This works because S3 dispatch reads
the class() attribute of an object, which S4 objects have, so
plot(params) correctly finds plot.MizerParams().
The reason for preferring S3 is that calling setMethod() on a function that
is not already an S4 generic silently promotes it to one, changing dispatch
semantics package-wide. For instance, if mizer used setMethod("plot", ...),
then plot would become an S4 generic for all code running in the same R
session. This can cause hard-to-diagnose failures when plot() is called with
objects of mixed S3/S4 types, because S4 dispatch requires an exact method
match on all arguments rather than falling back gracefully to S3 methods.
In short: use S3 methods (named generic.ClassName) for MizerParams and
MizerSim, and avoid setMethod() unless you are extending a generic that is
already S4 in a dependency.
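The following minimal sketch illustrates the point with a toy class. `DemoSpec` and `describe()` are hypothetical names invented for this example and are not part of mizer; the sketch only assumes that the methods package is available, as it is in any standard R installation.

```r
library(methods)

# Toy S4 class standing in for MizerParams (hypothetical, not mizer code).
setClass("DemoSpec", slots = c(species = "character"))

# An ordinary S3 generic and an S3 method named generic.ClassName.
describe <- function(x, ...) UseMethod("describe")
describe.DemoSpec <- function(x, ...) {
    paste("DemoSpec with", length(x@species), "species")
}

spec <- new("DemoSpec", species = c("Cod", "Herring"))
is_s4_object   <- isS4(spec)            # the object itself is S4
description    <- describe(spec)        # S3 dispatch finds describe.DemoSpec
is_s4_generic  <- isGeneric("describe") # describe was never promoted to an S4 generic
```

The generic stays a plain S3 generic, so other S3 methods for `describe()` defined elsewhere would keep dispatching as before.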
An object of class 'MizerParams' holds all the information needed for the 'project()' function to simulate a model.
If you need to add a new slot to the MizerParams class, you need to make the following additions in the file MizerParams-class.R:
- Go to the section "Class definition" and add a description of your new slot with @slot.
- Add an entry in the slots list inside setClass.
- In the function emptyParams() go to the section "Make object" and inside the call to new() provide a default value for your slot. If your slot holds an array, then it is conventional in mizer to already give it the correct dimensions and dimnames here, if possible.
What exactly to put into these places is usually clear in analogy to what is already there for other similar slots.
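To make the three additions concrete, here is a toy sketch on a made-up class. `DemoParams`, `emptyDemoParams()` and the slot `my_rate` are hypothetical and much simpler than the real MizerParams code; only the pattern (roxygen `@slot` entry, `slots` entry, default value with dimensions and dimnames in the constructor) is meant to carry over.

```r
library(methods)

#' A toy parameter class (hypothetical, for illustration only)
#'
#' @slot w Numeric vector of size-bin weights.
#' @slot my_rate Species x size array holding the newly added quantity.
setClass("DemoParams",
         slots = c(w = "numeric",
                   my_rate = "array"))   # entry for the new slot

emptyDemoParams <- function(species_names, w) {
    # Give the new slot its final dimensions and dimnames straight away.
    my_rate <- array(0,
                     dim = c(length(species_names), length(w)),
                     dimnames = list(sp = species_names,
                                     w = as.character(signif(w, 3))))
    # Make object
    new("DemoParams", w = w, my_rate = my_rate)
}

demo_params <- emptyDemoParams(c("Cod", "Herring"), w = c(1, 10, 100))
```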
Many functions in mizer — getEncounter(), getFeedingLevel(),
getPredMort(), getMort(), getEReproAndGrowth(), getERepro(),
getEGrowth(), getFMort(), getFlux(), and getCriticalFeedingLevel() —
return an ArraySpeciesBySize object. This is a lightweight S3 class that wraps a
species × size matrix and attaches two extra attributes:
- value_name: a human-readable label (e.g. "Encounter rate").
- units: the physical units of the rate (e.g. "g/year" or "1/year").
An ArraySpeciesBySize object behaves exactly like a regular matrix for subsetting and
arithmetic. Subsetting with [ preserves the class as long as the result is
still a 2-D matrix; arithmetic operators (via Ops.ArraySpeciesBySize) strip the class
and return a plain matrix. This means you can use ArraySpeciesBySize objects
transparently in calculations without worrying about attribute contamination.
enc <- getEncounter(NS_params)
is.ArraySpeciesBySize(enc) # TRUE
enc["Cod", "100"] # still a ArraySpeciesBySize
enc * 2 # plain matrix — class is stripped by Ops.ArraySpeciesBySize
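The mechanism behind this behaviour can be sketched with a toy S3 class in base R. The class `DemoRateArray` and its constructor are hypothetical and are not mizer's implementation; the sketch only shows how a `[` method can keep the class for matrix results while an `Ops` group method strips it.

```r
# Toy S3 class standing in for a rate array (hypothetical, not mizer's code).
new_demo_rate_array <- function(x, value_name, units) {
    stopifnot(is.matrix(x))
    structure(x, value_name = value_name, units = units,
              class = "DemoRateArray")
}

# Subsetting: keep class and labels only while the result is still a matrix.
`[.DemoRateArray` <- function(x, ...) {
    out <- unclass(x)[...]
    if (is.matrix(out)) {
        out <- new_demo_rate_array(out, attr(x, "value_name"), attr(x, "units"))
    }
    out
}

# Arithmetic and comparison: strip class and labels, return plain values.
Ops.DemoRateArray <- function(e1, e2) {
    strip <- function(x) {
        if (inherits(x, "DemoRateArray")) {
            attr(x, "value_name") <- NULL
            attr(x, "units") <- NULL
            x <- unclass(x)
        }
        x
    }
    if (missing(e2)) {
        get(.Generic)(strip(e1))             # unary operators such as -x
    } else {
        get(.Generic)(strip(e1), strip(e2))
    }
}

m <- matrix(1:6, nrow = 2,
            dimnames = list(sp = c("Cod", "Herring"), w = c("1", "10", "100")))
enc_demo <- new_demo_rate_array(m, "Encounter rate", "g/year")
sub_demo <- enc_demo[, 1:2]         # still a matrix, so the class is kept
one_demo <- enc_demo["Cod", "100"]  # a single number, so the class is dropped
dbl_demo <- enc_demo * 2            # plain matrix without the extra attributes
```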
ArraySpeciesBySize provides print(), summary(), plot(), and as.data.frame()
methods designed for quick inspection of rate arrays:
enc <- getEncounter(NS_params)
print(enc) # one-line min/mean/max per species
summary(enc) # tabular summary per species
plot(enc, NS_params) # line plot vs. size, coloured by species
as.data.frame(enc) # long-format data frame with columns w, value, Species
The plot() method accepts a params argument so it can use the species
colours and linetypes stored in the MizerParams object, and it restricts each
species' curve to its natural size range (w_min to w_max) unless
all.sizes = TRUE.
If you write a custom rate function that replaces one of the built-in mizer*
functions (see ?setRateFunction), you do not need to return an ArraySpeciesBySize:
the get* wrapper that calls your function is responsible for wrapping the
result. Your function should return a plain numeric matrix with the correct
dimensions (species × size) and dimnames.
If you write a new standalone rate function that is meant to be called directly by users, you may want to wrap its result yourself:
myRateFunction <- function(params, ...) {
    # ... compute a species x size matrix; a matrix of zeros stands in here ...
    result <- matrix(0, nrow = nrow(params@metab), ncol = ncol(params@metab))
    ArraySpeciesBySize(result, value_name = "My rate", units = "1/year", params = params)
}
The params argument to ArraySpeciesBySize() is used only to copy dimnames from
params@metab onto the result, so that species names and size labels are
set consistently.
You can test whether an object is an ArraySpeciesBySize with is.ArraySpeciesBySize().
Mizer should make good use of R's condition handling system, which is well described at https://adv-r.hadley.nz/conditions.html. Currently this is being done only in a small number of places in the code. Search the code for signal() and withCallingHandlers() to find those places.
Mizer uses the rlang::signal() function to generate information messages that can be controlled by the user via the info_level argument. This system allows for different levels of verbosity without cluttering the console with warnings or standard messages that cannot be easily suppressed or filtered.
The info_level argument is an integer that controls the amount of information shown to the user. It works by filtering signals based on their assigned level.
- info_level = 0: No information messages are shown. This is useful for running simulations in loops or when a clean output is desired.
- info_level = 1: Only important messages are shown. These are typically about issues that might affect the validity of the model or significant assumptions being made.
- info_level = 2: Intermediate level (currently not widely used, reserved for future granularity).
- info_level = 3: All information messages are shown. This includes details about default values being set and other minor notifications.
When writing functions in mizer, you should use signal() instead of message() or warning() for information that is not critical (i.e., not an error or a deprecation warning) but helpful for the user to know.
signal(message, class = "info_about_default", var = "variable_name", level = integer_level)
- message: The text to display.
- class: Should be "info_about_default" for messages related to setting default values.
- var: The name of the variable or parameter the message is about (e.g., "h", "gamma", "ks").
- level: The importance level of the message (1, 2, or 3).
- Level 1: Use for non-critical but significant issues.
  - Example: "Because you have n != p, the default value for h is not very good."
- Level 3: Use for routine information about defaults.
  - Example: "For species where no growth information is available the parameter h has been set to h = 30."
Functions that accept info_level should set up a calling handler to capture and filter these signals.
infos <- list()
collect_info <- function(cnd) {
    if (cnd$level <= info_level) {
        infos[[cnd$var]] <<- cnd$message
        rlang::cnd_muffle(cnd) # So this info will not be repeated higher up
    }
}
withCallingHandlers(
    info_about_default = collect_info, {
        # ... code that might generate signals ...
    }
)
if (length(infos) > 0) {
    message(paste(infos, collapse = "\n"))
}
This pattern collects all relevant messages and prints them at the end, avoiding duplicate messages for the same variable if multiple defaults are set.
Signals in R bubble up the call stack via withCallingHandlers. To ensure a unified summary and avoid duplicate messages, we adopt a Top-Level Reporting strategy:
- Top-Level Functions: Functions that are directly called by the user (e.g., newCommunityParams) should handle signals and print the summary.
- Inner Functions: Functions called by other functions (e.g., newMultispeciesParams called by newCommunityParams) should be called with info_level = 0. This prevents them from printing their own summary while allowing their signals to bubble up to the top-level function's handler.
For example, newMultispeciesParams calls setParams(params, info_level = 0, ...) so that setParams does not print its own summary; instead, the signals bubble up to newMultispeciesParams's handler.
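The whole reporting strategy can be tried out in a self-contained base-R sketch. Everything here is a stand-in: `signal_info()` imitates `rlang::signal()` using `signalCondition()` and a restart named `muffle_info` (playing the role of `rlang::cnd_muffle()`), and `inner_setup()` and `outer_setup()` are hypothetical functions, not mizer code.

```r
# Base-R stand-in for rlang::signal(): a classed condition carrying `var` and
# `level`, wrapped in a restart so that a handler can muffle it.
signal_info <- function(message, var, level) {
    cnd <- structure(
        list(message = message, call = NULL, var = var, level = level),
        class = c("info_about_default", "condition")
    )
    withRestarts(signalCondition(cnd), muffle_info = function() NULL)
    invisible(NULL)
}

# Run `code` under the collecting handler and return the collected messages.
with_info <- function(code, info_level) {
    infos <- list()
    collect_info <- function(cnd) {
        if (cnd$level <= info_level) {
            infos[[cnd$var]] <<- cnd$message
            invokeRestart("muffle_info")  # do not repeat this info higher up
        }
    }
    withCallingHandlers(code, info_about_default = collect_info)
    infos
}

# Hypothetical inner function that signals two pieces of information.
inner_setup <- function(info_level = 3) {
    infos <- with_info({
        signal_info("h has been set to h = 30.", var = "h", level = 3)
        signal_info("The default value for h is not very good.",
                    var = "h_quality", level = 1)
    }, info_level)
    if (length(infos) > 0) message(paste(infos, collapse = "\n"))
    invisible(infos)
}

# Hypothetical top-level function: silences the inner summary with
# info_level = 0 and reports the signals that bubble up to its own handler.
outer_setup <- function(info_level = 3) {
    infos <- with_info(inner_setup(info_level = 0), info_level)
    if (length(infos) > 0) message(paste(infos, collapse = "\n"))
    invisible(infos)
}

all_infos       <- outer_setup(info_level = 3)
important_infos <- outer_setup(info_level = 1)
no_infos        <- outer_setup(info_level = 0)
```

With `info_level = 0` the inner handler neither collects nor muffles anything, so each signal continues to the outer handler, which then decides what to report in a single summary.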
The mizer functions for creating new models make a lot of choices for default values for parameters that are not provided by the user. Sometimes we find better ways to choose the defaults and update mizer accordingly. When we do this, we will increase the edition number. The mechanism is described in the help page of defaults_edition().
We are currently on default edition r defaults_edition().
To find which defaults are different in different editions simply search for conditionals involving defaults_edition() in the code.
We should try to avoid committing too many very large files to the git repository, because the larger the repository the longer it takes to download. Currently (September 2019) the repository is still at a manageable 110 MiB. The best way to check the size of the repository is with git-sizer, see https://github.com/github/git-sizer/
We are currently storing the mizer website in the mizer repository (in the docs subdirectory). That is convenient, partly because that is where pkgdown puts it by default and GitHub serves it from there. However in the future we might consider moving the website to its own repository.