To get started, install plotscaper
with:
devtools::install_github("bartonicek/plotscaper")
Next, open up RStudio and run the following code:
library(plotscaper)
httpuv::stopAllServers()
names(airquality) <- c("ozone", "solar radiation", "wind",
"temperature", "month", "day")
create_schema(airquality) |>
add_scatterplot(c("solar radiation", "ozone")) |>
add_barplot(c("day", "ozone"), list(reducer = "max")) |>
add_histogram(c("wind")) |>
add_pcoords(names(airquality)[1:4]) |>
render()
#> Warning in create_schema(airquality): Removed 42 rows with missing values from
#> the data
Try clicking and dragging to select some points in the scatterplot. You should see the corresponding cases get highlighted within the barplot!
There are many other ways interacting with plotscaper
figures, including:
- Zooming and panning
- Changing the size of objects
- Increasing/decreasing the opacity (alpha)
- Manipulating parameters such as histogram binwidth and anchor
- Modifying axis limits and ordering discrete axes
Importantly, many of these interactions can either be done manually, by interacting with the figure (client-side), or programmatically, by calling functions from inside a running R session (“server-side”).
The scene and the schema
To take full advantage of plotscaper
, you need to
understand two core functions: create_schema
and
render
. We’ll explore this on the example of the
penguins
data set (Horst, Hill, and
Gorman 2020).
Creating a schema
The create_schema
function initializes a schema
- a sort of a recipe which we can use to define the figure. Like in
other data visualization packages, we build up the schema step-by-step,
by calling functions that append additional information to it:
library(palmerpenguins)
penguins <- na.omit(penguins) # missing data is not supported yet, unfortunately
names(penguins) <- names(penguins) |> gsub("(_mm|_g)", "", x = _)
schema <- create_schema(penguins) |>
add_scatterplot(c("body_mass", "flipper_length")) |>
add_barplot(c("species")) |>
add_fluctplot(c("species", "sex")) |>
add_histogram(c("bill_length"))
The schema then is really just a list of messages:
schema
#> plotscaper schema:
#> add-plot { type: scatter, variables: c("body_mass", "flipper_length") }
#> add-plot { type: bar, variables: species }
#> add-plot { type: fluct, variables: c("species", "sex") }
#> add-plot { type: histo, variables: bill_length }
Typically, you’ll use the schema to add more plots. However, you can also do other things such as select cases or set axis limits:
schema <- schema |>
assign_cases(which(penguins$species == "Adelie")) |>
set_scale("plot1", "x", min = 0) |>
set_scale("plot1", "size", max = 5)
schema
#> plotscaper schema:
#> add-plot { type: scatter, variables: c("body_mass", "flipper_length") }
#> add-plot { type: bar, variables: species }
#> add-plot { type: fluct, variables: c("species", "sex") }
#> add-plot { type: histo, variables: bill_length }
#> set-assigned { cases: c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121,
#> 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145), group: 1 }
#> set-scale { scale: x, min: 0 }
#> set-scale { scale: size, max: 5 }
In fact, most of the things you can do with the schema you can also do interactively, and vice versa. We will see that in the next section.
Rendering a scene
When it finally comes a time to turn the recipe into an actual
interactive figure or scene, we can do so by calling the
render
function:
scene <- schema |> render()
scene
The render
function takes the schema and turns it into a
htmlwidgets
widget that we can interact with in RStudio
viewer or embed in RMarkdown documents. However, it can also do a little
bit more than that.
If you’re following along in RStudio, try this:
scene |> select_cases(1:10)
You should see some cases of the data get highlighted. Importantly, notice that the figure does not get re-rendered!
The difference between scene and schema
Whereas the create_schema
merely initializes a recipe
which is just a regular R object (a list
), the
render
function renders the recipe into an
htmlwidgets
widget, and, if inside a running R
session, it also launches an httpuv
server for direct
communication between the R session and the figure.
We can use (mostly) the same functions to modify schema (created by
call to create_schema
) and scene (created by call to
render
). The difference is that while calling functions
with a schema as the first argument merely appends the corresponding
message to the recipe, calling functions with scene immediately sends a
message to the figure via the server.
That is, while the following code:
# NOT RUN
scene <- schema |> select_cases(20:30) |> render()
scene
scene
causes two full re-renders, the following code:
# NOT RUN
scene <- schema |> render()
scene |> select_cases(20:30)
scene |> select_cases(20:30)
causes only one render (if inside a running R session).
In other words, plotscaper
functions such as
add_*
, set_selected
, and
set_scale
behave differently based on whether we call them
on scene or schema:
- Schema: calling a function lazily appends to a list of messages that will get executed in the future, when the schema is rendered
- Scene: calling a function immediately sends a message to the scene and mutates its state
The second method only works if we are in an interactive R session because we need a server to communicate with the figure.
This is why running the following code (while the document you’re reading is being knitted) throws an error:
interactive()
#> [1] FALSE
scene |> select_cases(1:10)
#> Error: You can only send messages to scene from within an interactive R session
When we knit an RMarkdown document, we generate a static HTML file.
By default, we cannot communicate with this file since it’s just a big
blob of HTML, CSS, and JavaScript. To only way to change the file is to
rewrite it. Thus, in RMarkdown, we can really only really write the
schema and render
- we can’t do anything with the figure
once it’s been rendered.
In contrast, inside an interactive R session, we can launch a server that will listen and respond to messages from the R session and send them to the figure (and also send messages from the figure back to the R session - this is done via WebSockets).
This also means there are some functions that it only make sense to use inside a running R session. For example, the following functions query the selection status of the figure:
scene |> selected_cases()
scene |> assigned_cases()
That means that you can, for example, render a figure, select some
cases of the data using a mouse, and then call
scene |> selected_cases()
to get the row indices of
those cases. This doesn’t really make sense when writing a schema - the
only way to select cases is via an explicit call to
select_cases
, so we would be querying cases that we have
already specified before.
Likewise, there are functions such as pop_plot
and
remove_plot
which can be used to remove plots from scene.
These don’t really make sense to use while writing a schema - if you
know that you don’t want a specific plot, you can just delete the line
which adds it to the recipe. However, they do make sense when
interacting with a scene inside a running R session. Perhaps you found
some interesting trend in your data and want to see if it holds in other
plots, but you’re running out of space in the viewer - you can
pop_plot
to remove the last plot and add_*
to
add a new plot, all the while keeping the rest of the state of the
figure intact!
Troubleshooting the scene
Sometimes, when rendering multiple figures quickly, the figure may fail to connect to the server or the server may failed to be launched altogether. Often, the cause is that the default port number is already taken. I will try to fix this bug in the future, however, for the time being, you can always fix it by running either of the following functions:
start_server(random_port = TRUE) # Starts a server on a new random port
or:
httpuv::stopAllServers() # Stops all servers, now you should be able to relaunch the server
Bonus: Scatterplot matrix
Since the schema is lazy, we can use it to generate figures programmatically. For example, here’s how we could create an interactive scatterplot matrix (SPLOM) of the penguins data set:
schema <- create_schema(penguins)
keys <- names(penguins)[4:6]
# Loop through combinations of columns
for (i in seq_along(keys)) {
for (j in seq_along(keys)) {
# Add a scatterplot if row & column no.'s are different
if (i != j) schema <- schema |> add_scatterplot(c(keys[i], keys[j]))
# Add a histogram if row & column no.'s match
else schema <- schema |> add_histogram(c(keys[i]))
}
}
# Options to make the plots fit better within the available space
opts <- list(size = 5, axis_title_size = 0.75, axis_label_size = 0.5)
schema |> render(options = opts)
Just to re-iterate the point from the previous section, we could also do this interactively, by writing out the calls to add the plots ourselves:
scene <- create_schema(penguins) |> render(opts)
scene |> add_histogram(c("bill_depth"))
scene |> add_scatterplot(c("bill_depth", "flipper_length"))
scene |> add_scatterplot(c("bill_depth", "body_mass"))
...
However, having to do this for all nine plots might quickly get tedious.
As such, there are different reasons why you might want to do something using a scene, a schema, or some combination of both. If you’re writing an RMarkdown document, you don’t really have a choice - you can’t do anything to a scene once it’s rendered.
Inside an interactive R session, you have more options. You can
decide you first want to create a highly customized schema and only then
start interacting with the figure live. Or you can just immediately fire
off scene <- create_schema(data) |> render()
and do
everything interactively.
Each approach has some advantages and some disadvantages. With the
schema way, you can always re-create most of the state, so if you mess
up and need to go back, you can just print scene
and you’re
good to go. With the interactive scene way, you’re more flexible and you
have the immediate feedback in seeing how the figure changes in front of
you, however, it may be more difficult to recover some state if there
are many intermediate steps.