Saturday, May 31, 2025

Posit AI Blog: pins 0.4: Versioning

A new version of pins is available on CRAN today, which adds support for versioning your datasets and DigitalOcean Spaces boards!

As a quick recap, the pins package allows you to cache, discover and share resources. You can use pins in a wide range of situations, from downloading a dataset from a URL to creating complex automation workflows (learn more at pins.rstudio.com). You can also use pins in combination with TensorFlow and Keras; for instance, use cloudml to train models in cloud GPUs, but rather than manually copying data into the GPU instance, you can store them as pins directly from R.

To install this new version of pins from CRAN, simply run:

install.packages("pins")

You can find a detailed list of improvements in the pins NEWS file.

To illustrate the new versioning functionality, let’s start by downloading and caching a remote dataset with pins. For this example, we will download the weather in London; this happens to be in JSON format and requires jsonlite to be parsed:

library(pins)

weather_url <- "https://samples.openweathermap.org/data/2.5/weather?q=London,uk&appid=b6907d289e10d714a6e88b30761fae22"

pin(weather_url, "weather") %>%
  jsonlite::read_json() %>%
  as.data.frame()
  coord.lon coord.lat weather.id weather.main     weather.description weather.icon
1     -0.13     51.51        300      Drizzle light intensity drizzle          09d

One advantage of using pins is that, even if the URL or your internet connection becomes unavailable, the above code will still work.

But back to pins 0.4! The new signature parameter in pin_info() allows you to retrieve the “version” of this dataset:

pin_info("weather", signature = TRUE)
# Source: local (files)
# Signature: 624cca260666c6f090b93c37fd76878e3a12a79b
# Properties:
#   - path: weather

You can then validate the remote dataset has not changed by specifying its signature:

pin(weather_url, "weather", signature = "624cca260666c6f090b93c37fd76878e3a12a79b") %>%
  jsonlite::read_json()

If the remote dataset changes, pin() will fail and you can take the appropriate steps to accept the changes by updating the signature or properly updating your code. The previous example is useful as a way of detecting version changes, but we might also want to retrieve specific versions even when the dataset changes.
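The signature check described above comes down to comparing content fingerprints. As a rough illustration of the idea only, the sketch below uses base R's tools::md5sum() on a temporary file. This is not how pins computes its signatures, and file_signature() is a hypothetical helper introduced for this example.

```r
# Conceptual sketch only: pins computes its own signatures internally.
# Here base R's tools::md5sum() stands in as the content fingerprint.

# Hypothetical helper: fingerprint of a file's contents
file_signature <- function(path) {
  unname(tools::md5sum(path))
}

# Simulate a "remote" dataset with a temporary file
path <- tempfile(fileext = ".json")
writeLines('{"weather": "drizzle"}', path)

# Record the signature once, while the content is known to be good
expected <- file_signature(path)

# Unchanged content produces the same signature
unchanged <- identical(file_signature(path), expected)

# Simulate the upstream dataset changing
writeLines('{"weather": "clear sky"}', path)
changed <- !identical(file_signature(path), expected)

print(c(unchanged = unchanged, changed = changed))
```

Recording the expected signature next to the code that consumes the data means a silent upstream change surfaces as an explicit mismatch rather than as a subtly different result.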

pins 0.4 allows you to display and retrieve versions from services like GitHub, Kaggle and RStudio Connect. Even in boards that don’t support versioning natively, you can opt in by registering a board with versions = TRUE.
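As a minimal sketch of that opt-in, the following registers a local board with versioning turned on. It assumes that board_register_local() accepts versions = TRUE in the same way the other board registration functions do; the board name "scratch" is a hypothetical choice for this example.

```r
library(pins)

# Assumption: board_register_local() forwards versions = TRUE to the board.
# "scratch" is a hypothetical board name used only for this sketch.
board_register_local(name = "scratch", versions = TRUE)

# Pin two different datasets under the same name
pin(iris, name = "versioned", board = "scratch")
pin(mtcars, name = "versioned", board = "scratch")

# With versioning enabled, both pins should be listed as separate versions
versions <- pin_versions("versioned", board = "scratch")
print(versions)
```

Without versions = TRUE, the second pin() call would simply replace the first on a board that does not version natively.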

To keep this simple, let’s focus on GitHub first. We will register a GitHub board and pin a dataset to it. Notice that you can also specify the commit parameter in GitHub boards as the commit message for this change.

board_register_github(repo = "javierluraschi/datasets", branch = "datasets")

pin(iris, name = "versioned", board = "github", commit = "use iris as the main dataset")

Now suppose that a colleague comes along and updates this dataset as well:

pin(mtcars, name = "versioned", board = "github", commit = "slight preference to mtcars")

From now on, your code could be broken or, even worse, produce incorrect results!

However, since GitHub was designed as a version control system and pins 0.4 adds support for pin_versions(), we can now explore particular versions of this dataset:

pin_versions("versioned", board = "github")
# A tibble: 2 x 4
  version created              author         message
  <chr>   <chr>                <chr>          <chr>
1 6e6c320 2020-04-02T21:28:07Z javierluraschi slight preference to mtcars
2 01f8ddf 2020-04-02T21:27:59Z javierluraschi use iris as the main dataset

You can then retrieve the version you are interested in as follows:

pin_get("versioned", version = "01f8ddf", board = "github")
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# … with 140 more rows

You can follow similar steps for RStudio Connect and Kaggle boards, even for existing pins! Other boards like Amazon S3, Google Cloud, DigitalOcean and Microsoft Azure require you to explicitly enable versioning when registering your boards.

To try out the new DigitalOcean Spaces board, first you will need to register this board and enable versioning by setting versions to TRUE:

library(pins)
board_register_dospace(space = "pinstest",
                       key = "AAAAAAAAAAAAAAAAAAAA",
                       secret = "ABCABCABCABCABCABCABCABCABCABCABCABCABCA==",
                       datacenter = "sfo2",
                       versions = TRUE)

You can then use all the functionality pins provides, including versioning:

# create pin and replace content in digitalocean
pin(iris, name = "versioned", board = "pinstest")
pin(mtcars, name = "versioned", board = "pinstest")

# retrieve versions from digitalocean
pin_versions(name = "versioned", board = "pinstest")
# A tibble: 2 x 1
  version
  <chr>
1 c35da04
2 d9034cd

Notice that enabling versions in cloud services requires additional storage space for each version of the dataset being stored.

To learn more, visit the Versioning and DigitalOcean articles. To catch up with previous releases:

Thanks for reading along!
