64 Periodic Splines
64.1 Periodic Splines
This chapter expands on the idea from the last chapter by using splines. Splines allow for different shapes of activation than the trigonometric functions we saw there.
We will use the same toy data set and see if we can improve on the fit.
data:image/s3,"s3://crabby-images/57cdd/57cddad9b681205298a14f37c4ff4d6881eebf6a" alt="Scatter chart. Predictor along the x-axis and outcome along the y-axis. Points scatter around a constant low value of the target. At regular intervals, the curve swings up and back down."
This data has a very specific shape, and we will see if we can approximate it with our splines.
First we fit a number of spline terms to our data using default arguments.
data:image/s3,"s3://crabby-images/5a782/5a782322415b507d67baa0f14c54231c36a3d841" alt="Two charts one above another. Below: Scatter chart. Predictor along the x-axis and outcome along the y-axis. Points scatter around a constant low value of the target. At regular intervals, the curve swings up and back down. Above: Spline terms as curves. One curve for each term. Each curve goes up in an almost sine curve, and then back down to zero. Each peak is separated."
While this produces some fine splines, they are neither well-fitting nor periodic. Let us make the splines periodic and try to approximate the period.
data:image/s3,"s3://crabby-images/0e078/0e0786c92652a953b40dea06a896b812133be659" alt="Two charts one above another. Below: Scatter chart. Predictor along the x-axis and outcome along the y-axis. Points scatter around a constant low value of the target. At regular intervals, the curve swings up and back down. Above: Spline terms as curves. One curve for each term. Each curve goes up in an almost sine curve, and then back down to zero. Each peak is separated. The splines have been shortened and repeat with the same period."
We already see that something good is happening. The width of each bump is related to the number of degrees of freedom we have; lowering this value creates fewer, wider bumps.
data:image/s3,"s3://crabby-images/1fbcd/1fbcd08e6af07f82780fa48124b7f7ffde3620e0" alt="Two charts one above another. Below: Scatter chart. Predictor along the x-axis and outcome along the y-axis. Points scatter around a constant low value of the target. At regular intervals, the curve swings up and back down. Above: Spline terms as curves. One curve for each term. Each curve goes up in an almost sine curve, and then back down to zero. Each peak is separated. There are now fewer splines, with one of them being highlighted with color, to show its similarity with the chart below."
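The two ideas above, making the basis periodic and changing the degrees of freedom, can be sketched directly with the splines2 package. This is a minimal sketch on a made-up predictor `x` with an assumed period of 10, not the code used to draw the figures; the values of `df` are chosen only for illustration.

```r
library(splines2)

# Hypothetical predictor covering three periods of length 10
x <- seq(0, 30, by = 0.5)
period <- 10

# Periodic cubic B-spline basis: Boundary.knots marks one full period
basis <- bSpline(
  x,
  df = 6, degree = 3,
  periodic = TRUE, Boundary.knots = c(0, period)
)

# Evaluate the same basis one period later, reusing the stored knots
shifted <- predict(basis, newx = x + period)

# Largest difference between the two evaluations; a periodic basis
# should repeat itself, so this should be numerically zero
max(abs(as.numeric(basis) - as.numeric(shifted)))

# Fewer degrees of freedom give fewer (and therefore wider) bumps
wide <- bSpline(
  x,
  df = 4, degree = 3,
  periodic = TRUE, Boundary.knots = c(0, period)
)
c(ncol(basis), ncol(wide))
```

Plotting the columns of `basis` and `wide` against `x` (for example with `matplot()`) gives a picture similar in spirit to the charts above.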
Now we are getting some pretty good traction. Pulling out the well-performing spline term, we can translate it a bit to show how well it overlaps with our signal.
data:image/s3,"s3://crabby-images/aef76/aef764ca2127fdbea447b308accadf4094f7ae5b" alt="Scatter chart. Predictor along the x-axis and outcome along the y-axis. Points scatter around a constant low value of the target. At regular intervals, the curve swings up and back down. Spline term has been overlaid as a curve, almost perfectly following the trend."
While one spline term is highlighted here, it is important to note that the spline terms together cover the whole period, which makes sure that a signal is captured no matter where in the period it occurs.
There are obviously some signals that can't be captured using splines. Still, compared to sine curves, splines are much more flexible, with a number of different kinds, each with some room for customization. Any purely periodic signal can be captured with the method in the next chapter.
64.2 Pros and Cons
64.2.1 Pros
- More flexible than sine curves
- Fairly interpretable
64.2.2 Cons
- Requires that you know the period
- Will create some unnecessary features
- Can't capture all types of signal
64.3 R Examples
We will be using the animalshelter data set for this.
library(recipes)
library(animalshelter)

longbeach |>
  select(outcome_type, intake_date)
# A tibble: 29,787 × 2
outcome_type intake_date
<chr> <date>
1 euthanasia 2023-02-20
2 rescue 2023-10-03
3 euthanasia 2020-01-01
4 transfer 2020-02-02
5 rescue 2018-12-18
6 adoption 2024-10-18
7 euthanasia 2020-07-25
8 rescue 2019-06-12
9 rescue 2017-09-21
10 rescue 2024-12-15
# ℹ 29,777 more rows
There are two steps in the recipes package that support periodic splines: step_spline_b() and step_spline_nonnegative(), used for B-splines and non-negative splines (also called M-splines), respectively.
These functions have two main arguments controlling the spline itself, and two main arguments controlling its periodic behavior.
deg_free and degree control the spline, changing the number of spline terms that are created and the degree of the piecewise polynomials, respectively. The defaults for these functions tend to be a good starting point. To make these steps periodic, we need to set periodic = TRUE in options. Lastly, we can control the period and its shift with Boundary.knots in options. I find that the easiest way to set this is c(0, period) + shift.
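To make the c(0, period) + shift idiom concrete, here is a small base R sketch. The helper `wrap_into_period()` is hypothetical and written only for this illustration; it mimics the idea that a periodic basis treats values one period apart as the same point, and it is not the code the recipes steps run internally.

```r
period <- 365
shift <- 50

# R adds the shift to both ends, giving the start and end of one period
boundary <- c(0, period) + shift
boundary

# Hypothetical helper: map any value into the interval
# [boundary[1], boundary[2]) by wrapping around the period
wrap_into_period <- function(x, boundary) {
  lower <- boundary[1]
  lower + (x - lower) %% diff(boundary)
}

# Two dates exactly 365 days apart, stored as integers (days since 1970-01-01)
days <- as.integer(as.Date(c("2023-02-20", "2024-02-20")))
diff(days)

# After wrapping, both dates land on the same position within the period
wrapped <- wrap_into_period(days, boundary)
wrapped
```

Note that a fixed period of 365 days ignores leap years, so the alignment with the calendar drifts slowly over a multi-year span.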
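spline_rec <- recipe(outcome_type ~ intake_date, data = longbeach) |>
  # Splines need a numeric predictor, so convert the date to days since 1970-01-01
  step_mutate(intake_date = as.integer(intake_date)) |>
  # Periodic B-spline with a period of 365 days, shifted by 50 days
  step_spline_b(
    intake_date,
    options = list(periodic = TRUE, Boundary.knots = c(0, 365) + 50)
  )

spline_rec |>
  prep() |>
  bake(new_data = NULL)
# A tibble: 29,787 × 11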
outcome_type intake_date_01 intake_date_02 intake_date_03 intake_date_04
<fct> <dbl> <dbl> <dbl> <dbl>
1 euthanasia 0 0 0 0
2 rescue 0 0 0 0
3 euthanasia 0 0 0 0
4 transfer 0 0 0 0
5 rescue 0 0 0 0
6 adoption 0 0 0 0
7 euthanasia 0 0.0154 0.437 0.512
8 rescue 0.176 0.682 0.142 0
9 rescue 0 0 0 0.0131
10 rescue 0 0 0 0
# ℹ 29,777 more rows
# ℹ 6 more variables: intake_date_05 <dbl>, intake_date_06 <dbl>,
# intake_date_07 <dbl>, intake_date_08 <dbl>, intake_date_09 <dbl>,
# intake_date_10 <dbl>
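The same options can be passed to step_spline_nonnegative(). The sketch below uses a made-up data set `toy` with a numeric predictor `day` and a synthetic outcome `y`, both assumptions for illustration only, and checks that two days exactly one period apart receive the same spline values.

```r
library(recipes)

# Hypothetical toy data: two years of daily values with a yearly pattern
toy <- data.frame(day = 0:729)
toy$y <- sin(2 * pi * toy$day / 365)

nonneg_rec <- recipe(y ~ day, data = toy) |>
  # Periodic non-negative (M-)spline, same period and shift as above
  step_spline_nonnegative(
    day,
    options = list(periodic = TRUE, Boundary.knots = c(0, 365) + 50)
  )

baked <- nonneg_rec |>
  prep() |>
  bake(new_data = NULL)

# The created spline terms are named after the original column
spline_cols <- grep("^day_", names(baked), value = TRUE)

# Rows 61 and 426 hold day 60 and day 425, exactly 365 days apart
same_position <- all.equal(
  unlist(baked[61, spline_cols]),
  unlist(baked[426, spline_cols])
)
same_position
```

As the name suggests, every value produced by this step should be zero or positive, which can be convenient for models that expect non-negative features.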
Find a dataset where the predictor is naturally numeric.