40  Advanced Features

All the features we were able to extract were related to what day or time it was for a given observation, or were numbers of the form “how many days since the start of the month” or “how many days since the start of the week”. While this information can be useful, there will often be times when slight modifications can result in huge payoffs.

Consider merchandise sale-related data. An indicator for specific dates can be useful, but the sale amount is not likely to be affected only on the sale days themselves, but on the surrounding days as well. Consider the American Black Friday. This day falls on an easily determined date every year, namely the day after Thanksgiving, which is the Friday following the fourth Thursday of November. Given its proximity to Christmas and other gift-giving holidays, it is a common day for thrifty people to start buying presents.
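As a minimal sketch of how such a date could be located, the snippet below computes Black Friday as the Friday after the fourth Thursday of November. This is usually, but not always, the last Friday of the month. The function names `black_friday` and `is_black_friday` are hypothetical and only serve this illustration.

```python
import datetime as dt


def black_friday(year: int) -> dt.date:
    """Return the Friday after the fourth Thursday of November."""
    nov_first = dt.date(year, 11, 1)
    # date.weekday() uses Monday=0, so Thursday is 3.
    days_to_first_thursday = (3 - nov_first.weekday()) % 7
    # The fourth Thursday is three weeks after the first one.
    fourth_thursday = nov_first + dt.timedelta(days=days_to_first_thursday + 21)
    return fourth_thursday + dt.timedelta(days=1)


def is_black_friday(date: dt.date) -> int:
    """Plain 0/1 indicator, as produced by simple extraction."""
    return int(date == black_friday(date.year))
```

For 2023 this gives November 24, the date used in the figures of this chapter.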

With plain extraction, we only have a single indicator for the day of Black Friday.

Bar chart. Dates along the x-axis, numeric effect along the y-axis. A single bar on Nov 24 with a value of 1 is shown.
Figure 40.1: We only see the effect of a single day.

But since we know the day of Black Friday in advance, it would make sense for sales to drop on the preceding days. We can incorporate that as well.

Bar chart. Dates along the x-axis, numeric effect along the y-axis. A single bar on Nov 24 with a value of 1 is shown. The columns before the 24th take negative values, with the 23rd having the largest magnitude, the 22nd less, and so on.
Figure 40.2: Negative before effects can capture hesitancy to buy before a big sale.

On the other hand, once the sale has started, sales pick up again. Since this is the last big sale before the holidays, shoppers are free to buy their remaining presents, as they don’t have to fear the item going on sale later.

Bar chart. Dates along the x-axis, numeric effect along the y-axis. A single bar on Nov 24 with a value of 1 is shown. The columns before the 24th take negative values, with the 23rd having the largest magnitude, the 22nd less, and so on. The values after Nov 24 are positive and decreasing.
Figure 40.3: Positive after effects can capture the peace of mind that no other sale will come.
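One way to sketch the shape in these figures is a signed feature built from the offset to the event day. The linear ramps, the window lengths, and the name `event_effect` are assumptions chosen for illustration. The figures only show that values are negative before the event, 1 on the event, and positive but fading afterward.

```python
import datetime as dt


def event_effect(date: dt.date, event: dt.date,
                 before_window: int = 5, after_window: int = 5) -> float:
    """Signed effect around an event day (illustrative linear ramps)."""
    offset = (date - event).days  # negative before the event, positive after
    if offset == 0:
        return 1.0
    if -before_window <= offset < 0:
        # Strongest dip the day before, fading the further back we go.
        return -(before_window + offset + 1) / (before_window + 1)
    if 0 < offset <= after_window:
        # Positive effect that decays after the event.
        return (after_window - offset + 1) / (after_window + 1)
    return 0.0


black_friday = dt.date(2023, 11, 24)
effects = [event_effect(black_friday + dt.timedelta(days=d), black_friday)
           for d in range(-6, 7)]
```

In practice the before and after shapes would be tuned to the data, or several candidate shapes could be supplied and left for the model to choose between.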

The exact effects shown here are only approximations that fit our story. But they provide a useful illustration. There is a lot of flexibility to be gained if we look at date times from a distance perspective. We can play around with “distance from” and “distance to”, the different numerical transformations we saw in Chapter 1, and the signs and indicators we talked about in Chapter 39 to tailor our feature engineering to our problem.

What all these methods have in common is a reference point. For an extracted day feature, the reference point is “first of the month” and the after-function is x, or in other words “days since the start of the month”. We see this in the following chart. Almost all extracted features follow this formula.

Bar chart. Dates along the x-axis, numeric effect along the y-axis. Values start at 1 on the first of the month, and increase by 1 for each day. A triangle pattern appears.
Figure 40.4: Repeated increasing values.
Bar chart. Dates along the x-axis, numeric effect along the y-axis. Values start at 1 on the first of the month, and increase by 1 for each day. A triangle pattern appears.
Figure 40.5: Repeated increasing values.
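The reference-point idea can be sketched as a small helper that takes a reference date plus a before-function and an after-function. The helper name `reference_feature` and its signature are hypothetical. The `+ 1` in `day_of_month` is there only so that the first of the month maps to 1, as in the charts.

```python
import datetime as dt
from typing import Callable, Optional


def reference_feature(date: dt.date, reference: dt.date,
                      before_fn: Optional[Callable[[int], float]] = None,
                      after_fn: Optional[Callable[[int], float]] = None) -> float:
    """Apply before_fn or after_fn to the distance from a reference date."""
    offset = (date - reference).days
    if offset >= 0:
        return after_fn(offset) if after_fn else 0
    # Distances are passed as positive numbers on both sides.
    return before_fn(-offset) if before_fn else 0


def day_of_month(date: dt.date) -> float:
    """Extracted day feature: reference is the first of the month, after-function is x."""
    first_of_month = date.replace(day=1)
    return reference_feature(date, first_of_month, after_fn=lambda x: x) + 1
```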

We could just as well do the inverse and look at how many days are left in the month. This would use a before-function of x, with the end of the month as the reference point.

Bar chart. Dates along the x-axis, numeric effect along the y-axis. Values start at 1 on the last of the month, and increase by 1 for each day going backwards. A triangle pattern appears. The starting value is different for each month as each month has a different number of days.
Figure 40.6: Repeated increasing values.
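A minimal sketch of this end-of-month version, using the standard library to find the length of each month. The function name `days_left_in_month` is hypothetical, and the last day of the month is mapped to 1 to match the chart.

```python
import calendar
import datetime as dt


def days_left_in_month(date: dt.date) -> int:
    """Count down to the end of the month, with the last day equal to 1."""
    # monthrange returns (weekday of the first day, number of days in the month).
    days_in_month = calendar.monthrange(date.year, date.month)[1]
    return days_in_month - date.day + 1
```

Because months differ in length, the value on the first of the month differs too: 31 for January, 28 for February in a non-leap year.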

We can build a two-sided formula by looking at “how many days are we away from a weekend”. Here both the before- and after-functions are x, and the result looks like so. It isn’t too interesting here as it is quite periodic, but use the same measure with “sale” instead of “weekend” and suddenly you have something different.

Bar chart. Dates along the x-axis, numeric effect along the y-axis. Values are zero for both Saturdays and Sundays. 1 for Mondays and Fridays, 2 for Tuesdays and Thursdays, and 3 for Wednesdays.
Figure 40.7: Repeated
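The two-sided weekend distance can be sketched by taking the smaller of the distance back to the previous Sunday and forward to the next Saturday. The function name `days_from_weekend` is hypothetical.

```python
import datetime as dt


def days_from_weekend(date: dt.date) -> int:
    """Distance in days to the nearest weekend day (0 on Saturday and Sunday)."""
    weekday = date.weekday()  # Monday=0, ..., Saturday=5, Sunday=6
    if weekday >= 5:
        return 0
    days_since_sunday = weekday + 1    # Monday -> 1, Friday -> 5
    days_until_saturday = 5 - weekday  # Monday -> 5, Friday -> 1
    return min(days_since_sunday, days_until_saturday)
```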

There are many other functions you can use, and which ones fit will depend entirely on your task at hand. A few examples are shown below for inspiration.

Faceted bar chart. Dates along the x-axis, numeric effect along the y-axis. Each of the charts represents the day of the month for a couple of months. One shows the logarithmic transformation, one shows the untransformed data, one shows the square transformation, and one shows the untransformed data with values above 10 rounded down to 10, creating a plateau.
Figure 40.8: Repeated
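The four variants in the chart can be sketched as simple functions of the day of the month. This assumes that “rounded down to 10” means capping values at 10. The name `transform_day` is hypothetical.

```python
import math


def transform_day(x: int) -> dict:
    """Return a few candidate transformations of a day-of-month value x >= 1."""
    return {
        "identity": x,
        "log": math.log(x),    # compresses large distances
        "square": x ** 2,      # emphasizes large distances
        "capped": min(x, 10),  # plateau: everything past 10 is treated alike
    }


features = [transform_day(day) for day in range(1, 32)]
```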

What makes these calculations so neat is that they can be tailored to our task at hand and that they work with irregular events such as holidays and signup dates. These methods are not circular by definition, but in many ways they will work like circular features. We will cover explicit circular methods in Chapter 41.
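For irregular events, the same idea can be sketched with a sorted list of event dates and a binary search for the neighbors of each observation. The event dates and the function name `days_since_and_until` below are made up for illustration.

```python
import bisect
import datetime as dt
from typing import List, Optional, Tuple


def days_since_and_until(date: dt.date, events: List[dt.date]
                         ) -> Tuple[Optional[int], Optional[int]]:
    """Days since the latest event and days until the next one.

    `events` must be sorted. None is returned when there is no event on that side.
    """
    i = bisect.bisect_right(events, date)  # events[:i] are on or before date
    since = (date - events[i - 1]).days if i > 0 else None
    j = bisect.bisect_left(events, date)   # events[j:] are on or after date
    until = (events[j] - date).days if j < len(events) else None
    return since, until


# Hypothetical, irregularly spaced event dates.
events = [dt.date(2023, 11, 24), dt.date(2023, 12, 25)]
since, until = days_since_and_until(dt.date(2023, 12, 1), events)
```

The two distances can then be passed through separate before- and after-functions, as with a single reference point.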

40.2 Pros and Cons

40.2.1 Pros

40.2.2 Cons

40.3 R Examples

40.4 Python Examples