112 Time-series Overview
112.1 Time-series Overview
A very common type of data is time-series data. This is data where the observations are taken across time with time stamps. This data will inherently be correlated with itself as the same measurement for the same unit likely isnβt going to change too much. Time series data is typically modeled using different methods than other predictive modeling for this reason.
Nevertheless, there are still some methods that will be useful for us. It is important to note that there is a time component to this data, which is what we are trying to predict. The observations can happen at regular intervals, once a day, or irregular intervals, each time the engine errors. Different types of data use different types of models and make the feature engineering a little different depending on what you are trying to do.
This chapter assumes the user knows how to use time series data and the precautions that are needed when working with data of this type.
add a link to appropriate resources
One area of work is modifying the sequence itself. Some of these transformations can be taken from the Numeric section. However, some metrics are specific to time series data. Likewise, some metrics are similar to what we do in Missing section and Outliers section, but special care needs to be taken with time series data.
- Smoothing
- Sliding window transformations
- Log Interval Transformation
- Time series-specific handling of missing values
- Time series-specific handling of outliers
Another area of methods is where we extract or modify the sequence or its data to create new variables, this is done by considering the time component. This will result in breaking about the time component in a decomposition kind of way, or by getting information out of the values themselves. One such example of the latter is found in the datetime section. Particularly holiday extraction.