105  Spatial Overview

When we talk about spatial and geospatial feature engineering, we want to focus on the transformation and enrichment of the data set, based on spatial information. Typically this will be longitude and latitude-based, with the additional information being based on areas and regions such as cities or countries.

What sets these methods apart from some of the other methods we see in this book, is that they almost always require a reference data set to be able to perform the calculations. If you want to find the closest city to an observation, you need a data set of all the cities and their location. For all the methods in this section, the reader is expected to know how to gather this reference material for their problem.

We will split this data up into two types of methods, depending on your spatial information. point based methods and shape based methods.

In point-based methods, you know the location of your observation, and you calculate where it is in relationship to something else. You could look at distances by finding the distance to a fixed point, or multiple points as covered in Chapter 106. You could find the nearest of something as covered in Chapter 107. These two methods are different sides of the same coin. Another thing we could do is count the number of occurrences within a given distance or region. This is covered in Chapter 108. By knowing the location of something we are also able to query certain types of information such as β€œheight from sea”, β€œrainfall in inches in 2000” and so on. We cover these types of methods in Chapter 109. We can also expand some of these concepts and look at spatial embeddings, we will be covered in Chapter 110.

In shape-based methods, you don’t just have the positioning of your observation, but also its shape. This can be any shape; line, polygon, or circle. The methods seen in point-based methods can be applied to shape-based methods, we just need to be a little more careful when performing the calculations. Since we are given the shape of our observation, there are characteristics we can extract from those that might be useful. We look at how we can incorporate that information in Chapter 111.