125 Image Overview

125.1 Image Overview

Image data is quite dissimilar to anything we have seen so far in this book, and we will need a whole different set of techniques to deal with it. For most of this book, the unit of data was a single value, such as 4, 7, or 0.3562 for numeric variables, and “cat” or “dog” for categorical variables. Text data, as seen in the Text section, is treated as a series of tokens.

Image data, on the other hand, is represented as an array of numbers, typically integers, with varying dimensions. Each image is composed of pixels laid out in a rectangular grid. Each pixel typically has between 1 and 4 values, which are called channels. One channel is used for gray-scale images. Three channels are used for color images: a red channel, a green channel, and a blue channel. Lastly, there is sometimes a fourth channel for opacity.
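To make the channel idea concrete, here is a minimal sketch using NumPy. The height-by-width-by-channel layout and the 8-bit unsigned integer type are assumptions (both are common conventions, but other layouts exist), and the tiny 2-by-3 image size is purely illustrative.

```python
import numpy as np

# Hypothetical tiny images, 2 pixels high and 3 pixels wide.
height, width = 2, 3

# Gray-scale: one value per pixel, so a 2-D array is enough.
gray = np.zeros((height, width), dtype=np.uint8)

# Color: three values per pixel (red, green, blue).
rgb = np.zeros((height, width, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]  # make the top-left pixel pure red

# Color with opacity: a fourth channel, often called alpha.
rgba = np.zeros((height, width, 4), dtype=np.uint8)
rgba[..., 3] = 255  # fully opaque everywhere

print(gray.shape, rgb.shape, rgba.shape)
```

The last axis of each array holds the channels, so its length tells you which kind of image you are looking at.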

TODO

Add diagrams

This means that for a 500-pixel by 1000-pixel color image, we have 500 * 1000 * 3 = 1,500,000 values. This is quite a lot of data, and hopefully, we will be able to squeeze some useful information out of it.
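The count above can be checked with a short sketch. It assumes the image is stored as a NumPy array with one 8-bit integer per value, and it arbitrarily treats 500 as the height and 1000 as the width; the total is the same either way.

```python
import numpy as np

# Assumed layout: height x width x channels, 8-bit unsigned integers.
image = np.zeros((500, 1000, 3), dtype=np.uint8)

n_values = image.size    # total number of stored values
n_bytes = image.nbytes   # one byte per value for uint8

print(n_values, n_bytes)
```

Even a modest image therefore carries far more raw values than a typical row of tabular data.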

The preprocessing techniques for images can be split into a few categories, all of which will be covered.

125.2 Feature Extraction

In the extraction setting, we take the images and try to extract much smaller vectors of information from them. These could come from simple statistics or from larger and more complicated methods. One does not need to do this right away, and sometimes it is beneficial to apply some of the image modification methods below before doing the extraction.
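As one possible example of the "simple statistics" end of that range, the sketch below reduces a color image to six numbers: the mean and standard deviation of each channel. The function name `channel_stats` is hypothetical, the image is random noise standing in for real data, and this is only one of many ways to summarize an image.

```python
import numpy as np

def channel_stats(image):
    """Return per-channel means followed by per-channel standard deviations."""
    image = image.astype(np.float64)
    means = image.mean(axis=(0, 1))  # average over height and width
    stds = image.std(axis=(0, 1))
    return np.concatenate([means, stds])

# Random stand-in for a 500 x 1000 color image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(500, 1000, 3), dtype=np.uint8)

features = channel_stats(image)
print(features.shape)
```

The 1,500,000 raw values are replaced by a vector of length 6, which is much easier for many models to work with, at the cost of discarding all spatial information.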

125.3 Image Modification

Sometimes the images you get will not be in the best shape for the task at hand. This could be for various reasons. Applying color changes of different kinds can help highlight the important parts of the image, so that later preprocessing steps or models have an easier time picking up on them. Likewise, you might need to scale the data to help the model as well as to reduce noise. Lastly, you will most likely need to resize your images, as many deep-learning image models work on fixed input sizes.
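The three kinds of modification mentioned above (color changes, scaling, and resizing) can be sketched with plain NumPy. The function names are hypothetical, the gray-scale weights are the commonly used luma coefficients, nearest-neighbor is just the simplest resizing method, and the 224 by 224 target is an assumed example of a fixed model input size.

```python
import numpy as np

def to_grayscale(image):
    """Collapse the red, green, and blue channels into one weighted channel."""
    weights = np.array([0.299, 0.587, 0.114])
    return image[..., :3].astype(np.float64) @ weights

def rescale(image):
    """Map 8-bit values from the range 0..255 to the range 0..1."""
    return image.astype(np.float64) / 255.0

def resize_nearest(image, new_height, new_width):
    """Resize by picking the nearest source pixel for each target pixel."""
    height, width = image.shape[:2]
    rows = np.arange(new_height) * height // new_height
    cols = np.arange(new_width) * width // new_width
    return image[rows[:, None], cols]

# Random stand-in for a 500 x 1000 color image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(500, 1000, 3), dtype=np.uint8)

gray = to_grayscale(image)
scaled = rescale(image)
resized = resize_nearest(image, 224, 224)
print(gray.shape, scaled.max(), resized.shape)
```

In practice an image library would usually handle these steps, often with smoother interpolation than nearest-neighbor, but the underlying operations are array manipulations like these.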

125.4 Augmentation

A common trick when working with image data is augmentation. By that we mean applying different kinds of transformations to generate new images that contain the same information presented in different ways. This creates a larger data set, with the hope of improving performance and generalization. For example, we want to be able to detect cat pictures regardless of whether the cat is centered in the image or not.
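A minimal sketch of augmentation, assuming NumPy arrays in height-by-width-by-channel layout: each call randomly flips the image left to right and shifts it horizontally by a few pixels. The function name `augment` and the `max_shift` parameter are hypothetical. Note that `np.roll` wraps pixels around the edge, which is a simplification; real pipelines more often pad or crop.

```python
import numpy as np

def augment(image, rng, max_shift=2):
    """Return a randomly flipped and horizontally shifted copy of the image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]  # mirror left to right
    shift = int(rng.integers(-max_shift, max_shift + 1))
    # Shift along the width axis; pixels pushed off one edge reappear on the other.
    return np.roll(out, shift, axis=1)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)

# Generate several new variants of the same image.
variants = [augment(image, rng) for _ in range(5)]
print(len(variants), variants[0].shape)
```

Each variant has the same pixels as the original, just arranged differently, so the label (for example "cat") stays valid while the model sees the subject in different positions.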

125.5 Embeddings

We can also take advantage of transfer learning. People have fit deep learning image models on many images before us, and some of these trained models can be reused. We will look at that in Chapter 136.