Color is an interesting attribute that appears simple on the surface but can be analyzed in many different ways. One could treat colors as categorical features, but that ignores the inherent structure and relationships that come with them.
Color names are strongly tied to language and culture, so it is important to keep this in mind when creating mappings between color names and numerical representations.
TODO
Find good references for color and history.
There can also be a lot of ambiguity in color names. You see this most prominently when buying paint, with each store or chain having its own, sometimes humorous, names for each shade it can produce. These names try to help customers distinguish between small differences in shades. Color names can also be used quite broadly. The color “green” could in one context mean an exact hue, and in another refer to all the colors seen in a forest, the latter being akin to categorical collapse. All of this is to say that we need to think about our data, so that we can better extract the signal if it is there.
Assuming we want a precise mapping, we can construct a table of color names and their corresponding numerical representations. When working with computers, a common way to represent colors is with hex codes, which use a six-digit hexadecimal number to represent a color. A hex code is written like #a8662b, with the first two digits representing how red the color is, the next two digits how green it is, and the last two digits how blue it is. This gives us 16^6 = 16,777,216 unique colors, which isn’t enough to specify all possible colors but is good enough for our use cases.
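To make the encoding concrete, the sketch below splits a hex code into its three channels using only base R. The helper name `hex_to_rgb` is made up for this illustration and does not come from any package.

```r
# Hypothetical helper: split "#RRGGBB" hex codes into integer channels.
hex_to_rgb <- function(hexcode) {
  # Drop the leading "#" so positions 1-2, 3-4, and 5-6 hold the channels
  digits <- sub("^#", "", hexcode)
  data.frame(
    hexcode = hexcode,
    red     = strtoi(substr(digits, 1, 2), base = 16L),
    green   = strtoi(substr(digits, 3, 4), base = 16L),
    blue    = strtoi(substr(digits, 5, 6), base = 16L)
  )
}

hex_to_rgb(c("#a8662b", "#FFA500"))

# Six hexadecimal digits give this many distinct codes
16^6
```

For #a8662b this gives red = 168, green = 102, and blue = 43. The sketch assumes well-formed six-digit codes; shorthand forms such as #fff or codes with an alpha channel would need extra handling.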
TODO
Find examples of color lists, maybe even create a database.
# A tibble: 10 × 5
   name      hexcode   red green  blue
   <chr>     <chr>   <int> <int> <int>
 1 black     #000000     0     0     0
 2 brown     #A52A2A   165    42    42
 3 gray      #BEBEBE   190   190   190
 4 white     #FFFFFF   255   255   255
 5 orange    #FFA500   255   165     0
 6 tan       #D2B48C   210   180   140
 7 blue      #0000FF     0     0   255
 8 pink      #FFC0CB   255   192   203
 9 yellow    #FFFF00   255   255     0
10 chocolate #D2691E   210   105    30
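A table like this one can be assembled from R's built-in color names. The following is a minimal sketch using base R only; it assumes every name is one that `grDevices::colors()` recognizes, which will not hold for free-text color descriptions found in real data.

```r
color_names <- c("black", "brown", "gray", "white", "orange",
                 "tan", "blue", "pink", "yellow", "chocolate")

# col2rgb() returns a 3 x n integer matrix with rows red, green, and blue
channels <- grDevices::col2rgb(color_names)

color_table <- data.frame(
  name    = color_names,
  # rgb() accepts a three-column matrix and returns "#RRGGBB" strings
  hexcode = grDevices::rgb(t(channels), maxColorValue = 255),
  red     = channels["red", ],
  green   = channels["green", ],
  blue    = channels["blue", ]
)
color_table
```

`col2rgb()` throws an error for names it does not know, so messy inputs would first need to be cleaned or matched against a custom lookup table.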
Through these hex codes, we already have numeric representations that we can use for modeling. However, they may not be the most effective representation, depending on the question we are trying to answer. This is where the idea of color spaces comes in. The one we have worked with so far is the RGB space, which is easy to use and understand but doesn’t translate well to notions we typically care about, like “How dark is this color?”. Another color space that may handle such questions better is the HSL color space. It describes a color with three values: hue (think rainbow), which takes values between 0 and 360; saturation, which you can think of as colorfulness relative to the color’s own brightness, on a scale from 0 to 100; and lightness, which tells you how bright the color is compared to pure white, on a scale from 0 to 100.
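The conversion from RGB to HSL is a short piece of arithmetic. The function below is a hand-written sketch of the standard formulas, with the name `rgb_to_hsl` chosen for this example; in practice a tested implementation, such as the one in the farver package, would be preferable.

```r
# Hypothetical helper: convert 0-255 RGB channels to HSL
# (hue in degrees 0-360, saturation and lightness on a 0-100 scale).
rgb_to_hsl <- function(red, green, blue) {
  r <- red / 255
  g <- green / 255
  b <- blue / 255
  largest  <- pmax(r, g, b)
  smallest <- pmin(r, g, b)
  chroma   <- largest - smallest

  lightness <- (largest + smallest) / 2
  # Grays (chroma of 0) have no saturation, and their hue is set to 0
  saturation <- ifelse(chroma == 0, 0, chroma / (1 - abs(2 * lightness - 1)))
  sector <- ifelse(
    chroma == 0, 0,
    ifelse(
      largest == r, ((g - b) / chroma) %% 6,
      ifelse(largest == g, (b - r) / chroma + 2, (r - g) / chroma + 4)
    )
  )

  data.frame(
    hue        = 60 * sector,
    saturation = 100 * saturation,
    lightness  = 100 * lightness
  )
}

# blue, orange, and chocolate from the table of hex codes
rgb_to_hsl(red = c(0, 255, 210), green = c(0, 165, 105), blue = c(255, 0, 30))
```

Pure blue comes out at hue 240, saturation 100, and lightness 50. The lightness column is a direct answer to the “How dark is this color?” question that RGB makes awkward.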
Viewing colors in this color space allows us to create different features. We can now say with relative ease whether a color is close to blue, by checking whether its hue is sufficiently close to 240. The same idea extends to any color on the hue wheel. We can likewise ask straightforward questions about saturation and lightness.
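One detail to watch when comparing hues is that the scale wraps around, so 350 and 10 are only 20 degrees apart. The sketch below shows one way such a feature could be written; the helper name `hue_distance`, the hand-typed hue values, and the 45 degree cutoff are all assumptions made for illustration.

```r
# Hypothetical helper: shortest angular distance between hues, in degrees.
hue_distance <- function(hue, reference) {
  difference <- abs(hue - reference) %% 360
  pmin(difference, 360 - difference)
}

# Approximate hues, typed in by hand for this example
hues <- c(red = 0, orange = 39, green = 120, blue = 240,
          magenta = 300, crimson = 348)

# Distance to blue (hue 240) and a binary "close to blue" feature
blue_distance <- hue_distance(hues, 240)
is_bluish <- blue_distance <= 45

# Wrapping matters: crimson (348) is 12 degrees from red (0), not 348
hue_distance(348, 0)
```

For grays the hue is arbitrary, so a feature like this is usually more trustworthy when paired with a check on saturation.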
TODO
This section would benefit from illustrations of the color spaces
Imagine you wanted a feature that says “How close is this measured color to my reference color?”. For that, you would need something called a perceptually uniform color space. These color spaces try to make Euclidean distances match perceived differences between colors; examples include CIELAB and Oklab. The downside of these spaces is that the individual axes are harder to interpret on their own than hue, saturation, and lightness.
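library(dplyr)
library(prismatic)

# The ten color names shown in the table of hex codes above
color_names <- c("black", "brown", "gray", "white", "orange",
                 "tan", "blue", "pink", "yellow", "chocolate")

# Perceptual distance (CIEDE2000) from each color in x to the reference y
how <- function(x, y) {
  x <- prismatic::color(x) |> farver::decode_colour()
  y <- prismatic::color(y) |> farver::decode_colour()
  farver::compare_colour(x, y, "rgb", method = "cie2000")[, 1]
}

tibble(name = color_names) |>
  mutate(hexcode = stringr::str_sub(color(name), 1, 7)) |>
  mutate(
    red = how(hexcode, "red"),
    green = how(hexcode, "green"),
    blue = how(hexcode, "blue"),
    orange = how(hexcode, "orange")
  )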
# A tibble: 10 × 6
   name      hexcode   red green  blue orange
   <chr>     <chr>   <dbl> <dbl> <dbl>  <dbl>
 1 black     #000000  50.4  87.9  39.7   69.9
 2 brown     #A52A2A  18.9  86.3  41.7   45.9
 3 gray      #BEBEBE  36.8  33.2  54.1   28.9
 4 white     #FFFFFF  45.8  33.3  64.2   33.1
 5 orange    #FFA500  33.8  48.4  78.1    0  
 6 tan       #D2B48C  34.4  36.6  66.1   17.1
 7 blue      #0000FF  52.9  83.2   0     78.1
 8 pink      #FFC0CB  34.9  63.8  56.8   35.8
 9 yellow    #FFFF00  64.3  23.4 103.    29.3
10 chocolate #D2691E  15.5  64.4  58.6   20.3
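The distances above use the CIEDE2000 formula through farver. As a simpler point of comparison, the sketch below computes plain Euclidean distance in CIELAB with base R only. The helper name `lab_distance` is made up for this example, and its numbers will not match the table, since CIEDE2000 applies further corrections on top of the CIELAB coordinates.

```r
# Hypothetical helper: Euclidean distance in CIELAB between each color
# and a single reference color.
lab_distance <- function(colors, reference) {
  to_lab <- function(x) {
    # col2rgb() gives 0-255 channels; convertColor() expects 0-1 sRGB rows
    srgb <- t(grDevices::col2rgb(x)) / 255
    grDevices::convertColor(srgb, from = "sRGB", to = "Lab")
  }
  lab_colors <- to_lab(colors)
  lab_reference <- as.numeric(to_lab(reference))
  # Subtract the reference from every row, then take the row-wise norm
  differences <- sweep(lab_colors, 2, lab_reference)
  sqrt(rowSums(differences^2))
}

lab_distance(c("black", "white", "orange", "chocolate"), "orange")
```

A color compared with itself gets a distance of 0, and larger values mean the colors look more different. Which distance formula works best as a feature is something to check against the data at hand.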
These are by no means the only things we can do with colors as predictors, but they might spark some helpful creativity.
102.2 Pros and Cons
102.2.1 Pros
- Using color spaces to write creative features can have a significant impact
102.2.2 Cons
- Creating the mappings between color words and their numerical representation can be challenging