In previous posts we learned how to build a basic histogram using the default plot settings of R, and then we learned how to add colors to the bars of our histogram. Let’s quickly review these steps so you don’t have to read theses posts by now: in order to build a simple histogram all we have to do is to call the hist() function passing our data as argument. If we want to add some color to the histogram color we must pass a color name (or RGB hexadecimal code) as argument to the col parameter:

 hist(iris$Sepal.Length, col = '#BBDEFB') 

labels_1By observing the produced histogram we conclude that about 30 instances have the Sepal Length attribute between 6 and 6.5. We also conclude that the ranges [4,4.5), [7,7.5) and [7.5,8) have about 5 instances of data each. However these conclusions are approximations, and we would like to know the exact number of instances for each range without the help of an auxiliar table. The solution to this problem is simply add labels to each bar of the histogram. This is easily obtained by passing a TRUE value as argument to the labels parameter of the hist() function:


hist(
  iris$Sepal.Length,
  col = '#BBDEFB',
  labels = TRUE,
  ylim=c(0,35)
)

labels_2Now on top of each bar there is a label indicating the exact number of instances of each range of the attribute Sepal Length. We observe that in fact, we have 31 instances of data between 6 and 6.5  , while the ranges [4,4.5), [7,7.5) and [7.5,8) have 5, 6 and 6 instances respectively.

Besides the bars labels we can also set custom labels for the histogram axis. Observe that the horizontal axis is labeled with the name of the variable that contains the data. This is not a problem when the variable name exactly matches its content, but this is not true most of time. For example, we could assigned the Sepal Length values to a variable named ‘x’, and the result would be like this:

x <- iris$Sepal.Length
hist(x, col = '#BBDEFB', labels = TRUE, ylim=c(0,35))

labels_3Observe that by changing the name of the data variable we altered both the horizontal axis label as well as the title of the plot, and that is the kind of behaviour that we want to avoid. So let’s define a custom name for the horizontal axis so it does not change with the data variable name. This is done by setting a character value for the xlab parameter when calling the hist() function:

x <- iris$Sepal.Length
hist(
  x,
  col = '#BBDEFB',
  labels = TRUE,
  xlab = 'Sepal Length',
  ylim=c(0,35)
)

labels_5Similarly we can also define a custom label for the vertical axis using the ylab parameter:

x <- iris$Sepal.Length
hist(
  x,
  col = '#BBDEFB',
  labels = TRUE,
  xlab = 'Sepal Length',
  ylab = 'Number of Plants',
  ylim=c(0,35)
)

labels_6

Finally, we can define custom title and subtitle for our histogram using the paramters title and sub:

x <- iris$Sepal.Length
hist(
  x,
  col = '#BBDEFB',
  labels = TRUE,
  xlab = 'Sepal Length',
  ylab = 'Number of Plants',
  main = 'Sepal Length Distribution',
  sub = 'Histogram of Sepal Length',
  ylim=c(0,35)
)

labels_7

And we’re done! In this post we learned how can we customize the labels texts of our histogram. This is a very useful feature because the audience of our visualization may not be familiar with the dataset attribute names or techinical terms of statistics. Therefore, by applying custom labels and titles we can improve our communication by making the meaning of data clearer.

Advertisements