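In this post we are going to learn how to build a boxplot using R. A boxplot summarizes the distribution of a numeric variable through its quartiles. The box spans the first to the third quartile, and the line inside it marks the median. The whiskers extend toward the extremes of the data (by default, no further than 1.5 times the box length from the box), and points beyond them are drawn individually as potential outliers. Once more, we are going to use the Iris dataset as input data.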

So let’s build a boxplot for the Sepal.Width attribute of the Iris dataset:

boxplot(iris$Sepal.Width)

[Figure: boxplot of Sepal.Width]
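To see the numbers a boxplot is drawn from, we can call boxplot.stats() (from the grDevices package, attached by default) on the same vector. This sketch goes a little beyond the original post: it also recomputes R's default whisker limits of 1.5 times the box length and checks that every reported outlier lies beyond them.

```r
# Summary statistics behind boxplot(iris$Sepal.Width)
s <- boxplot.stats(iris$Sepal.Width)

print(s$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(s$out)    # values drawn individually as outliers

# Default rule (coef = 1.5): points farther than 1.5 box lengths from the box are outliers
hinges <- s$stats[c(2, 4)]
box_length <- diff(hinges)
lower_fence <- hinges[1] - 1.5 * box_length
upper_fence <- hinges[2] + 1.5 * box_length

# Every outlier should fall outside the fences, and the whiskers inside them
print(all(s$out < lower_fence | s$out > upper_fence))
print(s$stats[1] >= lower_fence && s$stats[5] <= upper_fence)
```

The hinges are computed with fivenum(), so for some data sets they can differ slightly from the quartiles reported by quantile() or summary().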

Observe that all we had to do was call the boxplot() function, passing a numeric vector as its argument. We can improve the presentation by adding a title and a fill color through the main and col parameters:

boxplot(
  iris$Sepal.Width,
  main = 'Sepal Width',
  col = 'lightblue3'
)

[Figure: boxplot of Sepal.Width with the title "Sepal Width" and a light blue fill]
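Outside an interactive session it is often more convenient to write the plot to a file. The minimal sketch below assumes a PDF in R's temporary directory is acceptable (the file name is arbitrary). It also labels the y axis with ylab and sets the box outline color with border. Note that boxplot() invisibly returns the statistics it plotted, which we keep in b.

```r
# Write the boxplot to a PDF file instead of the screen (file name is an arbitrary example)
out_file <- file.path(tempdir(), "sepal_width_boxplot.pdf")
pdf(out_file, width = 5, height = 5)

b <- boxplot(
  iris$Sepal.Width,
  main = 'Sepal Width',
  ylab = 'Width (cm)',   # iris measurements are in centimetres
  col = 'lightblue3',    # fill color of the box
  border = 'darkblue'    # color of the box outline, whiskers and outliers
)

invisible(dev.off())     # close the PDF device so the file is written
```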

Finally, a useful analysis is to see how a categorical variable relates to the distribution of a numeric one. In the Iris dataset, for example, we can compare the distribution of Sepal.Length across the three Species. To do so, we draw one boxplot per species by passing a formula of the form numeric variable ~ grouping factor to the boxplot() function:

boxplot(
  Sepal.Length ~ Species,
  data = iris,
  col = 'lightblue3',
  main = 'Sepal Length x Species'
)

[Figure: boxplots of Sepal.Length for each Species]
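The per-species statistics can also be inspected without drawing anything. The plot = FALSE argument makes boxplot() return its summaries instead of plotting them. The sketch below cross-checks the medians in that output against those computed directly with tapply().

```r
# Compute the grouped boxplot summaries without drawing the plot
b <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

print(b$names)  # one group per level of Species
print(b$stats)  # 5 x 3 matrix: one column of statistics per species

# Row 3 of b$stats holds the medians; cross-check them with tapply()
medians <- tapply(iris$Sepal.Length, iris$Species, median)
print(medians)
print(all.equal(as.numeric(b$stats[3, ]), as.numeric(medians)))
```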