The histogram is one of the simplest and most useful data visualization tools. Given a continuous variable, it shows how the values are distributed across their range. By looking at the histogram, we can tell at a glance whether a variable is uniformly distributed or whether most of the occurrences are concentrated in a specific range of values.
In this post, I am going to show how to build a histogram using R's base graphics. As an example, we are going to use the Iris dataset, composed of four numeric attributes and one categorical attribute. We’re going to pick one of these attributes and build a histogram for it. So, let’s take a quick look at the first few rows of our data:
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1           5.1         3.5          1.4         0.2  setosa
## 2           4.9         3.0          1.4         0.2  setosa
## 3           4.7         3.2          1.3         0.2  setosa
## 4           4.6         3.1          1.5         0.2  setosa
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa
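The listing above matches the first ten rows of the `iris` data frame that ships with R. The call that produced it is not shown here, so the following is only a minimal sketch, assuming the built-in dataset is the one being used:

```r
# Assumption: the post uses R's built-in iris data frame.
# head() returns the first n rows of a data frame.
first_rows <- head(iris, 10)
print(first_rows)
```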
Then we choose one of its attributes to show in the histogram. Let’s start with the Sepal.Length attribute:
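A minimal sketch of that call, assuming the data is R's built-in `iris` data frame and the column is selected with `$`:

```r
# Assumption: iris is the built-in data frame.
# Pass the numeric column to hist(); all other settings are left at their defaults.
sepal_length <- iris$Sepal.Length
hist(sepal_length)
```

When run in an interactive session, this should open a plot window with the histogram of the sepal lengths.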
And that’s it! All we need to do is call the hist() function, passing our
data as the argument, and the histogram is ready. The hist() function plots a
histogram using the default graphical settings and automatically calculates the
ranges of the X and Y axes, as well as the width, height and number of the bars.
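To see what hist() worked out on its own, the values it computed can be inspected without drawing anything. This is a hedged sketch, again assuming the built-in `iris` data frame; the `plot = FALSE` argument and the `breaks`, `counts` and `mids` components are part of base R's hist() return value:

```r
# Assumption: iris is the built-in data frame.
# plot = FALSE returns the computed histogram object instead of drawing it.
h <- hist(iris$Sepal.Length, plot = FALSE)

print(h$breaks)  # bin boundaries chosen automatically (these set the bar widths)
print(h$counts)  # number of observations in each bin (these set the bar heights)
print(h$mids)    # midpoint of each bin

# Each bar lies between two consecutive breaks,
# so there is always one more break than there are bars.
n_bars <- length(h$counts)
```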
Although the default behavior of the hist() function gives us a good
visualization of our data, we can easily customize it to better fit
our needs. In future posts, we are going to explore some of the many
ways to customize the histogram.