In the post How to build a histogram in R we learned that based on our data, the **hist()** function automatically calculates the size of each bin of the histogram. However we may find that the default number of bins does not offer sufficient details of our distribution. Or we may want to summarize the details of the distribution by grouping one or more range values. So in order to perform this arrangement we can change the number of the bins to better satisfy our needs.

So let’s start with a simple histogram of the Petal.Length attribute of the Iris dataset:

hist(iris$Petal.Length, col = 'skyblue3')

Observe that we have a very unbalanced distribution, including an empty bin in range [2,2.5]. One possible approach to improve this visualization is to group these intervals by reducing the number of bins in the histogram. This can be done using the **breaks** parameter of the **hist()** function:

hist(
iris$Petal.Length,
col = 'skyblue3',
breaks = 6
)

When we specify the number of bins using the **breaks** parameter, the new size of each bin is automatically calculated by the **hist()** to a **pretty** value. In other words, when we specify the **breaks** parameter as a single integer value the resulting size of each bin must be 1,2 or 5 times a multiple of 10. On the contrary, the **hist()** function will compute the number of bins as close as possible of the specified value of **breaks**.

Another possibility is to pass a number vector to the **breaks** parameter, so that we can set an arbitrary number of bins of arbitrary sizes:

hist(
iris$Petal.Length,
col = 'skyblue3',
breaks = c(1,1.3,2,4,5,7)
)

In this case each bin assumed different sizes accordingly to the vector passed as argument to the **breaks** parameter. However, this option forces the histogram to use density probabilities in order to keep the proportion of the bars areas.

These were the 2 simplest ways of defining the bin sizes or breaks of a histogram. In future posts we are going to explore the another 3 options supported by the **breaks** parameter to define the bins sizes:

- a function that returns the number of bins
- a function that returns a vector of breaks
- a string indicating the algorithm used to calculate the number of bins. The options are: “Sturges” (default), “Scott” and “FD”.

### Like this:

Like Loading...