Histograms in Python
Dylan | Sep 02, 2019
Histograms are a fantastic tool for quickly visualizing data distribution. Fortunately, Python’s matplotlib library makes it incredibly simple and straightforward to generate histograms!
In this post, we’ll be diving into the code for generating histograms in matplotlib. If you don’t know what a histogram is, or would just like to refresh your memory, you can read my gentle post covering the basics of histograms first. Otherwise, let’s dive right in!
Let’s begin with the standard imports. Note: %matplotlib inline is specifically used with the Jupyter notebook to allow plots generated by matplotlib to appear in the same window as the editor, without having to generate a new pop-up window just to display the plot.
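The import cell itself isn’t shown here, so the following is only a sketch of what a typical set of imports for this kind of post might look like. The aliases plt, np, and pd are common conventions, assumed rather than taken from the original.

```python
# In a Jupyter notebook, run the magic on the next line first.
# It is left as a comment here because it is not valid plain Python.
# %matplotlib inline

import matplotlib.pyplot as plt  # plotting interface that provides hist()
import numpy as np               # array support and random sample data
import pandas as pd              # Series and DataFrame containers
```

Outside of Jupyter, skip the magic and call plt.show() once the plot is ready.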
Generating Generic Histograms
Matplotlib’s function for generating histograms can be accessed through matplotlib.pyplot.hist(). The hist() function requires only a single parameter, x (our data). Our data can be stored in a plain Python list, a NumPy array, a pandas Series, or a pandas DataFrame. For example, the following Series generates the histogram that follows it.
Note: it is also possible to pass several datasets at once (even of varying lengths) by giving x a list of datasets.
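As a minimal sketch of the call described above, the snippet below passes a small pandas Series to hist(). The values are made up for illustration and are not the data from the original post.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data, not taken from the original post.
data = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5])

# x is the only required argument. hist() returns the count in each bin,
# the bin edges, and the drawn bar patches.
counts, edges, patches = plt.hist(data)
```

Every value lands in exactly one bin, so the counts add up to the length of the Series, and there is always one more edge than there are bins.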
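To illustrate the note above, here is a small sketch that passes two datasets of different lengths as a list. The names group_a and group_b and their values are hypothetical.

```python
import matplotlib.pyplot as plt

# Two hypothetical datasets of different lengths.
group_a = [1, 2, 2, 3, 3, 3]
group_b = [2, 3, 3, 4, 4, 4, 5, 5]

# Passing a list of datasets draws one set of bars per dataset,
# with all datasets sharing the same bin edges.
counts, edges, patches = plt.hist([group_a, group_b])
```

Here counts holds one row of bin counts per dataset, in the same order the datasets were passed.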
The number of bins can be controlled by passing the second parameter, bins, to the hist() function. If we set bins to an integer value, matplotlib will generate bins + 1 evenly spaced bin edges across the range of the data, creating the number of bins specified. Below are some examples of different bin parameters on the distribution of weight across a population of male mice.
As you can see, increasing the integer value of bins gives the histogram a finer resolution. Push it too far, though, and each bar holds only a handful of observations, so the overall shape of the distribution starts to look noisy.
In addition to passing integer values, it’s also possible to pass an array of bin edges. For example, passing bins=[20, 24, 28, 32] will result in 4 bin edges and therefore 3 bins, with the first bin edge at x-axis 20, the second bin edge at 24, the third at 28, and the fourth and final at 32. Values that fall outside the outermost edges are left out of the histogram. Below is the resulting histogram on the weight of male mice.
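The mouse data itself isn’t available here, so this sketch stands in randomly generated weights (an assumption, with a made-up mean and spread) and draws them with three different integer values for bins. It uses Axes.hist(), which takes the same arguments as plt.hist(), so the plots can sit side by side.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical weights in grams, generated for illustration only.
rng = np.random.default_rng(0)
weights = rng.normal(loc=26, scale=3, size=200)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
edge_counts = {}
for ax, n_bins in zip(axes, [5, 10, 20]):
    counts, edges, _ = ax.hist(weights, bins=n_bins)
    ax.set_title(f"bins={n_bins}")
    edge_counts[n_bins] = len(edges)  # always n_bins + 1
```

After the loop, edge_counts maps each bins value to the number of edges matplotlib generated for it.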
Note: When passing a sequence for bins, it’s not necessary for the values to be evenly spaced. We can specify and manually handle the width of each individual bin.
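Here is a rough sketch of passing explicit bin edges, first evenly spaced and then unevenly spaced. The weights list is hypothetical and chosen so the counts are easy to check by hand.

```python
import matplotlib.pyplot as plt

# Hypothetical weights in grams; 34 lies beyond the last edge of the first plot.
weights = [21, 22, 23, 25, 26, 27, 27, 29, 30, 31, 32, 34]

# Four edges give three bins: [20, 24), [24, 28), and [28, 32].
# Only the last bin includes its right edge, so 32 is counted and 34 is not.
even_counts, even_edges, _ = plt.hist(weights, bins=[20, 24, 28, 32])

# Start a new figure, then use unevenly spaced edges: bin widths of 2, 4, and 8.
plt.figure()
uneven_counts, uneven_edges, _ = plt.hist(weights, bins=[20, 22, 26, 34])
```

With even edges the counts come out as 3, 4, and 4. With the uneven edges they are 1, 3, and 8, and the widest bin now picks up the 34 that was dropped before.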
Like other matplotlib plot functions, the hist() function allows us to manually set the color of our histogram by passing a color parameter. In addition to color, we can define the label parameter to give the histogram a name for the legend. The legend itself only appears once we call plt.legend().
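A short sketch of both parameters together follows. The data, the color name, and the label text are placeholders picked for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical weights in grams, generated for illustration only.
rng = np.random.default_rng(1)
weights = rng.normal(loc=26, scale=3, size=200)

# color sets the bar fill; label names the dataset for the legend.
counts, edges, patches = plt.hist(weights, bins=10, color="seagreen", label="Male mice")

# The label is only displayed once a legend is added to the plot.
legend = plt.legend()
```

When several datasets are plotted on the same axes, giving each hist() call its own color and label keeps the legend readable.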
Although this post touched on the most important parameters of the matplotlib histogram function, there are still many others that remain unmentioned. If you’re interested in learning everything possible about the matplotlib.pyplot.hist() function, take a look at its documentation.
As always, thanks for reading! I sincerely hope everything has been presented clearly, but if there’s anything left unclear, please comment with your questions below! I look forward to talking more about Python’s matplotlib library and generating histograms with you in the comments! Until next time, happy coding!