Add Matplotlib Percentage Ticks to a Histogram

Matplotlib provides an easy way of converting your y-axis to percentages. It's a one-liner:

import matplotlib.ticker as ticker
ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax))  # xmax: the data value to show as 100%
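To see what `xmax` does, here is a minimal sketch that calls the formatter's `convert_to_pct` and `format_pct` methods directly, without any plot. The numbers (200 as the total, 50 as a value) are made up for illustration, and `decimals=0` is passed so the label has no decimal places.

```python
import matplotlib.ticker as ticker

# Treat 200 as "100%"; decimals=0 fixes the number of decimal places in the label.
fmt = ticker.PercentFormatter(xmax=200, decimals=0)

pct = fmt.convert_to_pct(50)      # 100 * 50 / 200
label = fmt.format_pct(50, 200)   # second argument is the displayed data range
print(pct, label)
```

So a bar of height 50 on this axis would be labelled 25%, because 50 is a quarter of `xmax`.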

The catch is that you can't easily space the y-ticks the way you want. Normally you would set them with ax.set_yticks, but the percentage formatter only changes the tick labels: the positions you pass to set_yticks are still in data units (raw counts), and they are converted to percentages only when the labels are drawn. So if you want ticks like 1%, 2%, …, (N-1)%, N%, you have to choose the range and the increment in counts so that, after Matplotlib's percentage conversion, the ticks land where you want them.
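A small sketch of that behaviour, using a plain bar chart with made-up heights and a hypothetical total of 1000 counts standing for 100%. It renders with the off-screen Agg backend so no window is needed, and passes `decimals=0` to keep the labels whole numbers. The tick positions go in as counts (100, 200, 300) and only the labels come out as percentages:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no GUI window needed
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

n = 1000  # hypothetical total count that should read as 100%
fig, ax = plt.subplots()
ax.bar([0, 1, 2], [120, 250, 90])
ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=n, decimals=0))

# Positions are given in counts, not percentages:
# 100 counts -> 10%, 200 counts -> 20%, 300 counts -> 30%.
ax.set_yticks([100, 200, 300])
fig.canvas.draw()

labels = [t.get_text() for t in ax.get_yticklabels()]
print(list(ax.get_yticks()), labels)
```

Passing `[0.1, 0.2, 0.3]` instead would put all three ticks just above zero, since those positions are read as counts.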

Essentially we have to trick Matplotlib.

Let’s start with a simple histogram.

import numpy as np
import matplotlib.pyplot as plt

num_of_points = 10000
num_of_bins = 20
data = np.random.randn(num_of_points) # generate random numbers from a Gaussian distribution
fig, ax = plt.subplots()
ax.hist(data, bins=num_of_bins, edgecolor='black')
ax.set_title("Histogram")
ax.set_xlabel("X axis")
ax.set_ylabel("Percentage")
plt.show()

Next, apply the percentage formatting with the one-liner.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

num_of_points = 10000
num_of_bins = 20
data = np.random.randn(num_of_points) # generate random numbers from a Gaussian distribution
fig, ax = plt.subplots()
ax.hist(data, bins=num_of_bins, edgecolor='black')
ax.set_title("Histogram")
ax.set_xlabel("X axis")
ax.set_ylabel("Percentage")
ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=len(data)))  # len(data) counts = 100%
plt.show()

Now say we want y-axis ticks at 1% granularity. For that we need to know the height of the tallest bar. Luckily, the hist function returns the bin counts (the y values) and the bin edges, along with the drawn patches. Using the y values, we can calculate the maximum percentage we would see:
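A quick sketch of what hist returns. It uses a seeded `np.random.default_rng` generator instead of `np.random.randn` so the run is repeatable, and the Agg backend so it runs without a display. Because the bins span the full data range, the counts add up to `len(data)`, which is why dividing by `len(data)` gives a fraction of all points:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded so the run is repeatable
data = rng.standard_normal(10000)
fig, ax = plt.subplots()

# ax.hist returns (counts per bin, bin edges, drawn patches)
y_vals, x_vals, patches = ax.hist(data, bins=20, edgecolor='black')

print(len(y_vals), len(x_vals))    # one count per bin, one more edge than bins
print(y_vals.sum())                # total number of points binned
print(max(y_vals) / len(data))     # tallest bar as a fraction of all points
```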

(max(y_vals) / len(data))

Add one percentage point (0.01) so that the tallest bar does not touch the top of the plot.

(max(y_vals) / len(data)) + 0.01

Round to two decimal places.

round((max(y_vals) / len(data)) + 0.01, 2)

In code, using the values returned by hist:

y_vals, x_vals, e_ = ax.hist(data, bins=num_of_bins, edgecolor='black')
y_max = round((max(y_vals) / len(data)) + 0.01, 2)
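The same arithmetic as a tiny helper, with made-up bar heights out of 10,000 points. The name `percent_top` and its `headroom` parameter are hypothetical, used only for this sketch:

```python
def percent_top(max_count, total, headroom=0.01):
    """Tallest bar as a fraction of `total`, plus headroom, rounded to 2 places.

    percent_top and headroom are illustrative names, not part of Matplotlib.
    """
    return round(max_count / total + headroom, 2)

a = percent_top(780, 10000)  # 0.078 + 0.01 = 0.088, rounds up
b = percent_top(749, 10000)  # 0.0749 + 0.01 = 0.0849, rounds down
print(a, b)
```

Rounding can take back up to half a percentage point of the headroom (0.0849 becomes 0.08), but the result always stays above the tallest bar.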

Since we now know the maximum as a fraction of all points, we can convert it back to an absolute count:

y_abs_max = y_max * len(data)

We want ticks at 1% granularity, and 100% corresponds to len(data), so the tick interval in absolute terms is 1% of len(data):

tick_interval = 0.01 * len(data)
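With concrete numbers: 10,000 points and a hypothetical `y_max` of 0.09 from the rounding step give a top of about 900 counts and one tick every 100 counts:

```python
n = 10000      # len(data) in the example above
y_max = 0.09   # hypothetical result of the rounding step

y_abs_max = y_max * n      # top of the axis in counts
tick_interval = 0.01 * n   # one percentage point in counts

print(y_abs_max, tick_interval)
print(y_abs_max / tick_interval)  # number of 1% steps between 0 and the top
```

Both values come from floating-point multiplication, so they may carry tiny rounding errors; compare them with a tolerance rather than `==`.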

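Pass the tick positions (multiples of tick_interval from 0 up to y_abs_max) to ax.set_yticks, then set the y-limits to the first and last tick so that only the ticked range is visible.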

ax.set_ylim(ax.get_yticks()[0], ax.get_yticks()[-1])
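One thing to watch when the tick positions come from `np.arange`, as in the complete code: `arange` stops before its stop value. Because the y-limits are set to the last tick, dropping the top tick can clip the tallest bar (for example, a tallest bar of 749 counts out of 10,000 gives `y_max = 0.08`, and the last tick would be 700). A sketch with those example values, showing a half-interval pad that keeps the top tick:

```python
import numpy as np

y_abs_max = 800.0      # e.g. y_max = 0.08 for 10,000 points
tick_interval = 100.0  # 1% of 10,000

# arange excludes its stop value, so the 8% tick (800) is missing here
exclusive = np.arange(0.0, y_abs_max, tick_interval)

# Extending the stop by half an interval keeps the tick at y_abs_max
inclusive = np.arange(0.0, y_abs_max + tick_interval / 2, tick_interval)

print(exclusive[-1], inclusive[-1])
```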

The complete code looks like this:

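import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

num_of_points = 10000
num_of_bins = 20
data = np.random.randn(num_of_points)           # generate random numbers from a Gaussian distribution
fig, ax = plt.subplots()
y_vals, x_vals, e_ = ax.hist(data, bins=num_of_bins, edgecolor='black')
ax.set_title("Histogram")
ax.set_xlabel("X axis")
ax.set_ylabel("Percentage")
y_max = round((max(y_vals) / len(data)) + 0.01, 2)  # top of the axis as a fraction, with headroom
tick_interval = 0.01 * len(data)                   # 1% expressed in counts
# arange excludes its stop value, so pad by half an interval to keep the y_max tick
ax.set_yticks(ticks=np.arange(0.0, y_max * len(data) + tick_interval / 2, tick_interval))
ax.set_ylim(ax.get_yticks()[0], ax.get_yticks()[-1])
ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=len(data)))
plt.show()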
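If you need this for several plots, the steps can be wrapped in a function. This is a sketch under a few assumptions: the name `percent_hist` and its `headroom` parameter are hypothetical, it uses a seeded generator and the Agg backend so it runs without a display, it is written for 1% ticks only (matching the rounding to two decimal places), and it passes `decimals=0` so the labels read 0%, 1%, 2%, … rather than letting Matplotlib pick the number of decimal places from the axis range.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker


def percent_hist(ax, data, bins=20, headroom=0.01):
    """Histogram with y-ticks every 1% of len(data).

    percent_hist and headroom are hypothetical names for this sketch.
    Returns the tick positions (in counts) and the bin counts.
    """
    n = len(data)
    y_vals, _, _ = ax.hist(data, bins=bins, edgecolor='black')
    y_max = round(max(y_vals) / n + headroom, 2)  # top as a fraction of n
    tick_interval = 0.01 * n                       # 1% in counts
    # Half-interval pad keeps the tick at y_max (arange excludes its stop)
    ticks = np.arange(0.0, y_max * n + tick_interval / 2, tick_interval)
    ax.set_yticks(ticks)
    ax.set_ylim(ticks[0], ticks[-1])
    ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=n, decimals=0))
    return ticks, y_vals


rng = np.random.default_rng(0)
data = rng.standard_normal(10000)
fig, ax = plt.subplots()
ticks, y_vals = percent_hist(ax, data)
fig.canvas.draw()
labels = [t.get_text() for t in ax.get_yticklabels()]
print(labels)
```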

Leave a thumbs up and subscribe if this blog post saved you some time!
