Matplotlib provides an easy way of converting your y-axis to percentages. It’s just a one-liner:

    import matplotlib.ticker as ticker

    # xmax is the data value that should be displayed as 100%
    ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax))

But the catch is that you can’t space the y-ticks the way you want. Usually you would do this by setting the ticks with `ax.set_yticks`. The problem is that the formatter only changes the tick labels: the positions you pass to `ax.set_yticks` are still in the original data units, and Matplotlib converts them to percentages only when it draws the labels. This means that if you want ticks like (1%, 2%, …, (N-1)%, N%), you have to choose the range and the increment in data units such that, after Matplotlib does the percentage conversion, the labels look the way you want.

Essentially we have to trick Matplotlib.
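To see why the tick positions have to be given in data units, it can help to look at what the formatter does with a single value. The sketch below is only an illustration: the total of 10000 and the `decimals=0` setting are assumptions chosen to keep the labels predictable, and it calls `PercentFormatter.format_pct` directly instead of drawing a plot.

```python
import matplotlib.ticker as ticker

total = 10000  # assumed number of data points, i.e. the value shown as 100%
formatter = ticker.PercentFormatter(xmax=total, decimals=0)

# Tick positions are counts; the formatter only decides how they are labelled.
positions = [0, 100, 200]
labels = [formatter.format_pct(p, display_range=total) for p in positions]
print(labels)  # ['0%', '1%', '2%']
```

So a tick that should read 1% has to sit at a count of 100, not at 1 or 0.01.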

Let’s start with a simple histogram.

    import matplotlib.pyplot as plt
    import numpy as np

    num_of_points = 10000
    num_of_bins = 20
    # generate random numbers from a Gaussian distribution
    data = np.random.randn(num_of_points)

    fig, ax = plt.subplots()
    ax.hist(data, bins=num_of_bins, edgecolor='black')
    ax.set_title("Histogram")
    ax.set_xlabel("X axis")
    ax.set_ylabel("Percentage")
    plt.show()

Next, do the percentage formatting with the one-liner.

    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    import numpy as np

    num_of_points = 10000
    num_of_bins = 20
    # generate random numbers from a Gaussian distribution
    data = np.random.randn(num_of_points)

    fig, ax = plt.subplots()
    ax.hist(data, bins=num_of_bins, edgecolor='black')
    ax.set_title("Histogram")
    ax.set_xlabel("X axis")
    ax.set_ylabel("Percentage")
    # len(data) is the count that corresponds to 100%
    ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=len(data)))
    plt.show()

Now say we need percentage ticks at 1% granularity on the y-axis. For that, we first need to figure out the maximum bar height. Luckily, the `hist` function returns the y values (the count in each bin) and the edges of the bins, along with the bar patches. Using the y values, we can calculate the maximum percentage that we would see:

    max(y_vals) / len(data)

Add one percentage point (0.01) so that the tallest bar does not touch the top of the plot.

    (max(y_vals) / len(data)) + 0.01

Round to two decimal places.

    round((max(y_vals) / len(data)) + 0.01, 2)

In code, capture the return values of `hist` and compute `y_max` from them:

    y_vals, x_vals, e_ = ax.hist(data, bins=num_of_bins, edgecolor='black')
    y_max = round((max(y_vals) / len(data)) + 0.01, 2)
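The calculation above can be tried on its own, without opening a figure. This is a minimal sketch under a few assumptions: `padded_max_fraction` is a hypothetical helper name, the data is seeded so the run is repeatable, and `np.histogram` stands in for `ax.hist` to get the per-bin counts.

```python
import numpy as np

def padded_max_fraction(counts, total, pad=0.01):
    """Tallest bar as a fraction of `total`, padded by `pad`, rounded to 2 places."""
    return round(float(max(counts)) / total + pad, 2)

rng = np.random.default_rng(0)               # seeded only for repeatability
data = rng.standard_normal(10000)
counts, edges = np.histogram(data, bins=20)  # per-bin counts and bin edges
y_max = padded_max_fraction(counts, len(data))
print(y_max)
```

For example, a tallest bar of 1534 out of 10000 gives 0.1534 + 0.01 = 0.1634, which rounds to 0.16.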

Now we can work backwards to find the absolute y_max value, since we know the percentage.

    y_abs_max = y_max * len(data)

We need ticks at 1% granularity, and 100% is equivalent to `len(data)`. So the tick interval in absolute terms should be `1% * len(data)`:

    tick_interval = 0.01 * len(data)
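One way to turn the interval into actual tick positions is to count whole 1% steps, which avoids depending on where a floating-point range happens to stop. A small sketch, assuming a hypothetical helper `percent_tick_positions` and example values of `y_max = 0.16` and 10000 data points:

```python
import numpy as np

def percent_tick_positions(y_max, total, step=0.01):
    """Tick positions in data units, from 0% up to and including y_max."""
    n_steps = int(round(y_max / step))          # number of whole steps up to y_max
    return np.arange(n_steps + 1) * (step * total)

ticks = percent_tick_positions(y_max=0.16, total=10000)
print(ticks[0], ticks[1], ticks[-1])
```

With these example values the positions run from 0 to 1600 in steps of 100, which the formatter would label 0% to 16%.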

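Place the ticks from 0 up to `y_abs_max` in steps of `tick_interval`. Because `np.arange` leaves out its stop value, push the stop half an interval past `y_abs_max` so that the top tick is included; otherwise the last tick can end up below the tallest bar.

    ax.set_yticks(np.arange(0.0, y_abs_max + tick_interval / 2, tick_interval))

Then set the y-limits so that we see just the part we need.

    ax.set_ylim(ax.get_yticks()[0], ax.get_yticks()[-1])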

The complete code looks like this:

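    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    import numpy as np

    num_of_points = 10000
    num_of_bins = 20
    # generate random numbers from a Gaussian distribution
    data = np.random.randn(num_of_points)

    fig, ax = plt.subplots()
    y_vals, x_vals, e_ = ax.hist(data, bins=num_of_bins, edgecolor='black')
    ax.set_title("Histogram")
    ax.set_xlabel("X axis")
    ax.set_ylabel("Percentage")

    # tallest bar as a fraction of the data, padded by 1% and rounded
    y_max = round((max(y_vals) / len(data)) + 0.01, 2)
    y_abs_max = y_max * len(data)
    tick_interval = 0.01 * len(data)
    # np.arange excludes its stop value, so stop half an interval past y_abs_max
    ax.set_yticks(ticks=np.arange(0.0, y_abs_max + tick_interval / 2, tick_interval))
    ax.set_ylim(ax.get_yticks()[0], ax.get_yticks()[-1])
    ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=len(data)))
    plt.show()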
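As an aside, Matplotlib's `MultipleLocator` places a tick at every multiple of a base value, which may be a shorter route to the same spacing. This is a different technique from the manual `set_yticks` approach in this post, not a replacement for it. The function name `hist_with_percent_ticks`, the seeded data, and the `decimals=0` setting are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

def hist_with_percent_ticks(data, bins=20, step=0.01):
    """Histogram with a y-tick every `step`, expressed as a fraction of len(data)."""
    fig, ax = plt.subplots()
    ax.hist(data, bins=bins, edgecolor="black")
    # The locator works in data units (counts), so a 1% step is step * len(data).
    ax.yaxis.set_major_locator(ticker.MultipleLocator(step * len(data)))
    ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=len(data), decimals=0))
    ax.set_ylabel("Percentage")
    return fig, ax

data = np.random.default_rng(0).standard_normal(10000)
fig, ax = hist_with_percent_ticks(data)
fig.canvas.draw()  # render once so ticks and labels are computed
```

Unlike the `set_ylim` step in the full code, this version leaves the axis limits to Matplotlib's autoscaling, so the top of the plot is not guaranteed to land exactly on a tick.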
