
20 Minutes Tutorial for Matplotlib

Matplotlib is a 2D plotting library for Python. It runs on many platforms and is powerful enough to draw a wide range of professional-quality graphics easily. This article is an introductory tutorial on it.

1. Operation Environment

Since Matplotlib is a Python package, you need Python installed on your machine first. Installation guides for Python are easy to find online.

See here for how to install Matplotlib: Matplotlib Installing.

I recommend installing it with pip:

sudo pip3 install matplotlib

The source code and test data in this article can be found here: matplotlib_tutorial

There is another Python library used in the code examples of the article: NumPy. I have also written a basic tutorial on NumPy, see here: NumPy Tutorial: Python Machine Learning Library.

The code was tested in the following environment:

  • Apple OS X 10.13
  • Python 3.6.3
  • matplotlib 2.1.1
  • numpy 1.13.3
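
To compare your own setup with the list above, a minimal sketch like the following prints the versions that are actually installed. It assumes only that matplotlib and numpy can be imported; the versions printed on your machine will most likely differ from the ones listed.

```python
import sys

import matplotlib
import numpy as np

# Print the interpreter and library versions found in the current environment.
print("Python    ", sys.version.split()[0])
print("matplotlib", matplotlib.__version__)
print("numpy     ", np.__version__)
```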

2. Introduction

Matplotlib is suitable for a variety of environments, including:

  • Python script
  • IPython shell
  • Jupyter notebook
  • Web application server
  • User graphical interface toolkit

You can easily generate various types of graphics with the help of Matplotlib, such as histograms, spectrograms, bar graphs, scatter plots, and so on. You can also customize a graphic very easily.

3. Getting started code example

Let's take a look at one of the simplest code examples:

# test.py

import matplotlib.pyplot as plt
import numpy as np

data = np.arange(100, 201)
plt.plot(data)
plt.show()

The main logic takes only three lines, yet it draws an intuitive line graph, as shown below:

Now let's explain the logic of the example code:

  1. Generate an array of the integers in [100, 200] with np.arange(100, 201); its value is [100, 101, 102, ..., 200].
  2. Draw it with matplotlib.pyplot. The values drawn correspond to the ordinate (y-axis) of the figure. Matplotlib sets the abscissa (x-axis) for us: it runs over [0, 100], the indices of our 101 values.
  3. Display the graphic via plt.show().
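
The steps above can be checked with NumPy alone. This small sketch shows that np.arange(100, 201) excludes the stop value and therefore yields 101 numbers, and it rebuilds the x values that plot uses when only the y values are given (the indices of the array). The variable name x is introduced here for illustration only.

```python
import numpy as np

data = np.arange(100, 201)   # the stop value 201 itself is excluded
x = np.arange(len(data))     # indices 0..100, the default x values of plt.plot(data)

print(len(data))             # 101
print(data[0], data[-1])     # 100 200
print(x[0], x[-1])           # 0 100
```

Calling plt.plot(x, data) with these arrays should therefore draw the same line as plt.plot(data).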

The code is very simple. If your environment is already set up, save the above code to a text file (or get the source code from GitHub) and run the following command to see the graphic on your own computer:

python3 test.py

Note 1: Later tutorials will explain how to customize the elements of a figure, for example axes, shapes, shading, and line styles.

Note 2: Unless it is needed, the window border is cropped from the screenshots below, leaving only the graphic itself.

4. Draw multiple graphics at once

There are times when we might want to draw multiple graphics at once, for example, when you need to compare two sets of data, or to display a set of data in a different way.

There are two ways to create multiple graphics:

4.1 Multiple figures

A figure can be thought of as a graphics window. matplotlib.pyplot creates a default figure, and we can create more with plt.figure(), as shown in the following code:

# figure.py

import matplotlib.pyplot as plt
import numpy as np

data = np.arange(100, 201)
plt.plot(data)

data2 = np.arange(200, 301)
plt.figure()
plt.plot(data2)

plt.show()

This code opens two windows, each showing a line graph over a different range of values, as follows:

Note: The two windows overlap exactly when they first open, so you may need to move one aside to see the other.
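
Each figure gets a number, which makes it possible to switch back to an earlier one. The sketch below is one way to see this, under two assumptions of mine: the non-interactive Agg backend is selected so that the code runs without a display, and the title text is made up for illustration. Remove the matplotlib.use line and add plt.show() at the end to open real windows.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

plt.close("all")                  # start from a clean state

fig1 = plt.figure()               # first figure
plt.plot(np.arange(100, 201))

fig2 = plt.figure()               # second figure, now the current one
plt.plot(np.arange(200, 301))

print(plt.get_fignums())          # numbers of all open figures

plt.figure(fig1.number)           # make the first figure current again
plt.title("First figure")         # applies to the first figure only
```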

4.2 Multiple subplots

In some cases, we want to display multiple graphics in the same window. At this point, you can use multiple subplots. Here's a code example:

# subplot.py

import matplotlib.pyplot as plt
import numpy as np

data = np.arange(100, 201)
plt.subplot(2, 1, 1)
plt.plot(data)

data2 = np.arange(200, 301)
plt.subplot(2, 1, 2)
plt.plot(data2)

plt.show()

In the above code, the first two parameters of the subplot function specify the layout of the subplots: the figure is divided into a grid, and these two integers give the number of rows and columns of that grid respectively. The third parameter is the index of the subplot within the grid, starting at 1 in the upper-left cell and increasing from left to right, row by row.

Therefore, the following code refers to the first subplot in the 2-row and 1-column subplots.

plt.subplot(2, 1, 1)

The following code refers to the second subplot in the 2-row and 1-column subplots.

plt.subplot(2, 1, 2)

So the result of the code is like this:

Besides the form above, the subplot function also accepts the three integers combined into a single three-digit integer, provided each of them is less than 10. For example: 2, 1, 1 can be written as 211, and 2, 1, 2 can be written as 212.

Therefore, the result of the following code is the same as above:

import matplotlib.pyplot as plt
import numpy as np

data = np.arange(100, 201)
plt.subplot(211)
plt.plot(data)

data2 = np.arange(200, 301)
plt.subplot(212)
plt.plot(data2)

plt.show()

For more details about the subplot function, please see here: matplotlib.pyplot.subplot
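
To see how the index maps onto a larger grid, here is a sketch with 2 rows and 2 columns in which every subplot is titled with the call that created it. The Agg backend, the output file name subplot_grid.png, and the scaled data are assumptions made for this illustration, not part of the original examples.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, the result is saved to a file
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(100, 201)

fig = plt.figure()
for index in range(1, 5):
    # Index 1 is the upper-left cell; indices increase along each row.
    ax = plt.subplot(2, 2, index)
    ax.plot(data * index)
    ax.set_title("subplot(2, 2, %d)" % index)

plt.tight_layout()               # keep the titles from overlapping
fig.savefig("subplot_grid.png")  # hypothetical output file name
```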

5. Common graphic examples

Matplotlib can generate many styles of graphics. Have a look at the Matplotlib Gallery.

As a starter tutorial, let's take a look at some of the most commonly used graphics first.

5.1 Linear graph

In the previous example, the points on the horizontal axis of the line graph were generated automatically, but sometimes we need to set them ourselves. We may also want to customize the lines. Take a look at the following example:

# plot.py

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [3, 6, 9], '-r')
plt.plot([1, 2, 3], [2, 4, 9], ':g')

plt.show()

This produces the following graphic:

From the code, we can see:

  1. The first array passed to the plot function holds the values on the horizontal axis, and the second array holds the values on the vertical axis, so the first call draws a straight line and the second a polyline;
  2. The last parameter is made up of two characters: the style and color of the line. The first call draws a solid red line and the second a green dotted line. For the description of styles and colors, see the API for the plot function: matplotlib.pyplot.plot
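
The same styling can also be spelled out with keyword arguments, and the short form can carry a marker character as well. The sketch below draws one line in each way; the third data series, the Agg backend, and the file name plot_styles.png are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, the result is saved to a file
import matplotlib.pyplot as plt

x = [1, 2, 3]

# Short form: '-' is a solid line, 'r' is red.
line_a, = plt.plot(x, [3, 6, 9], '-r')

# Keyword form of ':g' (dotted, green).
line_b, = plt.plot(x, [2, 4, 9], color='g', linestyle=':')

# Short form with a marker: circles, dashed line, blue.
line_c, = plt.plot(x, [1, 2, 3], 'o--b')

plt.savefig("plot_styles.png")  # hypothetical output file name
```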

5.2 Scatter plot

The scatter function is used to draw a scatter plot. Again, the function also requires two sets of paired data to specify the coordinates of the x and y axes. Here's a code example:

# scatter.py

import matplotlib.pyplot as plt
import numpy as np

N = 20

plt.scatter(np.random.rand(N) * 100,
            np.random.rand(N) * 100,
            c='r', s=100, alpha=0.5)

plt.scatter(np.random.rand(N) * 100,
            np.random.rand(N) * 100,
            c='g', s=200, alpha=0.5)

plt.scatter(np.random.rand(N) * 100,
            np.random.rand(N) * 100,
            c='b', s=300, alpha=0.5)

plt.show()

From the code, we can see:

  1. The graphic contains three sets of data, each of which contains 20 random coordinates.
  2. The parameter c represents the color of the point, s represents the size of the point, and alpha represents the transparency.

The graphic will be drawn as follows:

For more details about the scatter function, please see here: matplotlib.pyplot.scatter
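
The parameters c and s also accept one value per point, which lets a single scatter call encode two extra dimensions. The following sketch assumes a fixed random seed for reproducibility, the viridis colormap, the Agg backend, and the file name scatter_sizes.png; none of these come from the example above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, the result is saved to a file
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)                 # assumption: fixed seed so reruns match
N = 20

x = np.random.rand(N) * 100
y = np.random.rand(N) * 100
sizes = np.random.rand(N) * 300   # one marker size per point
values = np.random.rand(N)        # one number per point, mapped to a color

points = plt.scatter(x, y, c=values, s=sizes, alpha=0.5, cmap='viridis')
plt.colorbar(points)              # shows how numbers map to colors

plt.savefig("scatter_sizes.png")  # hypothetical output file name
```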

5.3 Pie chart

The pie function is used to draw a pie chart. Pie charts are often used to show each part's share of a whole.

# pie.py

import matplotlib.pyplot as plt
import numpy as np

labels = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

data = np.random.rand(7) * 100

plt.pie(data, labels=labels, autopct='%1.1f%%')
plt.axis('equal')
plt.legend()

plt.show()

From the code, we can see:

  1. The data is a set of 7 random values
  2. The labels in the graphic are specified by labels
  3. The format of the percentage shown on each slice is specified by autopct
  4. plt.axis('equal') sets the same scale on both axes, so the pie is drawn as a circle
  5. plt.legend() indicates that a legend is to be drawn (see the upper right corner of the graphic below)

The graphic will be as follows:

For more details about the pie function, please see here: matplotlib.pyplot.pie
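
With fixed data instead of random values, the effect of autopct is easier to follow. In this sketch the four values sum to 100, so each slice's percentage equals its value. The data, the labels A to D, the Agg backend, and the file name pie_fixed.png are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, the result is saved to a file
import matplotlib.pyplot as plt

data = [10, 20, 30, 40]           # sums to 100
labels = ['A', 'B', 'C', 'D']     # hypothetical labels

# With autopct set, pie returns the wedges, the label texts,
# and the percentage texts drawn on the slices.
patches, texts, autotexts = plt.pie(data, labels=labels, autopct='%1.1f%%')
plt.axis('equal')

print([t.get_text() for t in autotexts])  # ['10.0%', '20.0%', '30.0%', '40.0%']

plt.savefig("pie_fixed.png")      # hypothetical output file name
```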

5.4 Bar chart

The bar function is used to draw a bar chart. Bar charts are often used to compare the values in a set of data, such as a city's daily traffic over the seven days of a week.

Here's an example:

# bar.py

import matplotlib.pyplot as plt
import numpy as np

N = 7

x = np.arange(N)
data = np.random.randint(low=0, high=100, size=N)
colors = np.random.rand(N * 3).reshape(N, -1)
labels = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

plt.title("Weekday Data")
plt.bar(x, data, alpha=0.8, color=colors, tick_label=labels)
plt.show()

From the code, we can see:

  1. The graphic shows 7 random values, each of which is a random integer in [0, 100)
  2. Their colors are also generated from random numbers. np.random.rand(N * 3).reshape(N, -1) generates 21 (N x 3) random numbers first and then arranges them into 7 rows, so each row has three numbers, which correspond to the three components (red, green, blue) of a color. If you don't understand this line of code, please take a look at the Python Machine Learning Library NumPy tutorial.
  3. plt.title specifies the title of the graphic, tick_label specifies the label under each bar, and alpha specifies the transparency.

The graphic will be as follows:

For more details about the bar function, please see here: matplotlib.pyplot.bar
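
The two random arrays used in the bar example can be inspected with NumPy alone. The sketch below repeats the same calls and prints their shapes and ranges; it only assumes that numpy is installed.

```python
import numpy as np

N = 7

# randint excludes the upper bound, so the values lie in 0..99.
data = np.random.randint(low=0, high=100, size=N)

# 21 random floats in [0, 1), arranged as 7 rows of 3 (one RGB triple per bar).
colors = np.random.rand(N * 3).reshape(N, -1)

print(data.shape)     # (7,)
print(colors.shape)   # (7, 3)
print(colors[0])      # the red, green, and blue components of the first bar
```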

5.5 Histogram

The hist function is used to draw a histogram. A histogram looks a bit like a bar chart, but its meaning is different: it describes how often the data falls within each of a set of ranges. This may sound a little abstract, so let's understand it through a code example:

# hist.py

import matplotlib.pyplot as plt
import numpy as np

data = [np.random.randint(0, n, n) for n in [3000, 4000, 5000]]
labels = ['3K', '4K', '5K']
bins = [0, 100, 500, 1000, 2000, 3000, 4000, 5000]

plt.hist(data, bins=bins, label=labels)
plt.legend()

plt.show()

In the above code, [np.random.randint(0, n, n) for n in [3000, 4000, 5000]] generates an array containing three arrays, where:

  1. the first array contains 3000 random numbers, and the range of these random numbers is [0, 3000)
  2. the second array contains 4000 random numbers, and the range of these random numbers is [0, 4000)
  3. the third array contains 5000 random numbers, and the range of these random numbers is [0, 5000)

The bins array specifies the edges of the histogram's bins, which means that there will be one bin for [0, 100), one bin for [100, 500), and so on, for a total of 7 bins. Each data set gets one bar per bin, and the height of a bar is the number of values that fall into that bin. The code also specifies the labels and draws a legend.

The output of the code is shown below:

From the graphic, we can see that all three data sets have values below 3000, with similar frequencies. However, the values of the blue bars are all below 3000, and the values of the orange bars are all below 4000. This matches how the random arrays were generated.
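
The bar heights can also be computed without drawing anything, using np.histogram with the same bins. This is a sketch for checking the numbers, assuming a fixed seed for reproducibility; the names sizes and all_counts are introduced here for illustration.

```python
import numpy as np

np.random.seed(0)   # assumption: fixed seed so reruns give the same counts

sizes = [3000, 4000, 5000]
data = [np.random.randint(0, n, n) for n in sizes]
bins = [0, 100, 500, 1000, 2000, 3000, 4000, 5000]

all_counts = []
for n, values in zip(sizes, data):
    # counts[i] is the number of values between bins[i] and bins[i + 1].
    counts, edges = np.histogram(values, bins=bins)
    all_counts.append(counts)
    print(n, counts)
```

Because the bins have different widths, wider bins collect more values, which is why the bars grow from left to right. The first data set has no values in the last two bins, and the second has none in the last bin.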

For more details about the hist function, please see here: matplotlib.pyplot.hist

6. Conclusion

Now we know the basic usage of Matplotlib and how to draw some of the most common graphics.

Since this is an introductory tutorial, it covers only the most basic use of these functions and graphics. Matplotlib offers far more than that, so follow the API links given throughout the article to explore further.

