Introduction to Machine Learning - TensorFlow
This post was originally written in Chinese. It has been translated to English to facilitate further translations into other languages.
Since we are learning Python, we definitely need to talk about machine learning, because many of its libraries are written in Python. Let’s start by installing them and trying them out.
TensorFlow
Let’s install it.
$ pip install tensorflow
ERROR: Could not find a version that satisfies the requirement tensorflow
ERROR: No matching distribution found for tensorflow
$ type python
python is aliased to `/usr/local/Cellar/python@3.9/3.9.1_6/bin/python3'
However, TensorFlow 2 only supports Python 3.5–3.8, and we are using 3.9, which explains the error.
% type python3
python3 is /usr/bin/python3
% python3 -V
Python 3.8.2
Notice that the python3 on my system is version 3.8.2. Where does the pip that corresponds to this Python version install packages?
% python3 -m pip -V
pip 21.0.1 from /Users/lzw/Library/Python/3.8/lib/python/site-packages/pip (python 3.8)
The corresponding pip is here. So, I’ll modify the .zprofile file. I recently changed my shell, and .zprofile is the equivalent of the old .bash_profile. Add a line:
alias pip3=/Users/lzw/Library/Python/3.8/bin/pip3
This way, we can use python3 and pip3 to work with TensorFlow.
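By the way, if a script ever seems to be running under the wrong interpreter, Python can report which one it is from inside the script (a small aside, not part of the original setup):
import sys
print(sys.version)     # the interpreter version, e.g. 3.8.x here
print(sys.executable)  # full path of the interpreter actually running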
% pip3 install tensorflow
...
Successfully installed absl-py-0.12.0 astunparse-1.6.3 cachetools-4.2.1 certifi-2020.12.5 chardet-4.0.0 flatbuffers-1.12 gast-0.3.3 google-auth-1.27.1 google-auth-oauthlib-0.4.3 google-pasta-0.2.0 grpcio-1.32.0 h5py-2.10.0 idna-2.10 keras-preprocessing-1.1.2 markdown-3.3.4 numpy-1.19.5 oauthlib-3.1.0 opt-einsum-3.3.0 protobuf-3.15.6 pyasn1-0.4.8 pyasn1-modules-0.2.8 requests-2.25.1 requests-oauthlib-1.3.0 rsa-4.7.2 tensorboard-2.4.1 tensorboard-plugin-wit-1.8.0 tensorflow-2.4.1 tensorflow-estimator-2.4.0 termcolor-1.1.0 typing-extensions-3.7.4.3 urllib3-1.26.3 werkzeug-1.0.1 wheel-0.36.2 wrapt-1.12.1
Many libraries were installed. Let’s use an example from the official website.
import tensorflow as tf

# Load the MNIST handwritten-digit dataset (downloaded on first run)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale the pixel values from 0-255 down to 0-1
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small feed-forward network: flatten each 28x28 image, one hidden layer, 10 outputs
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

# Run the untrained model on the first training image; the output is raw logits
predictions = model(x_train[:1]).numpy()
print(predictions)
Let’s run it.
$ /usr/bin/python3 tf.py
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 10s 1us/step
[[ 0.15477428 -0.3877643 0.0994779 0.07474922 -0.26219758 -0.03550266
0.32226565 -0.37141111 0.10925996 -0.0115255 ]]
As you can see, the dataset was downloaded, and the results were output.
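These numbers are raw logits from a not-yet-trained network, not probabilities. The official tutorial turns them into probabilities with softmax; a quick check using the model above:
probabilities = tf.nn.softmax(predictions).numpy()
print(probabilities)        # ten values between 0 and 1
print(probabilities.sum())  # they sum to (approximately) 1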
Next, let’s look at an image classification example.
# TensorFlow and tf.keras
import tensorflow as tf
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
print(tf.__version__)
An error occurred.
ModuleNotFoundError: No module named 'matplotlib'
Let’s install it.
% pip3 install matplotlib
Now it runs correctly.
$ /usr/bin/python3 image.py
2.4.1
Let’s copy and paste the example code.
# TensorFlow and tf.keras
import tensorflow as tf
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print(train_images.shape)
print(len(train_labels))
The results are output. Notice that we have train_images, train_labels, test_images, and test_labels: the data is already split into training and test sets.
(60000, 28, 28)
60000
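For comparison, the test set can be checked the same way; with the standard Fashion-MNIST split it should contain 10,000 images of the same 28x28 size:
print(test_images.shape)  # (10000, 28, 28)
print(len(test_labels))   # 10000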
Next, let’s try to display an image.
print(train_images[0])
Let’s see the result.
[[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 13 73 0
0 1 4 0 0 0 0 1 1 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 3 0 36 136 127 62
54 0 0 0 1 3 4 0 0 3]
[ 0 0 0 0 0 0 0 0 0 0 0 0 6 0 102 204 176 134
144 123 23 0 0 0 0 12 10 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 236 207 178
107 156 161 109 64 23 77 130 72 15]
[ 0 0 0 0 0 0 0 0 0 0 0 1 0 69 207 223 218 216
216 163 127 121 122 146 141 88 172 66]]
....
Here, part of the result is excerpted.
print(len(train_images[0][0]))
It outputs 28, so this is clearly a matrix with a width of 28. Let’s continue printing.
print(len(train_images[0][0][0]))
TypeError: object of type 'numpy.uint8' has no len()
Our guess was that each image is a 28*28*3 array, with the last dimension storing RGB values. However, the TypeError suggests we might be wrong about this: each element has no length, so it is a single number rather than another array.
print(train_images[0][1][20])
0
print(train_images[0][1])
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
This shows that each image is a 28*28 array. After some tinkering, we finally figured out the secret.
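In hindsight, numpy could have told us this directly; a quick check of the shape and element type (my own aside):
print(train_images[0].shape)                   # (28, 28): one grayscale value per pixel, no RGB dimension
print(train_images[0].dtype)                   # uint8
print(train_images.min(), train_images.max())  # pixel values range from 0 to 255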
Let’s first look at the output image.
plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.grid(False)
plt.show()
Do you see the color bar on the right, running from 0 to 250? It turns out the image is rendered as a gradient between two colors. But how does it know which two colors? Where did we specify that?
Next, let’s print the second image as well.
plt.imshow(train_images[1])
Very interesting. Is this a default of the pyplot library? Let’s continue running the code provided by the official website.
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()
Notice that the images and their classifications are displayed here. Finally, we understand the cmap parameter: if cmap is not specified, imshow falls back to the color scheme we saw earlier. Indeed:
plt.imshow(train_images[i])
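As far as I can tell, that fallback is simply matplotlib’s default colormap (viridis on recent versions), which answers the earlier question about where the colors come from. It can be checked directly (my own aside, not from the tutorial):
print(plt.rcParams['image.cmap'])  # 'viridis' unless configured otherwise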
This time, searching for pyplot cmap turns up some useful resources.
plt.imshow(train_images[i], cmap=plt.cm.PiYG)
Let’s modify the code.
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(2,5,i+1) ## Modified this line
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.Blues)
    plt.xlabel(class_names[train_labels[i]])
plt.show()
However, an error occurred.
ValueError: num must be 1 <= num <= 10, not 11
What does this mean? What exactly does the earlier 5,5,i+1 mean? Why does it stop working when the 5 is changed to 2? Intuitively it seems to mean 5 rows and 5 columns, but why does this error occur? How is 11 calculated? What do num and 10 mean? Notice that 2*5=10, so presumably the error occurs when i+1 reaches 11, that is, when i=10. When changed to for i in range(10):, we get the following result.
This time, after a brief look at the documentation, we learn the signature subplot(nrows, ncols, index, **kwargs). Okay, now we understand.
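To spell out the arithmetic (my own check, not from the tutorial): a 2x5 grid has only 2*5 = 10 cells, so the index i+1 overflows as soon as i reaches 10:
nrows, ncols = 2, 5
for i in range(25):
    index = i + 1
    if index > nrows * ncols:
        print(f"subplot({nrows},{ncols},{index}) would fail here, at i = {i}")
        break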
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    # plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.Blues)
    plt.xlabel(class_names[train_labels[i]])
plt.show()
Notice that the numbers such as 0 and 25 along the axes are called xticks. When we zoom the figure in or out, the display changes: the xticks and the xlabels are rendered differently at different sizes.
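To make this concrete (my own aside): xticks are just the positions of the tick marks along the x axis, and they can be set explicitly instead of hidden:
plt.imshow(train_images[0], cmap=plt.cm.binary)
plt.xticks([0, 7, 14, 21, 27])  # show tick marks only at these pixel columns
plt.xlabel(class_names[train_labels[0]])
plt.show()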
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)
Notice how the model is defined here, using the Sequential class. Pay attention to these parameters: 28,28, 128, relu, and 10. Notice that you need to compile the model and then fit it; fit means fitting the model to the training data. The 28,28 corresponds to the image size.
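As an aside, model.summary() shows how those numbers combine. My own arithmetic: Flatten turns each 28x28 image into 784 values, the first Dense layer then has 784*128 + 128 = 100,480 parameters, and the second has 128*10 + 10 = 1,290.
model.summary()
# Flatten -> output shape (None, 784),  0 parameters
# Dense   -> output shape (None, 128),  100,480 parameters
# Dense   -> output shape (None, 10),   1,290 parameters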
Epoch 1/10
1875/1875 [==============================] - 2s 928us/step - loss: 0.6331 - accuracy: 0.7769
Epoch 2/10
1875/1875 [==============================] - 2s 961us/step - loss: 0.3860 - accuracy: 0.8615
Epoch 3/10
1875/1875 [==============================] - 2s 930us/step - loss: 0.3395 - accuracy: 0.8755
Epoch 4/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.3071 - accuracy: 0.8890
Epoch 5/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2964 - accuracy: 0.8927
Epoch 6/10
1875/1875 [==============================] - 2s 985us/step - loss: 0.2764 - accuracy: 0.8955
Epoch 7/10
1875/1875 [==============================] - 2s 961us/step - loss: 0.2653 - accuracy: 0.8996
Epoch 8/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2549 - accuracy: 0.9052
Epoch 9/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2416 - accuracy: 0.9090
Epoch 10/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2372 - accuracy: 0.9086
313/313 - 0s - loss: 0.3422 - accuracy: 0.8798
Test accuracy: 0.879800021648407
The model has been trained. Let’s tweak the parameters.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(28, activation='relu'), # 128 -> 28
    tf.keras.layers.Dense(10)
])
Here we modify the first parameter of the hidden Dense layer, from 128 down to 28.
Epoch 1/10
1875/1875 [==============================] - 2s 714us/step - loss: 6.9774 - accuracy: 0.3294
Epoch 2/10
1875/1875 [==============================] - 1s 715us/step - loss: 1.3038 - accuracy: 0.4831
Epoch 3/10
1875/1875 [==============================] - 1s 747us/step - loss: 1.0160 - accuracy: 0.6197
Epoch 4/10
1875/1875 [==============================] - 1s 800us/step - loss: 0.7963 - accuracy: 0.6939
Epoch 5/10
1875/1875 [==============================] - 2s 893us/step - loss: 0.7006 - accuracy: 0.7183
Epoch 6/10
1875/1875 [==============================] - 1s 747us/step - loss: 0.6675 - accuracy: 0.7299
Epoch 7/10
1875/1875 [==============================] - 1s 694us/step - loss: 0.6681 - accuracy: 0.7330
Epoch 8/10
1875/1875 [==============================] - 1s 702us/step - loss: 0.6675 - accuracy: 0.7356
Epoch 9/10
1875/1875 [==============================] - 1s 778us/step - loss: 0.6508 - accuracy: 0.7363
Epoch 10/10
1875/1875 [==============================] - 1s 732us/step - loss: 0.6532 - accuracy: 0.7350
313/313 - 0s - loss: 0.6816 - accuracy: 0.7230
Test accuracy: 0.7229999899864197
Notice the change in Test accuracy before and after. The Epoch logs are output by the fit function. With 128 units, accuracy goes from 0.7769 to 0.9086; with 28 units, it goes from 0.3294 to 0.7350. This time we can see the flow: the training set is used to optimize loss and accuracy, and then the test dataset is used to evaluate. Let’s look at train_labels.
print(train_labels)
[9 0 0 ... 3 0 5]
print(len(train_labels))
60000
This means the numbers 0 to 9 represent the categories and, correspondingly, class_names also has 10 items.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
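So the label of the first training image maps straight into this list (a quick check):
print(train_labels[0])               # 9
print(class_names[train_labels[0]])  # 'Ankle boot'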
Let’s make another change.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(28, activation='relu'),
    tf.keras.layers.Dense(5) # 10 -> 5
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
An error occurred.
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 9 which is outside the valid range of [0, 5). Label values: 4 3 2 9 4 1 6 0 7 9 1 6 5 2 3 8 6 3 8 0 3 5 6 1 2 6 3 6 8 4 8 4
[[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at /curiosity-courses/ml/tf/image.py:53) ]] [Op:__inference_train_function_538]
Function call stack:
train_function
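A quick way to confirm what the valid label values actually are (my own check, not from the tutorial):
import numpy as np  # already imported above
print(np.unique(train_labels))  # [0 1 2 3 4 5 6 7 8 9]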
The labels run from 0 to 9, so an output layer with only 5 units cannot represent them all. Changing the units of that last Dense layer (the third element passed to Sequential) to 15 resolves the issue, and the results are not much different. Let’s try changing the number of epochs next.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(28, activation='relu'),
    tf.keras.layers.Dense(15)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=15) # 10 -> 15
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)
Epoch 1/15
1875/1875 [==============================] - 2s 892us/step - loss: 6.5778 - accuracy: 0.3771
Epoch 2/15
1875/1875 [==============================] - 2s 872us/step - loss: 1.3121 - accuracy: 0.4910
Epoch 3/15
1875/1875 [==============================] - 2s 909us/step - loss: 1.0900 - accuracy: 0.5389
Epoch 4/15
1875/1875 [==============================] - 1s 730us/step - loss: 1.0422 - accuracy: 0.5577
Epoch 5/15
1875/1875 [==============================] - 1s 709us/step - loss: 0.9529 - accuracy: 0.5952
Epoch 6/15
1875/1875 [==============================] - 1s 714us/step - loss: 0.9888 - accuracy: 0.5950
Epoch 7/15
1875/1875 [==============================] - 1s 767us/step - loss: 0.8678 - accuracy: 0.6355
Epoch 8/15
1875/1875 [==============================] - 1s 715us/step - loss: 0.8247 - accuracy: 0.6611
Epoch 9/15
1875/1875 [==============================] - 1s 721us/step - loss: 0.8011 - accuracy: 0.6626
Epoch 10/15
1875/1875 [==============================] - 1s 711us/step - loss: 0.8024 - accuracy: 0.6622
Epoch 11/15
1875/1875 [==============================] - 1s 781us/step - loss: 0.7777 - accuracy: 0.6696
Epoch 12/15
1875/1875 [==============================] - 1s 724us/step - loss: 0.7764 - accuracy: 0.6728
Epoch 13/15
1875/1875 [==============================] - 1s 731us/step - loss: 0.7688 - accuracy: 0.6767
Epoch 14/15
1875/1875 [==============================] - 1s 715us/step - loss: 0.7592 - accuracy: 0.6793
Epoch 15/15
1875/1875 [==============================] - 1s 786us/step - loss: 0.7526 - accuracy: 0.6792
313/313 - 0s - loss: 0.8555 - accuracy: 0.6418
Test accuracy: 0.6417999863624573
Notice that changing to 15 epochs doesn’t make much difference. The width of the hidden layer, tf.keras.layers.Dense(128, activation='relu'), is what matters. Changing 128 to 88 resulted in Test accuracy: 0.824999988079071; with 128 it was 0.879800021648407, and with 28 it was 0.7229999899864197. Does a larger value mean better results? Not necessarily: when changed to 256, it was Test accuracy: 0.8409000039100647. This makes us ponder the meaning of loss and accuracy.
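If you want to repeat this comparison systematically instead of by hand, here is a small sketch (my own, not from the tutorial; it retrains the model once per width, so it takes a few minutes):
for units in (28, 88, 128, 256):
    m = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(units, activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    m.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
    m.fit(train_images, train_labels, epochs=10, verbose=0)
    _, acc = m.evaluate(test_images, test_labels, verbose=0)
    print(f"hidden units: {units:4d}  test accuracy: {acc:.4f}")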
Next, let’s make a prediction. Notice that this uses the same Sequential class as above; pay attention to the two elements it is given, the trained model and a tf.keras.layers.Softmax() layer.
probability_model = tf.keras.Sequential([model,
                                         tf.keras.layers.Softmax()])
predictions = probability_model.predict(test_images)
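Each row of predictions now holds ten probabilities, one per class; the most likely class for the first test image can be read off with argmax (a quick check before plotting):
print(predictions[0])                          # ten probabilities that sum to ~1
print(np.argmax(predictions[0]))               # index of the most likely class
print(class_names[np.argmax(predictions[0])])  # its human-readable name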
def plot_image(i, predictions_array, true_label, img):
    true_label, img = true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img, cmap=plt.cm.binary)
    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'
    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                         100*np.max(predictions_array),
                                         class_names[true_label]),
               color=color)
def plot_value_array(i, predictions_array, true_label):
    true_label = true_label[i]
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')
i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i], test_labels)
plt.show()
This indicates that this image has a 99% chance of being an Ankle boot. Notice that plot_image draws the chart on the left and plot_value_array draws the one on the right.
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, predictions[i], test_labels, test_images)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, predictions[i], test_labels)
plt.tight_layout()
plt.show()
Notice that this just displays more test results. So, we roughly understand the usage flow. We still don’t know how the calculations work behind the scenes, but we know how to use them. Behind it all is calculus. How do we understand calculus?
For example, there’s a number between 1 and 100 for you to guess. Each time you guess, I tell you if it’s too low or too high. You guess 50. I say too low. You guess 80. I say too high. You guess 65. I say too high. You guess 55. I say too low. You guess 58. I say, yes, you got it.
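In code, that guessing game is just a loop that keeps narrowing the range (a toy sketch; the secret number is hard-coded as 58 to match the story above):
secret = 58
low, high = 1, 100
guess = (low + high) // 2
while guess != secret:
    if guess < secret:
        low = guess + 1    # "too low": move the lower bound up
    else:
        high = guess - 1   # "too high": move the upper bound down
    guess = (low + high) // 2
print("Got it:", guess)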
Machine learning simulates a similar process behind the scenes, just a much more complex one. It might involve many ranges like 1 to 100 and guess many numbers at once. Each guess involves a lot of calculation, and deciding whether a guess is too high or too low also requires many computations.