Introduction to Machine Learning - TensorFlow
This post was originally written in Chinese. It has been translated to English to facilitate further translations into other languages.
Since we are learning Python, we definitely need to talk about machine learning, because many of its libraries are written in Python. Let’s start by installing them and trying them out.
TensorFlow
Let’s install it.
$ pip install tensorflow
ERROR: Could not find a version that satisfies the requirement tensorflow
ERROR: No matching distribution found for tensorflow
$ type python
python is aliased to `/usr/local/Cellar/python@3.9/3.9.1_6/bin/python3'
However, TensorFlow 2 only supports Python 3.5–3.8, and we are using 3.9, which explains the error.
% type python3
python3 is /usr/bin/python3
% python3 -V
Python 3.8.2
Notice that the python3 on my system is version 3.8.2. Where does the pip that corresponds to this Python version install packages?
% python3 -m pip -V
pip 21.0.1 from /Users/lzw/Library/Python/3.8/lib/python/site-packages/pip (python 3.8)
The corresponding pip is here. So, I’ll modify the .zprofile file. I recently changed my shell, and .zprofile is the equivalent of the old .bash_profile. Add a line:
alias pip3=/Users/lzw/Library/Python/3.8/bin/pip3
This way, we can use python3 and pip3 to work with TensorFlow.
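By the way, if a script ever seems to be running under the wrong interpreter, Python can report which one it is from inside the script (a small aside, not part of the original setup):
import sys
print(sys.version)     # the interpreter version, e.g. 3.8.x here
print(sys.executable)  # full path of the interpreter actually running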
% pip3 install tensorflow
...
Successfully installed absl-py-0.12.0 astunparse-1.6.3 cachetools-4.2.1 certifi-2020.12.5 chardet-4.0.0 flatbuffers-1.12 gast-0.3.3 google-auth-1.27.1 google-auth-oauthlib-0.4.3 google-pasta-0.2.0 grpcio-1.32.0 h5py-2.10.0 idna-2.10 keras-preprocessing-1.1.2 markdown-3.3.4 numpy-1.19.5 oauthlib-3.1.0 opt-einsum-3.3.0 protobuf-3.15.6 pyasn1-0.4.8 pyasn1-modules-0.2.8 requests-2.25.1 requests-oauthlib-1.3.0 rsa-4.7.2 tensorboard-2.4.1 tensorboard-plugin-wit-1.8.0 tensorflow-2.4.1 tensorflow-estimator-2.4.0 termcolor-1.1.0 typing-extensions-3.7.4.3 urllib3-1.26.3 werkzeug-1.0.1 wheel-0.36.2 wrapt-1.12.1
Many libraries were installed. Let’s use an example from the official website.
import tensorflow as tf

# Load the MNIST handwritten-digit dataset (downloaded on first run)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale the pixel values from 0-255 down to 0-1
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small feed-forward network: flatten each 28x28 image, one hidden layer, 10 outputs
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

# Run the untrained model on the first training image; the output is raw logits
predictions = model(x_train[:1]).numpy()
print(predictions)
Let’s run it.
$ /usr/bin/python3 tf.py
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 10s 1us/step
[[ 0.15477428 -0.3877643 0.0994779 0.07474922 -0.26219758 -0.03550266
0.32226565 -0.37141111 0.10925996 -0.0115255 ]]
As you can see, the dataset was downloaded, and the results were output.
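These numbers are raw logits from a not-yet-trained network, not probabilities. The official tutorial turns them into probabilities with softmax; a quick check using the model above:
probabilities = tf.nn.softmax(predictions).numpy()
print(probabilities)        # ten values between 0 and 1
print(probabilities.sum())  # they sum to (approximately) 1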
Next, let’s look at an image classification example.
# TensorFlow and tf.keras
import tensorflow as tf
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
print(tf.__version__)
An error occurred.
ModuleNotFoundError: No module named 'matplotlib'
Let’s install it.
% pip3 install matplotlib
Now it runs correctly.
$ /usr/bin/python3 image.py
2.4.1
Let’s copy and paste the example code.
# TensorFlow and tf.keras
import tensorflow as tf
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print(train_images.shape)
print(len(train_labels))
The results are output. Notice that we have train_images, train_labels, test_images, and test_labels: the data is already split into training and test sets.
(60000, 28, 28)
60000
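For comparison, the test set can be checked the same way; with the standard Fashion-MNIST split it should contain 10,000 images of the same 28x28 size:
print(test_images.shape)  # (10000, 28, 28)
print(len(test_labels))   # 10000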
Next, let’s try to display an image.
print(train_images[0])
Let’s see the result.
[[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 13 73 0
0 1 4 0 0 0 0 1 1 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 3 0 36 136 127 62
54 0 0 0 1 3 4 0 0 3]
[ 0 0 0 0 0 0 0 0 0 0 0 0 6 0 102 204 176 134
144 123 23 0 0 0 0 12 10 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 236 207 178
107 156 161 109 64 23 77 130 72 15]
[ 0 0 0 0 0 0 0 0 0 0 0 1 0 69 207 223 218 216
216 163 127 121 122 146 141 88 172 66]]
....
Here, part of the result is excerpted.
print(len(train_images[0][0]))
It outputs 28, so this is clearly a matrix with a width of 28. Let’s continue printing.
print(len(train_images[0][0][0]))
TypeError: object of type 'numpy.uint8' has no len()
Our guess was that each image is a 28*28*3 array, with the last dimension storing RGB values. However, the TypeError suggests we might be wrong about this: each element has no length, so it is a single number rather than another array.
print(train_images[0][1][20])
0
print(train_images[0][1])
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
This shows that each image is a 28*28 array. After some tinkering, we finally figured out the secret.
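In hindsight, numpy could have told us this directly; a quick check of the shape and element type (my own aside):
print(train_images[0].shape)                   # (28, 28): one grayscale value per pixel, no RGB dimension
print(train_images[0].dtype)                   # uint8
print(train_images.min(), train_images.max())  # pixel values range from 0 to 255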
Let’s first look at the output image.
plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.grid(False)
plt.show()
Do you see the color bar on the right, running from 0 to 250? It turns out the image is rendered as a gradient between two colors. But how does it know which two colors? Where did we specify that?
Next, let’s print the second image as well.
plt.imshow(train_images[1])
Very interesting. Is this a default of the pyplot library? Let’s continue running the code provided by the official website.
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()
Notice that the images and their classifications are displayed here. Finally, we understand the cmap parameter: if cmap is not specified, imshow falls back to the color scheme we saw earlier. Indeed:
plt.imshow(train_images[i])
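As far as I can tell, that fallback is simply matplotlib’s default colormap (viridis on recent versions), which answers the earlier question about where the colors come from. It can be checked directly (my own aside, not from the tutorial):
print(plt.rcParams['image.cmap'])  # 'viridis' unless configured otherwise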
This time, searching for pyplot cmap turns up some useful resources.
plt.imshow(train_images[i], cmap=plt.cm.PiYG)
Let’s modify the code.
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(2,5,i+1) ## Modified this line
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.Blues)
    plt.xlabel(class_names[train_labels[i]])
plt.show()
However, an error occurred.
ValueError: num must be 1 <= num <= 10, not 11
What does this mean? What exactly does the earlier 5,5,i+1 mean? Why does it stop working when the 5 is changed to 2? Intuitively it seems to mean 5 rows and 5 columns, but why does this error occur? How is 11 calculated? What do num and 10 mean? Notice that 2*5=10, so presumably the error occurs when i+1 reaches 11, that is, when i=10. When changed to for i in range(10):, we get the following result.
This time, after a brief look at the documentation, we learn the signature subplot(nrows, ncols, index, **kwargs). Okay, now we understand.
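To spell out the arithmetic (my own check, not from the tutorial): a 2x5 grid has only 2*5 = 10 cells, so the index i+1 overflows as soon as i reaches 10:
nrows, ncols = 2, 5
for i in range(25):
    index = i + 1
    if index > nrows * ncols:
        print(f"subplot({nrows},{ncols},{index}) would fail here, at i = {i}")
        break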
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    # plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.Blues)
    plt.xlabel(class_names[train_labels[i]])
plt.show()
Notice that the numbers such as 0 and 25 along the axes are called xticks. When we zoom the figure in or out, the display changes: the xticks and the xlabels are rendered differently at different sizes.
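To make this concrete (my own aside): xticks are just the positions of the tick marks along the x axis, and they can be set explicitly instead of hidden:
plt.imshow(train_images[0], cmap=plt.cm.binary)
plt.xticks([0, 7, 14, 21, 27])  # show tick marks only at these pixel columns
plt.xlabel(class_names[train_labels[0]])
plt.show()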
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)
Notice how the model is defined here, using the Sequential class. Pay attention to these parameters: 28,28, 128, relu, and 10. Notice that you need to compile the model and then fit it; fit means fitting the model to the training data. The 28,28 corresponds to the image size.
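As an aside, model.summary() shows how those numbers combine. My own arithmetic: Flatten turns each 28x28 image into 784 values, the first Dense layer then has 784*128 + 128 = 100,480 parameters, and the second has 128*10 + 10 = 1,290.
model.summary()
# Flatten -> output shape (None, 784),  0 parameters
# Dense   -> output shape (None, 128),  100,480 parameters
# Dense   -> output shape (None, 10),   1,290 parameters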
Epoch 1/10
1875/1875 [==============================] - 2s 928us/step - loss: 0.6331 - accuracy: 0.7769
Epoch 2/10
1875/1875 [==============================] - 2s 961us/step - loss: 0.3860 - accuracy: 0.8615
Epoch 3/10
1875/1875 [==============================] - 2s 930us/step - loss: 0.3395 - accuracy: 0.8755
Epoch 4/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.3071 - accuracy: 0.8890
Epoch 5/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2964 - accuracy: 0.8927
Epoch 6/10
1875/1875 [==============================] - 2s 985us/step - loss: 0.2764 - accuracy: 0.8955
Epoch 7/10
1875/1875 [==============================] - 2s 961us/step - loss: 0.2653 - accuracy: 0.8996
Epoch 8/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2549 - accuracy: 0.9052
Epoch 9/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2416 - accuracy: 0.9090
Epoch 10/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2372 - accuracy: 0.9086
313/313 - 0s - loss: 0.3422 - accuracy: 0.8798
Test accuracy: 0.879800021648407
The model has been trained. Let’s tweak the parameters.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(28, activation='relu'), # 128 -> 28
    tf.keras.layers.Dense(10)
])
Here we modify the first parameter of the hidden Dense layer, from 128 down to 28.
Epoch 1/10
1875/1875 [==============================] - 2s 714us/step - loss: 6.9774 - accuracy: 0.3294
Epoch 2/10
1875/1875 [==============================] - 1s 715us/step - loss: 1.3038 - accuracy: 0.4831
Epoch 3/10
1875/1875 [==============================] - 1s 747us/step - loss: 1.0160 - accuracy: 0.6197
Epoch 4/10
1875/1875 [==============================] - 1s 800us/step - loss: 0.7963 - accuracy: 0.6939
Epoch 5/10
1875/1875 [==============================] - 2s 893us/step - loss: 0.7006 - accuracy: 0.7183
Epoch 6/10
1875/1875 [==============================] - 1s 747us/step - loss: 0.6675 - accuracy: 0.7299
Epoch 7/10
1875/1875 [==============================] - 1s 694us/step - loss: 0.6681 - accuracy: 0.7330
Epoch 8/10
1875/1875 [==============================] - 1s 702us/step - loss: 0.6675 - accuracy: 0.7356
Epoch 9/10
1875/1875 [==============================] - 1s 778us/step - loss: 0.6508 - accuracy: 0.7363
Epoch 10/10
1875/1875 [==============================] - 1s 732us/step - loss: 0.6532 - accuracy: 0.7350
313/313 - 0s - loss: 0.6816 - accuracy: 0.7230
Test accuracy: 0.7229999899864197
Notice the change in Test accuracy before and after. The Epoch logs are output by the fit function. With 128 units, accuracy goes from 0.7769 to 0.9086; with 28 units, it goes from 0.3294 to 0.7350. This time we can see the flow: the training set is used to optimize loss and accuracy, and then the test dataset is used to evaluate. Let’s look at train_labels.
print(train_labels)
[9 0 0 ... 3 0 5]
print(len(train_labels))
60000
This means the numbers 0 to 9 represent the categories and, correspondingly, class_names also has 10 items.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
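So the label of the first training image maps straight into this list (a quick check):
print(train_labels[0])               # 9
print(class_names[train_labels[0]])  # 'Ankle boot'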
Let’s make another change.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(28, activation='relu'),
    tf.keras.layers.Dense(5) # 10 -> 5
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
An error occurred.
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 9 which is outside the valid range of [0, 5). Label values: 4 3 2 9 4 1 6 0 7 9 1 6 5 2 3 8 6 3 8 0 3 5 6 1 2 6 3 6 8 4 8 4
[[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at /curiosity-courses/ml/tf/image.py:53) ]] [Op:__inference_train_function_538]
Function call stack:
train_function
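A quick way to confirm what the valid label values actually are (my own check, not from the tutorial):
import numpy as np  # already imported above
print(np.unique(train_labels))  # [0 1 2 3 4 5 6 7 8 9]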
The labels run from 0 to 9, so an output layer with only 5 units cannot represent them all. Changing the units of that last Dense layer (the third element passed to Sequential) to 15 resolves the issue, and the results are not much different. Let’s try changing the number of epochs next.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(28, activation='relu'),
    tf.keras.layers.Dense(15)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=15) # 10 -> 15
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)
Epoch 1/15
1875/1875 [==============================] - 2s 892us/step - loss: 6.5778 - accuracy: 0.3771
Epoch 2/15
1875/1875 [==============================] - 2s 872us/step - loss: 1.3121 - accuracy: 0.4910
Epoch 3/15
1875/1875 [==============================] - 2s 909us/step - loss: 1.0900 - accuracy: 0.5389
Epoch 4/15
1875/1875 [==============================] - 1s 730us/step - loss: 1.0422 - accuracy: 0.5577
Epoch 5/15
1875/1875 [==============================] - 1s 709us/step - loss: 0.9529 - accuracy: 0.5952
Epoch 6/15
1875/1875 [==============================] - 1s 714us/step - loss: 0.9888 - accuracy: 0.5950
Epoch 7/15
1875/1875 [==============================] - 1s 767us/step - loss: 0.8678 - accuracy: 0.6355
Epoch 8/15
1875/1875 [==============================] - 1s 715us/step - loss: 0.8247 - accuracy: 0.6611
Epoch 9/15
1875/1875 [==============================] - 1s 721us/step - loss: 0.8011 - accuracy: 0.6626
Epoch 10/15
1875/1875 [==============================] - 1s 711us/step - loss: 0.8024 - accuracy: 0.6622
Epoch 11/15
1875/1875 [==============================] - 1s 781us/step - loss: 0.7777 - accuracy: 0.6696
Epoch 12/15
1875/1875 [==============================] - 1s 724us/step - loss: 0.7764 - accuracy: 0.6728
Epoch 13/15
1875/1875 [==============================] - 1s 731us/step - loss: 0.7688 - accuracy: 0.6767
Epoch 14/15
1875/1875 [==============================] - 1s 715us/step - loss: 0.7592 - accuracy: 0.6793
Epoch 15/15
1875/1875 [==============================] - 1s 786us/step - loss: 0.7526 - accuracy: 0.6792
313/313 - 0s - loss: 0.8555 - accuracy: 0.6418
Test accuracy: 0.6417999863624573
Notice that changing to 15 epochs doesn’t make much difference. The width of the hidden layer, tf.keras.layers.Dense(128, activation='relu'), is what matters. Changing 128 to 88 resulted in Test accuracy: 0.824999988079071; with 128 it was 0.879800021648407, and with 28 it was 0.7229999899864197. Does a larger value mean better results? Not necessarily: when changed to 256, it was Test accuracy: 0.8409000039100647. This makes us ponder the meaning of loss and accuracy.
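If you want to repeat this comparison systematically instead of by hand, here is a small sketch (my own, not from the tutorial; it retrains the model once per width, so it takes a few minutes):
for units in (28, 88, 128, 256):
    m = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(units, activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    m.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
    m.fit(train_images, train_labels, epochs=10, verbose=0)
    _, acc = m.evaluate(test_images, test_labels, verbose=0)
    print(f"hidden units: {units:4d}  test accuracy: {acc:.4f}")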
Next, let’s make a prediction. Notice that this uses the same Sequential class as above; pay attention to the two elements it is given, the trained model and a tf.keras.layers.Softmax() layer.
probability_model = tf.keras.Sequential([model,
                                         tf.keras.layers.Softmax()])
predictions = probability_model.predict(test_images)
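Each row of predictions now holds ten probabilities, one per class; the most likely class for the first test image can be read off with argmax (a quick check before plotting):
print(predictions[0])                          # ten probabilities that sum to ~1
print(np.argmax(predictions[0]))               # index of the most likely class
print(class_names[np.argmax(predictions[0])])  # its human-readable name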
def plot_image(i, predictions_array, true_label, img):
    true_label, img = true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img, cmap=plt.cm.binary)
    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'
    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                         100*np.max(predictions_array),
                                         class_names[true_label]),
               color=color)
def plot_value_array(i, predictions_array, true_label):
    true_label = true_label[i]
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')
i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i], test_labels)
plt.show()
This indicates that this image has a 99% chance of being an Ankle boot. Notice that plot_image draws the chart on the left and plot_value_array draws the one on the right.
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, predictions[i], test_labels, test_images)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, predictions[i], test_labels)
plt.tight_layout()
plt.show()
Notice that this just displays more test results. So, we roughly understand the usage flow. We still don’t know how the calculations work behind the scenes, but we know how to use them. Behind it all is calculus. How do we understand calculus?
For example, there’s a number between 1 and 100 for you to guess. Each time you guess, I tell you if it’s too low or too high. You guess 50. I say too low. You guess 80. I say too high. You guess 65. I say too high. You guess 55. I say too low. You guess 58. I say, yes, you got it.
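In code, that guessing game is just a loop that keeps narrowing the range (a toy sketch; the secret number is hard-coded as 58 to match the story above):
secret = 58
low, high = 1, 100
guess = (low + high) // 2
while guess != secret:
    if guess < secret:
        low = guess + 1    # "too low": move the lower bound up
    else:
        high = guess - 1   # "too high": move the upper bound down
    guess = (low + high) // 2
print("Got it:", guess)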
Machine learning simulates a similar process behind the scenes, just a much more complex one. It might involve many ranges like 1 to 100 and guess many numbers at once. Each guess involves a lot of calculation, and deciding whether a guess is too high or too low also requires many computations.