
[jupyter notebook] Neural Network (using the MNIST dataset)

냠냠:) 2020. 4. 18. 03:32

[This post is a review of what I learned in the "텐서플로우로 시작하는 딥러닝 기초" (Deep Learning Basics with TensorFlow) course.]

 

What is the MNIST dataset?

- A dataset of 28 x 28 pixel images of handwritten digits, used to determine which of the digits 0 through 9 each image represents.

 

[Notes on questions that came up during the lectures and the functions used]

 

1. from tensorflow.keras.utils import to_categorical

- to_categorical converts an integer class vector into a binary class matrix, i.e., it reshapes the labels into the form the tensors expect. For example, the label 7 becomes the one-hot vector [0, 0, 0, 0, 0, 0, 0, 1, 0, 0], where the position of the 1 encodes the digit.
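
A quick check of what it returns (a minimal sketch, assuming 10 classes):

from tensorflow.keras.utils import to_categorical

print(to_categorical(7, num_classes=10))
# [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]  <- label 7 as a one-hot vector of length 10

print(to_categorical([0, 7, 9], num_classes=10).shape)
# (3, 10)  <- an integer class vector becomes a binary class matrix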

 

2. np.expand_dims(train_data, axis=-1)

- The dataset we load at first is a 3-D array of shape [N, 28, 28]. np.expand_dims adds a size-1 axis at the end, turning it into a 4-D array of shape [N, 28, 28, 1]. This adds the channel dimension so the data matches the input shape the model expects.
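
A shape check (a minimal sketch with a small dummy array standing in for the MNIST data):

import numpy as np

dummy = np.zeros((5, 28, 28))              # [N, 28, 28]
expanded = np.expand_dims(dummy, axis=-1)  # [N, 28, 28, 1]
print(dummy.shape, '->', expanded.shape)   # (5, 28, 28) -> (5, 28, 28, 1)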

 

3. def normalize(train_data, test_data):

- Each pixel can take a value in the range 0 to 255. Since we want values between 0 and 1, we divide by 255.0 to normalize.

 

4. tf.keras.losses.categorical_crossentropy(y_true, y_pred, from_logits)

- According to the TensorFlow docs, it "Computes the categorical crossentropy loss." y_pred takes the predictions and y_true takes the labels. from_logits tells it whether y_pred is a tensor of raw logits; by default (from_logits=False) y_pred is assumed to already be a probability distribution, e.g. the output of a softmax. (I don't fully understand this yet; I'll figure it out over time.)
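
A minimal sketch of what from_logits changes (the numbers are made up for illustration): with raw logits pass from_logits=True, and if you apply the softmax yourself pass from_logits=False; both give the same loss.

import tensorflow as tf

y_true = tf.constant([[0., 0., 1.]])     # one-hot label
logits = tf.constant([[1.0, 2.0, 3.0]])  # raw model output, no softmax applied

loss_from_logits = tf.keras.losses.categorical_crossentropy(
    y_true=y_true, y_pred=logits, from_logits=True)
loss_from_probs = tf.keras.losses.categorical_crossentropy(
    y_true=y_true, y_pred=tf.nn.softmax(logits), from_logits=False)

print(loss_from_logits.numpy(), loss_from_probs.numpy())  # both are about 0.4076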

 

5. tf.equal(tf.argmax(logits, -1), tf.argmax(labels, -1))

- tf.argmax(..., -1) picks the index of the largest value along the last axis, i.e. the predicted class from the logits and the actual class from the one-hot labels. tf.equal then compares the two element-wise; casting the resulting booleans to float and averaging them gives the accuracy.
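
A toy check with made-up logits and labels (a sketch; the cast-and-average step is the same as in accuracy_fn below):

import tensorflow as tf

logits = tf.constant([[2.0, 0.1, 0.3],
                      [0.2, 0.1, 3.0]])  # predictions for 2 samples
labels = tf.constant([[1., 0., 0.],
                      [0., 1., 0.]])     # one-hot true labels

prediction = tf.equal(tf.argmax(logits, -1), tf.argmax(labels, -1))  # [True, False]
accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))
print(accuracy.numpy())  # 0.5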

 

6. tape.gradient(loss, model.variables)

- This is the return value of the grad function we have seen countless times. tape.gradient computes the gradients of the loss with respect to model.variables; those gradients are what the optimizer later uses to update the variables.
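
What tape.gradient actually returns, on a tiny example (a sketch with a single variable in place of a whole model):

import tensorflow as tf

w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    loss = w * w                 # loss = w^2
grad_w = tape.gradient(loss, w)  # d(loss)/dw = 2w
print(grad_w.numpy())            # 6.0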

 

7. optimizer = tf.keras.optimizers.Adam()

- I don't think I'm at a level where I can understand it yet. As far as I understand, Adam generally works better than SGD.

(https://shuuki4.github.io/deep%20learning/2016/05/20/Gradient-Descent-Algorithm-Overview.html)

 

8. tf.data.Dataset.from_tensor_slices(()).shuffle().prefetch().batch()

- shuffle: according to the docs, the dataset fills a buffer with buffer_size elements and randomly samples from that buffer to produce a shuffled dataset.

- prefetch: prepares up to buffer_size elements ahead of time while the current element is being processed, so the input pipeline doesn't stall.

- batch: combines consecutive elements into batches of the given size.

The blog below was helpful; a small pipeline sketch follows after the link.

https://datascienceschool.net/view-notebook/57714103a75c43ed9a7d95f96135f0ad/
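
A minimal sketch of the same pipeline on toy data, just to see what each batch looks like (the numbers are illustrative):

import tensorflow as tf
import numpy as np

x = np.arange(10, dtype=np.float32)  # 10 toy "images"
y = np.arange(10)                    # 10 toy labels

dataset = tf.data.Dataset.from_tensor_slices((x, y)).\
    shuffle(buffer_size=10).\
    prefetch(buffer_size=4).\
    batch(4, drop_remainder=True)

for batch_x, batch_y in dataset:
    print(batch_x.numpy(), batch_y.numpy())  # 2 batches of 4; the last 2 elements are dropped (drop_remainder=True)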

 

[Errors encountered while running]

InternalError (see above for traceback): Blas GEMM launch failed

- Simply restarting Jupyter (the notebook kernel) fixes it.

In [1]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import mnist
from time import time
import os

print(tf.__version__)
 
2.0.0-beta1
In [2]:
def load(model, checkpoint_dir):
    print("[*] Reading checkpoints..")
    
    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
    if ckpt :
        ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
        checkpoint = tf.train.Checkpoint(dnn=model)
        checkpoint.restore(save_path= os.path.join(checkpoint_dir, ckpt_name))
        counter = int(ckpt_name.split('-')[1])
        print("[*] Success to read {}".format(ckpt_name))
        return True, counter
    else:
        print("[*] Failed to find a checkpoint")
        return False, 0
    
def check_folder(dir):
    if not os.path.exists(dir):
        os.makedirs(dir)
    return dir
In [3]:
#Data load & pre-processing function

def load_mnist():
    (train_data, train_labels), (test_data, test_labels) = mnist.load_data()
    train_data = np.expand_dims(train_data, axis= -1) #[N, 28, 28] -> [N,28,28,1]
    test_data = np.expand_dims(test_data, axis= -1) # [N, 28, 28] -> [N,28,28, 1]
    
    train_data, test_data = normalize(train_data, test_data)
    
    train_labels = to_categorical(train_labels, 10) #[N,] -> [N, 10] 
    test_labels = to_categorical(test_labels, 10) 
    
    return train_data, train_labels, test_data, test_labels

def normalize(train_data, test_data):
    train_data = train_data.astype(np.float32) / 255.0
    test_data = test_data.astype(np.float32) / 255.0
    return train_data, test_data
In [4]:
#Performance function
def loss_fn(model, images, labels):
    logits = model(images, training=True)
    loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_pred=logits, y_true=labels,
                                                                  from_logits=True))
    return loss

def accuracy_fn(model, images, labels):
    logits = model(images, training=False)
    prediction = tf.equal(tf.argmax(logits, -1), tf.argmax(labels, -1))
    accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))
    return accuracy

def grad(model, images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(model,images, labels)
    return tape.gradient(loss, model.variables)
In [5]:
#model function
def flatten():
    return tf.keras.layers.Flatten()

def dense(label_dim, weight_init):
    return tf.keras.layers.Dense(units=label_dim, use_bias=True, kernel_initializer=weight_init)

def sigmoid():
    return tf.keras.layers.Activation(tf.keras.activations.sigmoid)
In [6]:
#create model(class version)

class create_model_class(tf.keras.Model):
    def __init__(self, label_dim):
        super(create_model_class,self).__init__()
        weight_init = tf.keras.initializers.RandomNormal()  # instantiate the initializer
        
        self.model = tf.keras.Sequential()
        self.model.add(flatten())
        
        for i in range(2):
            self.model.add(dense(256, weight_init))
            self.model.add(sigmoid())  # sigmoid() returns the Activation layer
        
        self.model.add(dense(label_dim, weight_init))
        
    def call(self, x, training=None, mask=None):
        x = self.model(x)
        
        return x
In [7]:
# create model

def create_model_function(label_dim):
    weight_init = tf.keras.initializers.RandomNormal()
    
    model = tf.keras.Sequential()
    model.add(flatten())
    
    for i in range(2):
        model.add(dense(256, weight_init))
        model.add(sigmoid())
    
    model.add(dense(label_dim, weight_init))
    
    return model
In [8]:
#define data & hyper-parameter

"""dataset"""
train_x, train_y, test_x, test_y = load_mnist()

"""parameters"""

learning_rate = 0.001
batch_size = 128

training_epochs = 1
training_iterations = len(train_x)  # number of training samples (used as the denominator in the log output)

label_dim = 10

train_flag = True

"""Graph input using Dataset API"""
train_dataset = tf.data.Dataset.from_tensor_slices((train_x,train_y)).\
    shuffle(buffer_size=100000).\
    prefetch(buffer_size=batch_size).\
    batch(batch_size, drop_remainder=True)

test_dataset = tf.data.Dataset.from_tensor_slices((test_x,test_y)).\
    shuffle(buffer_size=100000).\
    prefetch(buffer_size=len(test_x)).\
    batch(len(test_x))
In [9]:
#Define model & optimizer & writer

"""Model"""
network = create_model_function(label_dim)

"""Training"""
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

"""writer"""
checkpoint_dir = 'checkpoint'
logs_dir = 'logs'

model_dir = 'nn_softmax'

checkpoint_dir = os.path.join(checkpoint_dir, model_dir)
check_folder(checkpoint_dir)
checkpoint_prefix = os.path.join(checkpoint_dir, model_dir)
logs_dir = os.path.join(logs_dir, model_dir)
In [10]:
if train_flag :

    checkpoint = tf.train.Checkpoint(dnn=network)

    # create writer for tensorboard
    summary_writer = tf.summary.create_file_writer(logdir=logs_dir)
    start_time = time()

    # restore check-point if it exists
    could_load, checkpoint_counter = load(network, checkpoint_dir)    

    if could_load:
        start_epoch = (int)(checkpoint_counter / training_iterations)        
        counter = checkpoint_counter        
        print(" [*] Load SUCCESS")
    else:
        start_epoch = 0
        start_iteration = 0
        counter = 0
        print(" [!] Load failed...")
    
    # train phase
    with summary_writer.as_default():  # for tensorboard
        for epoch in range(start_epoch, training_epochs):
            for idx, (train_input, train_label) in enumerate(train_dataset):            
                grads = grad(network, train_input, train_label)
                optimizer.apply_gradients(grads_and_vars=zip(grads, network.variables))

                train_loss = loss_fn(network, train_input, train_label)
                train_accuracy = accuracy_fn(network, train_input, train_label)
                
                for test_input, test_label in test_dataset:                
                    test_accuracy = accuracy_fn(network, test_input, test_label)

                tf.summary.scalar(name='train_loss', data=train_loss, step=counter)
                tf.summary.scalar(name='train_accuracy', data=train_accuracy, step=counter)
                tf.summary.scalar(name='test_accuracy', data=test_accuracy, step=counter)

                print(
                    "Epoch: [%2d] [%5d/%5d] time: %4.4f, train_loss: %.8f, train_accuracy: %.4f, test_Accuracy: %.4f" \
                    % (epoch, idx, training_iterations, time() - start_time, train_loss, train_accuracy,
                       test_accuracy))
                counter += 1                
        checkpoint.save(file_prefix=checkpoint_prefix + '-{}'.format(counter))
        
# test phase      
else :
    _, _ = load(network, checkpoint_dir)
    for test_input, test_label in test_dataset:    
        test_accuracy = accuracy_fn(network, test_input, test_label)

    print("test_Accuracy: %.4f" % (test_accuracy))
 
[*] Reading checkpoints..
[*] Failed to find a checkpoint
 [!] Load failed...
Epoch: [ 0] [    0/60000] time: 0.8268, train_loss: 2.36806631, train_accuracy: 0.1250, test_Accuracy: 0.1096
Epoch: [ 0] [    1/60000] time: 0.9245, train_loss: 2.28677988, train_accuracy: 0.1328, test_Accuracy: 0.1010
Epoch: [ 0] [    2/60000] time: 1.0192, train_loss: 2.25089169, train_accuracy: 0.1562, test_Accuracy: 0.1136
Epoch: [ 0] [    3/60000] time: 1.1230, train_loss: 2.29961181, train_accuracy: 0.0547, test_Accuracy: 0.1135
Epoch: [ 0] [    4/60000] time: 1.2247, train_loss: 2.26263475, train_accuracy: 0.2578, test_Accuracy: 0.1896
Epoch: [ 0] [    5/60000] time: 1.3573, train_loss: 2.25411654, train_accuracy: 0.1719, test_Accuracy: 0.1032
Epoch: [ 0] [    6/60000] time: 1.4590, train_loss: 2.26882124, train_accuracy: 0.0938, test_Accuracy: 0.1032
Epoch: [ 0] [    7/60000] time: 1.5598, train_loss: 2.25703335, train_accuracy: 0.1250, test_Accuracy: 0.1080
Epoch: [ 0] [    8/60000] time: 1.6605, train_loss: 2.28595734, train_accuracy: 0.1562, test_Accuracy: 0.1756
Epoch: [ 0] [    9/60000] time: 1.7582, train_loss: 2.25078464, train_accuracy: 0.2266, test_Accuracy: 0.2537
Epoch: [ 0] [   10/60000] time: 1.8700, train_loss: 2.20269799, train_accuracy: 0.1328, test_Accuracy: 0.1370
Epoch: [ 0] [   11/60000] time: 1.9817, train_loss: 2.20254898, train_accuracy: 0.0938, test_Accuracy: 0.1016
Epoch: [ 0] [   12/60000] time: 2.0784, train_loss: 2.19096851, train_accuracy: 0.1484, test_Accuracy: 0.1648
Epoch: [ 0] [   13/60000] time: 2.1742, train_loss: 2.19529772, train_accuracy: 0.2422, test_Accuracy: 0.2954
Epoch: [ 0] [   14/60000] time: 2.2799, train_loss: 2.17700243, train_accuracy: 0.3594, test_Accuracy: 0.3847
Epoch: [ 0] [   15/60000] time: 2.3975, train_loss: 2.16270113, train_accuracy: 0.3906, test_Accuracy: 0.4139
Epoch: [ 0] [   16/60000] time: 2.4953, train_loss: 2.17175102, train_accuracy: 0.3125, test_Accuracy: 0.3793
Epoch: [ 0] [   17/60000] time: 2.5940, train_loss: 2.11604834, train_accuracy: 0.3672, test_Accuracy: 0.3444
Epoch: [ 0] [   18/60000] time: 2.7027, train_loss: 2.14183974, train_accuracy: 0.3047, test_Accuracy: 0.3715
Epoch: [ 0] [   19/60000] time: 2.8044, train_loss: 2.10157919, train_accuracy: 0.3906, test_Accuracy: 0.4031
Epoch: [ 0] [   20/60000] time: 2.9161, train_loss: 2.08549476, train_accuracy: 0.4766, test_Accuracy: 0.4693
Epoch: [ 0] [   21/60000] time: 3.0189, train_loss: 2.05075026, train_accuracy: 0.5156, test_Accuracy: 0.5158
Epoch: [ 0] [   22/60000] time: 3.1136, train_loss: 2.09807849, train_accuracy: 0.4453, test_Accuracy: 0.5510
Epoch: [ 0] [   23/60000] time: 3.2223, train_loss: 2.05387449, train_accuracy: 0.5469, test_Accuracy: 0.5968
Epoch: [ 0] [   24/60000] time: 3.3341, train_loss: 2.01676393, train_accuracy: 0.6094, test_Accuracy: 0.5998
Epoch: [ 0] [   25/60000] time: 3.4537, train_loss: 1.97315645, train_accuracy: 0.6328, test_Accuracy: 0.5864
Epoch: [ 0] [   26/60000] time: 3.5544, train_loss: 1.97140563, train_accuracy: 0.6094, test_Accuracy: 0.5702
Epoch: [ 0] [   27/60000] time: 3.6552, train_loss: 1.97519779, train_accuracy: 0.5781, test_Accuracy: 0.5855
Epoch: [ 0] [   28/60000] time: 3.7539, train_loss: 1.93528223, train_accuracy: 0.6016, test_Accuracy: 0.6285
Epoch: [ 0] [   29/60000] time: 3.8596, train_loss: 1.89910841, train_accuracy: 0.6719, test_Accuracy: 0.6472
Epoch: [ 0] [   30/60000] time: 3.9584, train_loss: 1.90951204, train_accuracy: 0.6406, test_Accuracy: 0.6637
Epoch: [ 0] [   31/60000] time: 4.0531, train_loss: 1.91478336, train_accuracy: 0.6250, test_Accuracy: 0.6922
Epoch: [ 0] [   32/60000] time: 4.1538, train_loss: 1.84158063, train_accuracy: 0.6328, test_Accuracy: 0.7132
Epoch: [ 0] [   33/60000] time: 4.2536, train_loss: 1.83942056, train_accuracy: 0.7344, test_Accuracy: 0.7183
Epoch: [ 0] [   34/60000] time: 4.3513, train_loss: 1.79055476, train_accuracy: 0.6641, test_Accuracy: 0.6973
Epoch: [ 0] [   35/60000] time: 4.4501, train_loss: 1.81603146, train_accuracy: 0.6016, test_Accuracy: 0.6982
Epoch: [ 0] [   36/60000] time: 4.5558, train_loss: 1.71743405, train_accuracy: 0.7578, test_Accuracy: 0.6941
Epoch: [ 0] [   37/60000] time: 4.6585, train_loss: 1.72784555, train_accuracy: 0.6953, test_Accuracy: 0.6924
Epoch: [ 0] [   38/60000] time: 4.7582, train_loss: 1.69969416, train_accuracy: 0.6797, test_Accuracy: 0.6810
Epoch: [ 0] [   39/60000] time: 4.8609, train_loss: 1.65071285, train_accuracy: 0.6875, test_Accuracy: 0.6756
Epoch: [ 0] [  457/60000] time: 47.7068, train_loss: 0.20553184, train_accuracy: 0.9453, test_Accuracy: 0.9276
Epoch: [ 0] [  458/60000] time: 47.8105, train_loss: 0.22368088, train_accuracy: 0.9375, test_Accuracy: 0.9281
Epoch: [ 0] [  459/60000] time: 47.9072, train_loss: 0.22513360, train_accuracy: 0.9141, test_Accuracy: 0.9280
Epoch: [ 0] [  460/60000] time: 48.0030, train_loss: 0.19532414, train_accuracy: 0.9609, test_Accuracy: 0.9279
Epoch: [ 0] [  461/60000] time: 48.1097, train_loss: 0.26649928, train_accuracy: 0.9141, test_Accuracy: 0.9276
Epoch: [ 0] [  462/60000] time: 48.2124, train_loss: 0.17376046, train_accuracy: 0.9531, test_Accuracy: 0.9276
Epoch: [ 0] [  463/60000] time: 48.3081, train_loss: 0.28178105, train_accuracy: 0.9141, test_Accuracy: 0.9285
Epoch: [ 0] [  464/60000] time: 48.4119, train_loss: 0.16605219, train_accuracy: 0.9766, test_Accuracy: 0.9286
Epoch: [ 0] [  465/60000] time: 48.5136, train_loss: 0.39161086, train_accuracy: 0.8828, test_Accuracy: 0.9283
Epoch: [ 0] [  466/60000] time: 48.6153, train_loss: 0.34757400, train_accuracy: 0.8906, test_Accuracy: 0.9289
Epoch: [ 0] [  467/60000] time: 48.7150, train_loss: 0.16808197, train_accuracy: 0.9297, test_Accuracy: 0.9287
 

 

Reflections: I'm learning deep learning concepts from a book and this course at the same time, but I constantly feel my lack of fundamentals. It feels different from learning something for the first time (Java, for example): there is always more to learn, the basics have to be solid, and mathematical knowledge is required.

Still, by taking the lectures, organizing the material, and looking up what I don't know in the docs, on blogs, and on Google, I feel my knowledge growing. I believe that if I keep this up every day, my skills will improve a lot.
