先程のエントリの続きで、 o'reilly「ゼロから作る Deep Learning」4章にある勾配降下法を写経.

ざっくり、勾配降下法とは

「ニュートン法...?」のような印象です。

${ \Large { x_0 = x_0 - η \frac {d f(x)}{dx_0} x_1 = x_1 - η \frac {d f(x)}{dx_1} } }$

上記のように、x0, x1 を上書きしながら、近似値解を求めます。

η(エータ)は学習率(learnig rate)で、 - 大きすぎると、発散し - 小さすぎると、解に近づきません

二次式に対する勾配降下

${ \Large f(x_0, x_1) = {x_0}^2 + {x_1}^2 }$

以下のscriptは上記二次式に対する勾配降下です。

#!/usr/local/bin/python
# coding: utf-8

import numpy as np

import matplotlib
matplotlib.use('Agg')
import matplotlib.pylab as plt


def main():
    init_x = np.array([-3.0, 4.0])

    lr = 0.1
    step_num = 20
    # 勾配降下
    x, x_history = gradient_descent(function_2,
                                    init_x,
                                    lr=lr,  # 学習率
                                    step_num=step_num)

    plt.plot( [-5, 5], [0,0], '--b')
    plt.plot( [0,0], [-5, 5], '--b')
    plt.plot(x_history[:,0], x_history[:,1], 'o')

    plt.xlim(-3.5, 3.5)
    plt.ylim(-4.5, 4.5)
    plt.xlabel("X0")
    plt.ylabel("X1")
    plt.savefig( 'gradient_method.png' )


# 勾配降下
def gradient_descent(f, init_x, lr=0.01, step_num=100):
    x = init_x
    x_history = []

    for i in range(step_num):
        x_history.append( x.copy() )

        grad = numerical_gradient(f, x)
        x -= lr * grad

    return x, np.array(x_history)


# 数値微分
def numerical_gradient(f, X):
    if X.ndim == 1:             # ndim: 次元数
        return _numerical_gradient_no_batch(f, X)
    else:
        grad = np.zeros_like(X) # X と同じ形状の「0 配列」作成
        
        for idx, x in enumerate(X):
            grad[idx] = _numerical_gradient_no_batch(f, x)
        
        return grad


# 数値微分(詳細?) ...  dy/dx = lim(h->0) { f(x+h) - f(x-h)} / 2h }
def _numerical_gradient_no_batch(f, x):
    h = 1e-4 # 0.0001
    grad = np.zeros_like(x)   # X と同じ形状の「0 配列」作成
  
    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x) # f(x+h)
        
        x[idx] = tmp_val - h
        fxh2 = f(x) # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)
        
        x[idx] = tmp_val # 値を元に戻す
        
    return grad

def function_2(x):
    return x[0]**2 + x[1]**2


if __name__ == '__main__':
    main()

↑こう書くと、↓こう表示されます

f:id:end0tknr:20171119094713p:plain

ニューラルネットワークに対する勾配降下

${ \Large 重み： W = \left( \begin{array}{ccc} w_{11} & w_{21} & w_{31} \\\ w_{12} & w_{22} & w_{32} \end{array} \right) }$

${ \Large 損失関数： \frac {\partial L}{\partial W} = \left( \begin{array}{ccc} \frac {\partial L}{\partial w_{11}} & \frac {\partial L}{\partial w_{21}} & \frac {\partial L}{\partial w_{31}} \\\ \frac {\partial L}{\partial w_{12}} & \frac {\partial L}{\partial w_{22}} & \frac {\partial L}{\partial w_{32}} \end{array} \right) }$

以下は、上記の重みと、損失関数を持つニューラルネットワークに対する勾配降下。

#!/usr/local/bin/python
# coding: utf-8
import numpy as np


def main():
    x = np.array([0.6, 0.9])
    t = np.array([0, 0, 1])

    net = simpleNet()
    
    f = lambda w: net.loss(x, t)
    dW = numerical_gradient(f, net.W)

    print(dW)


class simpleNet:
    def __init__(self):
        # 正規分布(ガウシアン)乱数な 2x3行列作成
        self.W = np.random.randn(2,3)

    # 予測
    def predict(self, x):
        return np.dot(x, self.W) # 内積

    # 損失関数
    def loss(self, x, t):
        z = self.predict(x)
        y = softmax(z)
        loss = cross_entropy_error(y, t)

        return loss

# ソフトマックス
def softmax(x):
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T 

    x = x - np.max(x) # オーバーフロー対策
    return np.exp(x) / np.sum(np.exp(x))


# 交差エントロピー誤差
def cross_entropy_error(y, t):
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
        
    # 教師データがone-hot-vectorの場合、正解ラベルのインデックスに変換
    if t.size == y.size:
        t = t.argmax(axis=1)
             
    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), t])) / batch_size

# 数値微分
def numerical_gradient(f, x):
    h = 1e-4 # 0.0001
    grad = np.zeros_like(x)  # 同じ形状の「0 配列」作成

    # numpy.nditer()により、次元数が増えても loopのnestが不要になります
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])

    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x) # f(x+h)
        
        x[idx] = tmp_val - h
        fxh2 = f(x) # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)
        
        x[idx] = tmp_val # 値を元に戻す
        it.iternext()
        
    return grad


if __name__ == '__main__':
    main()

↑こう書くと、↓こう表示されます

[[ 0.12770283  0.07875937 -0.2064622 ]
 [ 0.19155424  0.11813906 -0.3096933 ]]

end0tknr's kipple - web写経開発

太宰府天満宮の狛犬って、妙にカワイイ

勾配降下法による最小値算出 by python

ざっくり、勾配降下法とは

二次式に対する勾配降下

ニューラルネットワークに対する勾配降下