As a hand-copying exercise for pp. 147-154 of 「ゼロから作るDeep Learning ① (Pythonで学ぶディープラーニングの理論と実装)」, I redraw the parts marked in red below as computational graphs.

Affine layer

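The Affine layer computes X・W + B = O, that is, the matrix product of the input X and the weights W plus the bias B. Its computational graph is shown below.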
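To make the shapes concrete, here is a minimal NumPy sketch of the forward computation for a single input. The sizes (2 input features, 3 output units) and all values are assumptions chosen for illustration; they are not taken from the book.

```python
import numpy as np

# Illustrative values (assumed): one input with 2 features, 3 output units.
X = np.array([1.0, 2.0])            # shape (2,)
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])     # shape (2, 3)
B = np.array([0.1, 0.2, 0.3])       # shape (3,)

# X・W + B = O: the inner dimensions of X and W must match.
O = np.dot(X, W) + B                # shape (3,)
print(O.shape)
print(O)
```

The output has one value per output unit, and B must have that same length.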

The parts marked in red are the backward pass: error backpropagation by differentiation.
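As a sketch of what the backward pass computes, the standard gradients for X・W + B = O can be written as follows. The symbol L for the loss and the partial-derivative notation are my own assumptions, and X is treated here as a single row vector.

```latex
\frac{\partial L}{\partial X} = \frac{\partial L}{\partial O} \cdot W^{\mathsf{T}}, \qquad
\frac{\partial L}{\partial W} = X^{\mathsf{T}} \cdot \frac{\partial L}{\partial O}, \qquad
\frac{\partial L}{\partial B} = \frac{\partial L}{\partial O}
```

Each gradient has the same shape as the quantity it belongs to, which is a handy check on where the transpose goes.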

The version extended to handle mini-batches is shown below.
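The mini-batch case can be sketched in NumPy as follows. The batch size, layer sizes, values, and the all-ones upstream gradient dO are assumptions for illustration only. The point to notice is that the bias is broadcast to every sample in the forward pass, so its gradient is the sum over the batch axis in the backward pass.

```python
import numpy as np

# Illustrative mini-batch (assumed sizes): N=2 samples, 2 features, 3 outputs.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # shape (2, 2)
W = np.array([[1.0, 0.0, -1.0],
              [2.0, 1.0, 0.0]])     # shape (2, 3)
B = np.array([10.0, 20.0, 30.0])    # shape (3,)

# Forward: B is broadcast across the rows (one copy per sample).
O = np.dot(X, W) + B                # shape (2, 3)

# Backward: pretend the upstream gradient is all ones (assumption).
dO = np.ones_like(O)
dX = np.dot(dO, W.T)                # shape (2, 2), same as X
dW = np.dot(X.T, dO)                # shape (2, 3), same as W
dB = np.sum(dO, axis=0)             # shape (3,): summed over the batch axis

print(dX.shape, dW.shape, dB.shape)
```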

Below is a Python implementation of the Affine layer.

import numpy as np


class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.original_x_shape = None  # input shape, kept so backward can restore it
        self.dW = None  # gradient of the weights
        self.db = None  # gradient of the bias

    def forward(self, x):
        # Flatten every axis except the batch axis
        self.original_x_shape = x.shape
        x = x.reshape(x.shape[0], -1)
        self.x = x

        out = np.dot(self.x, self.W) + self.b
        return out

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)  # W.T is the NumPy transpose of W
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)
        # The leading asterisk unpacks the shape tuple into separate arguments
        dx = dx.reshape(*self.original_x_shape)
        return dx
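The reshape handling in forward and backward can be tried in isolation. The sketch below uses an assumed 4-D input (2 samples of 1 x 4 x 4) and an all-ones gradient; the variable names x2d and dx2d are hypothetical and exist only for this example.

```python
import numpy as np

# Assumed example: a batch of 2 "images", each 1 channel x 4 x 4.
x = np.arange(32, dtype=float).reshape(2, 1, 4, 4)
original_x_shape = x.shape            # (2, 1, 4, 4)

# forward: flatten everything except the batch axis.
x2d = x.reshape(x.shape[0], -1)       # shape (2, 16)

# backward: a gradient in the 2-D shape is restored to the input shape.
dx2d = np.ones_like(x2d)
dx = dx2d.reshape(*original_x_shape)  # same as dx2d.reshape(2, 1, 4, 4)

print(x2d.shape, dx.shape)
```

Because the layer remembers the original shape, the layer before it receives a gradient with the same shape as the output it produced.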

Softmax-with-Loss layer

Because of the limited screen width, the Softmax part and the Cross Entropy Error part are shown separately.

Below is the Python implementation.

import numpy as np

class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None  # loss
        self.y = None     # output of softmax
        self.t = None     # teacher data (one-hot vector)

    def forward(self, x, t):
        self.t = t
        self.y = self.softmax(x)
        # cross_entropy_error is a method of this class, so call it via self
        self.loss = self.cross_entropy_error(self.y, self.t)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size
        return dx

    def softmax(self, x):
        x = x - np.max(x, axis=-1, keepdims=True)  # guard against overflow
        return np.exp(x) / np.sum(np.exp(x), axis=-1, keepdims=True)

    def cross_entropy_error(self, y, t):
        if y.ndim == 1:
            t = t.reshape(1, t.size)
            y = y.reshape(1, y.size)

        # If the teacher data is a one-hot vector, convert it to the index of the correct label
        if t.size == y.size:
            t = t.argmax(axis=1)

        batch_size = y.shape[0]
        return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size
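The backward pass above returns (y - t) / batch_size. As a sanity check, the sketch below compares that expression with a central-difference numerical gradient. It is a standalone sketch, not the book's code: the helper loss_fn, the toy data, and the step size h are assumptions, and the cross-entropy is written as a sum over one-hot t, which is equivalent to the index-based form only when t is one-hot.

```python
import numpy as np

def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)  # guard against overflow
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def loss_fn(x, t):
    # Mean cross-entropy over the batch; t is assumed to be one-hot.
    y = softmax(x)
    return -np.sum(t * np.log(y + 1e-7)) / x.shape[0]

# Assumed toy data: 2 samples, 3 classes.
x = np.array([[0.3, 2.9, 4.0],
              [0.1, 0.2, 0.7]])
t = np.array([[0, 0, 1],
              [1, 0, 0]], dtype=float)

# Analytic gradient, as in SoftmaxWithLoss.backward.
analytic = (softmax(x) - t) / x.shape[0]

# Central-difference numerical gradient, one element at a time.
h = 1e-4
numeric = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    orig = x[idx]
    x[idx] = orig + h
    f_plus = loss_fn(x, t)
    x[idx] = orig - h
    f_minus = loss_fn(x, t)
    x[idx] = orig                     # restore the original value
    numeric[idx] = (f_plus - f_minus) / (2 * h)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

The two gradients should agree closely; a small gap remains because of the 1e-7 term inside the log and the finite step size.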