Backpropagation - multilayer perceptron implementation: weights go crazy

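I am writing a simple implementation of an MLP with a single output unit (binary classification). I need it for teaching purposes, so I can't use an existing implementation :(

I managed to create a working dummy model and implemented the training function, but the MLP does not converge. In fact, the gradient for the output unit stays high over the epochs, so its weights approach infinity.

My implementation: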

import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

X = np.loadtxt('synthetic.txt')
t = X[:, 2].astype(np.int)
X = X[:, 0:2]

# Sigmoid activation function for output unit
def logistic(x):
    return 1/(1 + np.exp(-x))

# derivative of the tanh activation function for hidden units
def tanh_deriv(x):
    return 1 - np.tanh(x)*np.tanh(x)

input_num = 2            # number of units in the input layer
hidden_num = 2           # number of units in the hidden layer

# initialize weights with random values:
weights_hidden =  np.array((2 * np.random.random( (input_num + 1, hidden_num + 1) ) - 1 ) * 0.25)
weights_out =  np.array((2 * np.random.random(  hidden_num + 1 ) - 1 ) * 0.25)


def predict(x):
    global input_num
    global hidden_num
    global weights_hidden 
    global weights_out 

    x = np.append(x.astype(float), 1.0)     # input to the hidden layer: features + bias term
    a = x.dot(weights_hidden)            # activations of the hidden layer
    z = np.tanh(a)                          # output of the hidden layer
    q = logistic(z.dot(weights_out))     # input to the output (decision) layer
    if q >= 0.5:
        return 1
    return 0



def train(X, t, learning_rate=0.2, epochs=50):
    global input_num
    global hidden_num
    global weights_hidden 
    global weights_out 

    weights_hidden =  np.array((2 * np.random.random( (input_num + 1, hidden_num + 1) ) - 1 ) * 0.25)
    weights_out =  np.array((2 * np.random.random(  hidden_num + 1 ) - 1 ) * 0.25)

    for epoch in range(epochs):
        gradient_out = 0.0                       # gradients for output and hidden layers
        gradient_hidden = []

        for i in range(X.shape[0]):            
        # forward propagation
            x = np.array(X[i])                      
            x = np.append(x.astype(float), 1.0)  # input to the hidden layer: features + bias term
            a = x.dot(weights_hidden)            # activations of the hidden layer
            z = np.tanh(a)                       # output of the hidden layer
            q = z.dot(weights_out)               # activations to the output (decision) layer
            y = logistic(q)                      # output of the decision layer

        # backpropagation
            delta_hidden_s = []                  # delta and gradient for a single training sample (hidden layer)
            gradient_hidden_s = []

            delta_out_s = t[i] - y               # delta and gradient for a single training sample (output layer)
            gradient_out_s = delta_out_s * z

            for j in range(hidden_num + 1):                 
                delta_hidden_s.append(tanh_deriv(a[j]) * (weights_out[j] * delta_out_s))
                gradient_hidden_s.append(delta_hidden_s[j] * x)

            gradient_out = gradient_out + gradient_out_s             # accumulate gradients over training set
            gradient_hidden = gradient_hidden + gradient_hidden_s

        print "\n#", epoch, "Gradient out: ",gradient_out, 
        print "\n     Weights  out: ", weights_out

        # Now updating weights
        weights_out = weights_out - learning_rate * gradient_out

        for j in range(hidden_num + 1):
            weights_hidden.T[j] = weights_hidden.T[j] - learning_rate * gradient_hidden[j]



train(X, t, 0.2, 50)

And here is how the gradient and the weights of the output unit evolve over the epochs:

0 Gradient out:  [ 11.07640724  -7.20309009   0.24776626] 
    Weights  out:  [-0.15397237  0.22232593  0.03162811]

  1 Gradient out:  [ 23.68791197 -19.6688382   -1.75324703] 
    Weights  out:  [-2.36925382  1.66294395 -0.01792515]

  2 Gradient out:  [ 79.08612305 -65.76066015  -7.70115262] 
    Weights  out:  [-7.10683621  5.59671159  0.33272426]

  3 Gradient out:  [ 99.59798656 -93.90973727 -21.45674943] 
    Weights  out:  [-22.92406082  18.74884362   1.87295478]

49 Gradient out:  [ 107.89975864 -105.8654327  -104.69591522] 
     Weights  out:  [-1003.67912726   976.87213404   922.38862049]

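I tried different datasets and various numbers of hidden units. I tried updating the weights with addition instead of subtraction... Nothing helps...

Can anyone tell me what might be wrong?
Thanks in advance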

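I don't believe you should use the sum-of-squares error function for binary classification. Instead, you should use the cross-entropy error function, which is basically a likelihood function. This way, the further the prediction is from the correct answer, the more expensive the error becomes. Please read the "Network Training" section on page 235 of "Pattern Recognition and Machine Learning" by Christopher Bishop; it will give you a proper overview of how to do supervised learning on an FFNN.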
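To make the difference concrete, here is a minimal sketch (not the asker's code; the function names `cross_entropy`, `delta_cross_entropy` and `delta_squared_error` are made up for illustration). It compares the error signal at the pre-activation `q` of a single logistic output unit under the two error functions. Assuming the squared error is defined as 0.5*(y - t)^2, its delta carries an extra factor y*(1 - y) that shrinks toward zero when the unit saturates, while the cross-entropy delta is simply y - t.

```python
import numpy as np

def logistic(q):
    # Sigmoid activation of the output unit
    return 1.0 / (1.0 + np.exp(-q))

def cross_entropy(t, y):
    # Per-sample cross-entropy error: E = -[t*ln(y) + (1 - t)*ln(1 - y)]
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))

def delta_cross_entropy(t, q):
    # dE/dq for cross-entropy error with a logistic output: y - t
    return logistic(q) - t

def delta_squared_error(t, q):
    # dE/dq for E = 0.5*(y - t)**2 with a logistic output: (y - t)*y*(1 - y)
    y = logistic(q)
    return (y - t) * y * (1 - y)

t = 1  # target class
for q in (-6.0, 0.0, 6.0):
    y = logistic(q)
    print(q, cross_entropy(t, y), delta_cross_entropy(t, q), delta_squared_error(t, q))
```

For a confidently wrong prediction (t = 1, q = -6) the cross-entropy delta stays close to -1, whereas the squared-error delta is nearly zero, so the wrong answer is barely corrected. Note the sign convention in this sketch: with the delta defined as y - t, a gradient descent step subtracts learning_rate * delta * z from the output weights.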

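Bias units are very important because they allow the transfer function to shift along the x axis. The weights, in turn, change the steepness of the transfer function curve. Keep this difference between biases and weights in mind, because it explains why both need to be present in an FFNN.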
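The shift-versus-steepness distinction can be checked numerically. The sketch below assumes a single logistic unit with one input, weight `w` and bias `b`; the helper names `unit`, `midpoint` and `slope_at_midpoint` are hypothetical and used only for illustration. The output crosses 0.5 where w*x + b = 0, that is at x = -b/w, and the slope of the curve at that point is w/4.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def unit(x, w, b):
    # Output of a single logistic unit with one input, weight w and bias b
    return logistic(w * x + b)

def midpoint(w, b):
    # Input at which the output crosses 0.5, i.e. where w*x + b == 0
    return -b / w

def slope_at_midpoint(w):
    # d/dx logistic(w*x + b) = w*y*(1 - y), and y = 0.5 at the midpoint
    return w / 4.0

# Same weight with a different bias moves the curve along x;
# a different weight changes how steep the curve is.
for w, b in [(1.0, 0.0), (1.0, -2.0), (4.0, 0.0), (4.0, -8.0)]:
    print(w, b, midpoint(w, b), slope_at_midpoint(w))
```

With w fixed at 1, changing b from 0 to -2 moves the midpoint from x = 0 to x = 2 without changing the slope there. Raising w to 4 makes the curve four times steeper, and the bias has to scale with it (b = -8) to keep the midpoint at x = 2.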
