# Backpropagation Revisited (Derivation)

1. Starting from forward propagation

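$\begin{array}{rl}L& =\text{total number of layers in the neural network}\\ {S}_{l}& =\text{number of neurons in layer }l\\ K& =\text{number of neurons in the output layer, i.e. the number of classes}\\ {w}_{ij}^{l}& =\text{weight between neuron }j\text{ of layer }l\text{ and neuron }i\text{ of layer }l+1\end{array}$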

$\begin{array}{rl}{z}_{1}^{2}& ={a}_{1}^{1}{w}_{11}^{1}+{a}_{2}^{1}{w}_{12}^{1}+{a}_{3}^{1}{w}_{13}^{1}+{b}^{1}\\ {z}_{2}^{2}& ={a}_{1}^{1}{w}_{21}^{1}+{a}_{2}^{1}{w}_{22}^{1}+{a}_{3}^{1}{w}_{23}^{1}+{b}^{1}\\ & \;\Longrightarrow\;\left[\begin{array}{c}{z}_{1}^{2}\\ {z}_{2}^{2}\end{array}\right]={\left[\begin{array}{ccc}{w}_{11}^{1}& {w}_{12}^{1}& {w}_{13}^{1}\\ {w}_{21}^{1}& {w}_{22}^{1}& {w}_{23}^{1}\end{array}\right]}_{2\times 3}\times {\left[\begin{array}{c}{a}_{1}^{1}\\ {a}_{2}^{1}\\ {a}_{3}^{1}\end{array}\right]}_{3\times 1}+\left[\begin{array}{c}{b}^{1}\\ {b}^{1}\end{array}\right]\\ & \;\Longrightarrow\;{z}^{2}={w}^{1}{a}^{1}+{b}^{1}\;\Longrightarrow\;{a}^{2}=f\left({z}^{2}\right)\\ & \;\Longrightarrow\;{z}^{3}={w}^{2}{a}^{2}+{b}^{2}\;\Longrightarrow\;{a}^{3}=f\left({z}^{3}\right)\end{array}$

$\begin{array}{ll}{z}_{i}^{l+1}={a}_{1}^{l}{w}_{i1}^{l}+{a}_{2}^{l}{w}_{i2}^{l}+\cdots +{a}_{{S}_{l}}^{l}{w}_{i{S}_{l}}^{l}+{b}^{l}& \qquad \text{(1)}\\ {z}^{l+1}={w}^{l}{a}^{l}+{b}^{l}& \qquad \text{(2)}\\ {a}^{l}=f\left({z}^{l}\right)& \qquad \text{(3)}\end{array}$
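The forward pass in (1) to (3) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sigmoid activation, the helper names (`sigmoid`, `matvec`, `forward`), and all numeric values are hypothetical and not given in the text. It follows the matrix form above, multiplying the $S_{l+1}\times S_l$ weight matrix by the activation column vector and adding one shared scalar bias per layer.

```python
import math

def sigmoid(z):
    # Assumed activation f; the text leaves f unspecified.
    return 1.0 / (1.0 + math.exp(-z))

def matvec(w, a):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) for row in w]

def forward(weights, biases, x, f=sigmoid):
    """Return the pre-activations z^2..z^L and the activations a^1..a^L.

    weights[k] has shape S_{l+1} x S_l; biases[k] is a single scalar per
    layer, mirroring the shared b^l written in the expansion of z^2.
    """
    activations = [list(x)]   # a^1 is the input itself
    pre_activations = []      # z^2, z^3, ...
    for w, b in zip(weights, biases):
        z = [z_i + b for z_i in matvec(w, activations[-1])]  # equation (2)
        pre_activations.append(z)
        activations.append([f(z_i) for z_i in z])            # equation (3)
    return pre_activations, activations

# Hypothetical 3-2-2 network, the same shape as the worked example.
w1 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
w2 = [[0.7, 0.8], [0.9, 1.0]]
zs, acts = forward([w1, w2], [0.5, -0.5], [1.0, 2.0, 3.0])
print(zs[0])     # z^2
print(acts[-1])  # a^3, the network output
```

Keeping every $z^l$ and $a^l$ from the forward pass is a deliberate choice: the gradient formulas derived later reuse exactly these values.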

2. Computing the gradient

$J=\frac{1}{2}\left({h}_{w,b}\left(x\right)-y{\right)}^{2}$

$\begin{array}{rl}f& =sin\left(t\right),t={x}^{2},x=5w\\ \phantom{\rule{thickmathspace}{0ex}}⟹\phantom{\rule{thickmathspace}{0ex}}\frac{\mathrm{\partial }f}{\mathrm{\partial }w}& =\frac{\mathrm{\partial }f}{\mathrm{\partial }t}\cdot \frac{\mathrm{\partial }t}{\mathrm{\partial }x}\cdot \frac{\mathrm{\partial }x}{\mathrm{\partial }w}=cos\left(t\right)\cdot 2x\cdot 5\\ & =cos\left({x}^{2}\right)\cdot 2x\cdot 5=cos\left(25{w}^{2}\right)\cdot 10w\cdot 5=50wcos\left(25{w}^{2}\right)\end{array}$

$\begin{array}{rl}f& =sin\left({x}^{2}\right)=sin\left(25{w}^{2}\right)\\ \phantom{\rule{thickmathspace}{0ex}}⟹\phantom{\rule{thickmathspace}{0ex}}\frac{\mathrm{\partial }f}{\mathrm{\partial }w}& =cos\left(25{w}^{2}\right)\cdot 50w=50wcos\left(25{w}^{2}\right)\end{array}$
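The two routes above (chain rule through $t$ and $x$, and direct differentiation of $\sin(25w^2)$) can be checked numerically. The sketch below is illustrative only; the function names and the test point $w=0.3$ are arbitrary choices, and the central-difference estimate is an added sanity check rather than part of the derivation.

```python
import math

def f(w):
    x = 5.0 * w
    t = x * x
    return math.sin(t)

def df_chain(w):
    """Chain rule: df/dt * dt/dx * dx/dw."""
    x = 5.0 * w
    t = x * x
    return math.cos(t) * (2.0 * x) * 5.0

def df_direct(w):
    """Derivative of sin(25 w^2) taken in one step."""
    return 50.0 * w * math.cos(25.0 * w * w)

def df_numeric(w, eps=1e-6):
    """Central-difference approximation, used only as a sanity check."""
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)

w = 0.3  # arbitrary test point
print(df_chain(w), df_direct(w), df_numeric(w))
```

All three values should agree up to floating-point and finite-difference error.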

$\begin{array}{rl}f& =g\left(t\right),t=\varphi \left(x+y\right),x=h\left(w\right),y=\mu \left(w\right)\end{array}$

$\begin{array}{rl}\phantom{\rule{thickmathspace}{0ex}}⟹\phantom{\rule{thickmathspace}{0ex}}\frac{\mathrm{\partial }f}{\mathrm{\partial }w}& =\frac{\mathrm{\partial }f}{\mathrm{\partial }t}\cdot \frac{\mathrm{\partial }t}{\mathrm{\partial }y}\cdot \frac{\mathrm{\partial }y}{\mathrm{\partial }w}+\frac{\mathrm{\partial }f}{\mathrm{\partial }t}\cdot \frac{\mathrm{\partial }t}{\mathrm{\partial }x}\cdot \frac{\mathrm{\partial }x}{\mathrm{\partial }w}=\frac{\mathrm{\partial }f}{\mathrm{\partial }t}\cdot \left(\frac{\mathrm{\partial }t}{\mathrm{\partial }y}\cdot \frac{\mathrm{\partial }y}{\mathrm{\partial }w}+\frac{\mathrm{\partial }t}{\mathrm{\partial }x}\cdot \frac{\mathrm{\partial }x}{\mathrm{\partial }w}\right)\end{array}$
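A concrete instance may make the two-path rule easier to see. The functions below are an illustrative choice, not taken from the original: let $g(t)=t^2$, $\varphi$ be the identity so that $t=x+y$, $h(w)=3w$ and $\mu(w)=w^2$.

$\begin{array}{rl}\frac{\partial f}{\partial w}& =\frac{\partial f}{\partial t}\cdot \left(\frac{\partial t}{\partial y}\cdot \frac{\partial y}{\partial w}+\frac{\partial t}{\partial x}\cdot \frac{\partial x}{\partial w}\right)=2t\cdot \left(1\cdot 2w+1\cdot 3\right)\\ & =2\left(3w+{w}^{2}\right)\left(2w+3\right)\end{array}$

Differentiating $f={\left(3w+{w}^{2}\right)}^{2}$ directly gives the same result, $2\left(3w+{w}^{2}\right)\left(3+2w\right)$. Because $w$ reaches $f$ through both $x$ and $y$, the contributions of the two paths are added.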

$\begin{array}{rl}\frac{\partial J}{\partial {w}_{11}^{1}}& =\frac{\partial J}{\partial {a}_{1}^{3}}\cdot \frac{\partial {a}_{1}^{3}}{\partial {z}_{1}^{3}}\cdot \frac{\partial {z}_{1}^{3}}{\partial {a}_{1}^{2}}\cdot \frac{\partial {a}_{1}^{2}}{\partial {z}_{1}^{2}}\cdot \frac{\partial {z}_{1}^{2}}{\partial {w}_{11}^{1}}+\frac{\partial J}{\partial {a}_{2}^{3}}\cdot \frac{\partial {a}_{2}^{3}}{\partial {z}_{2}^{3}}\cdot \frac{\partial {z}_{2}^{3}}{\partial {a}_{1}^{2}}\cdot \frac{\partial {a}_{1}^{2}}{\partial {z}_{1}^{2}}\cdot \frac{\partial {z}_{1}^{2}}{\partial {w}_{11}^{1}}\\ \frac{\partial J}{\partial {w}_{12}^{1}}& =\frac{\partial J}{\partial {a}_{1}^{3}}\cdot \frac{\partial {a}_{1}^{3}}{\partial {z}_{1}^{3}}\cdot \frac{\partial {z}_{1}^{3}}{\partial {a}_{1}^{2}}\cdot \frac{\partial {a}_{1}^{2}}{\partial {z}_{1}^{2}}\cdot \frac{\partial {z}_{1}^{2}}{\partial {w}_{12}^{1}}+\frac{\partial J}{\partial {a}_{2}^{3}}\cdot \frac{\partial {a}_{2}^{3}}{\partial {z}_{2}^{3}}\cdot \frac{\partial {z}_{2}^{3}}{\partial {a}_{1}^{2}}\cdot \frac{\partial {a}_{1}^{2}}{\partial {z}_{1}^{2}}\cdot \frac{\partial {z}_{1}^{2}}{\partial {w}_{12}^{1}}\\ & \vdots \\ \frac{\partial J}{\partial {w}_{22}^{2}}& =\frac{\partial J}{\partial {a}_{2}^{3}}\cdot \frac{\partial {a}_{2}^{3}}{\partial {z}_{2}^{3}}\cdot \frac{\partial {z}_{2}^{3}}{\partial {w}_{22}^{2}}\end{array}$

3. An efficient way to compute the gradient

$\begin{array}{rl}\frac{\partial J}{\partial {w}_{11}^{1}}& =\left(\frac{\partial J}{\partial {a}_{1}^{3}}\cdot \frac{\partial {a}_{1}^{3}}{\partial {z}_{1}^{3}}\cdot \frac{\partial {z}_{1}^{3}}{\partial {a}_{1}^{2}}\cdot \frac{\partial {a}_{1}^{2}}{\partial {z}_{1}^{2}}\right)\cdot \frac{\partial {z}_{1}^{2}}{\partial {w}_{11}^{1}}+\left(\frac{\partial J}{\partial {a}_{2}^{3}}\cdot \frac{\partial {a}_{2}^{3}}{\partial {z}_{2}^{3}}\cdot \frac{\partial {z}_{2}^{3}}{\partial {a}_{1}^{2}}\cdot \frac{\partial {a}_{1}^{2}}{\partial {z}_{1}^{2}}\right)\cdot \frac{\partial {z}_{1}^{2}}{\partial {w}_{11}^{1}}\end{array}$

$\frac{\partial J}{\partial {w}_{ij}^{l}}=\frac{\partial J}{\partial {z}_{i}^{l+1}}\cdot \frac{\partial {z}_{i}^{l+1}}{\partial {w}_{ij}^{l}}=\frac{\partial J}{\partial {z}_{i}^{l+1}}\cdot {a}_{j}^{l}\qquad \text{(4)}$

$\frac{\mathrm{\partial }J}{\mathrm{\partial }{w}_{11}^{1}}=\frac{\mathrm{\partial }J}{\mathrm{\partial }{z}_{1}^{1+1}}\cdot \frac{\mathrm{\partial }{z}_{1}^{1+1}}{\mathrm{\partial }{w}_{11}^{1}}=\frac{\mathrm{\partial }J}{\mathrm{\partial }{z}_{1}^{2}}\cdot \frac{\mathrm{\partial }{z}_{1}^{2}}{\mathrm{\partial }{w}_{11}^{1}}$

$\frac{\mathrm{\partial }J}{\mathrm{\partial }{z}_{i}^{l+1}}=?\phantom{\rule{thickmathspace}{0ex}}?\phantom{\rule{thickmathspace}{0ex}}?$

$\begin{array}{rl}\frac{\partial J}{\partial {z}_{i}^{l}}& =\frac{\partial J}{\partial {z}_{1}^{l+1}}\cdot \frac{\partial {z}_{1}^{l+1}}{\partial {z}_{i}^{l}}+\frac{\partial J}{\partial {z}_{2}^{l+1}}\cdot \frac{\partial {z}_{2}^{l+1}}{\partial {z}_{i}^{l}}+\cdots +\frac{\partial J}{\partial {z}_{{S}_{l+1}}^{l+1}}\cdot \frac{\partial {z}_{{S}_{l+1}}^{l+1}}{\partial {z}_{i}^{l}}\\ & =\sum _{k=1}^{{S}_{l+1}}\frac{\partial J}{\partial {z}_{k}^{l+1}}\cdot \frac{\partial {z}_{k}^{l+1}}{\partial {z}_{i}^{l}}\\ & =\sum _{k=1}^{{S}_{l+1}}\frac{\partial J}{\partial {z}_{k}^{l+1}}\cdot \frac{\partial }{\partial {z}_{i}^{l}}\left({a}_{1}^{l}{w}_{k1}^{l}+{a}_{2}^{l}{w}_{k2}^{l}+\cdots +{a}_{{S}_{l}}^{l}{w}_{k{S}_{l}}^{l}+{b}^{l}\right)\qquad \text{by (1)}\\ & =\sum _{k=1}^{{S}_{l+1}}\frac{\partial J}{\partial {z}_{k}^{l+1}}\cdot \frac{\partial }{\partial {z}_{i}^{l}}\sum _{j=1}^{{S}_{l}}{a}_{j}^{l}{w}_{kj}^{l}\qquad \text{since }{b}^{l}\text{ does not depend on }{z}_{i}^{l}\\ & =\sum _{k=1}^{{S}_{l+1}}\frac{\partial J}{\partial {z}_{k}^{l+1}}\cdot \frac{\partial }{\partial {z}_{i}^{l}}\sum _{j=1}^{{S}_{l}}f\left({z}_{j}^{l}\right){w}_{kj}^{l}\qquad \text{by (3)}\\ & =\sum _{k=1}^{{S}_{l+1}}\frac{\partial J}{\partial {z}_{k}^{l+1}}\cdot {f}^{\prime }\left({z}_{i}^{l}\right){w}_{ki}^{l}\qquad \text{(5)}\end{array}$

$\frac{\partial J}{\partial {z}_{i}^{l}}=\sum _{k=1}^{{S}_{l+1}}\frac{\partial J}{\partial {z}_{k}^{l+1}}\cdot {f}^{\prime }\left({z}_{i}^{l}\right){w}_{ki}^{l}\qquad \text{(6)}$

$\frac{\mathrm{\partial }J}{\mathrm{\partial }{z}_{i}^{l+1}}=\underset{k=1}{\overset{{S}_{l+2}}{\sum }}\frac{\mathrm{\partial }J}{\mathrm{\partial }{z}_{k}^{l+2}}\cdot {f}^{\prime }\left({z}_{i}^{l+1}\right){w}_{ki}^{l+1}$

${\delta }_{i}^{l}=\frac{\partial J}{\partial {z}_{i}^{l}}=\sum _{k=1}^{{S}_{l+1}}{\delta }_{k}^{l+1}\cdot {f}^{\prime }\left({z}_{i}^{l}\right){w}_{ki}^{l}\quad \left(l\le L-1\right)\qquad \text{(7)}$
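As a worked instance of (7), consider the 3-2-2 network from the earlier expansion and take $l=2$, $i=1$, so that the sum runs over the two neurons of layer 3:

$\begin{array}{rl}{\delta }_{1}^{2}& =\sum _{k=1}^{2}{\delta }_{k}^{3}\cdot {f}^{\prime }\left({z}_{1}^{2}\right){w}_{k1}^{2}=\left({\delta }_{1}^{3}{w}_{11}^{2}+{\delta }_{2}^{3}{w}_{21}^{2}\right)\cdot {f}^{\prime }\left({z}_{1}^{2}\right)\end{array}$

The two terms correspond to the two bracketed paths through ${z}_{1}^{3}$ and ${z}_{2}^{3}$ in the expansion of $\frac{\partial J}{\partial {w}_{11}^{1}}$. Combined with (4), this gives $\frac{\partial J}{\partial {w}_{11}^{1}}={\delta }_{1}^{2}\cdot {a}_{1}^{1}$.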

$\begin{array}{rl}{\delta }_{i}^{L}& =\frac{\partial J}{\partial {z}_{i}^{L}}=\frac{\partial }{\partial {z}_{i}^{L}}\left[\frac{1}{2}\sum _{k=1}^{{S}_{L}}{\left({h}_{k}\left(x\right)-{y}_{k}\right)}^{2}\right]\\ & =\frac{\partial }{\partial {z}_{i}^{L}}\left[\frac{1}{2}\sum _{k=1}^{{S}_{L}}{\left(f\left({z}_{k}^{L}\right)-{y}_{k}\right)}^{2}\right]\\ & =\left[f\left({z}_{i}^{L}\right)-{y}_{i}\right]\cdot {f}^{\prime }\left({z}_{i}^{L}\right)\\ & =\left[{a}_{i}^{L}-{y}_{i}\right]\cdot {f}^{\prime }\left({z}_{i}^{L}\right)\qquad \text{(8)}\end{array}$

$\frac{\partial J}{\partial {w}_{ij}^{l}}={\delta }_{i}^{l+1}\cdot {a}_{j}^{l}\qquad \text{(9)}$

$\begin{array}{rl}& \frac{\partial J}{\partial {w}_{ij}^{l}}={\delta }_{i}^{l+1}\cdot {a}_{j}^{l}\\ & {\delta }_{i}^{l}=\frac{\partial J}{\partial {z}_{i}^{l}}=\sum _{k=1}^{{S}_{l+1}}{\delta }_{k}^{l+1}\cdot {f}^{\prime }\left({z}_{i}^{l}\right){w}_{ki}^{l}\quad \left(0<l\le L-1\right)\\ & {\delta }_{i}^{L}=\left[{a}_{i}^{L}-{y}_{i}\right]\cdot {f}^{\prime }\left({z}_{i}^{L}\right)\end{array}$

$\begin{array}{ll}\frac{\partial J}{\partial {w}^{l}}={\delta }^{l+1}\cdot {\left({a}^{l}\right)}^{T}& \qquad \text{(10)}\\ {\delta }^{l}={\left({w}^{l}\right)}^{T}\cdot {\delta }^{l+1}\ast {f}^{\prime }\left({z}^{l}\right)& \qquad \text{(11)}\\ {\delta }^{L}=\left[{a}^{L}-y\right]\ast {f}^{\prime }\left({z}^{L}\right)& \qquad \text{(12)}\end{array}$

$\begin{array}{rl}& Step1:{\delta }^{3}=\left[{a}^{3}-y\right]\ast {f}^{\mathrm{\prime }}\left({z}^{3}\right)\\ & Step2:\frac{\mathrm{\partial }J}{\mathrm{\partial }{w}^{2}}={\delta }^{3}\cdot \left({a}^{2}{\right)}^{T}\\ & Step3:{\delta }^{2}=\left({w}^{2}{\right)}^{T}\cdot {\delta }^{3}\ast {f}^{\mathrm{\prime }}\left({z}^{2}\right)\\ & Step4:\frac{\mathrm{\partial }J}{\mathrm{\partial }{w}^{1}}={\delta }^{2}\cdot \left({a}^{1}{\right)}^{T}\end{array}$

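1. The parameters whose derivatives are obtained first always lie in layer $L-1$ (here, ${w}^{2}$);
2. To compute the derivatives of the parameters in layer $l$, the intermediate variable ${\delta }^{l+1}$ from layer $l+1$ is always needed (here, the derivative with respect to ${w}^{1}$ uses ${\delta }^{2}$);
3. The whole process runs from back to front.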
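Steps 1 to 4 can be sketched for the 3-2-2 network in plain Python and compared against a finite-difference estimate. This is a minimal illustration under stated assumptions, not a reference implementation: the sigmoid activation, the helper names, the shared scalar biases `b1` and `b2`, and all numeric values are hypothetical.

```python
import math

def sigmoid(z):
    # Assumed activation f; the derivation holds for any differentiable f.
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def matvec(w, a):
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) for row in w]

def outer(delta, a):
    """delta * a^T: entry (i, j) is delta_i * a_j."""
    return [[d * a_j for a_j in a] for d in delta]

def forward(w1, w2, b1, b2, x):
    z2 = [v + b1 for v in matvec(w1, x)]
    a2 = [sigmoid(v) for v in z2]
    z3 = [v + b2 for v in matvec(w2, a2)]
    a3 = [sigmoid(v) for v in z3]
    return z2, a2, z3, a3

def loss(w1, w2, b1, b2, x, y):
    a3 = forward(w1, w2, b1, b2, x)[3]
    return 0.5 * sum((a - t) ** 2 for a, t in zip(a3, y))

def backward(w1, w2, b1, b2, x, y):
    z2, a2, z3, a3 = forward(w1, w2, b1, b2, x)
    # Step 1: delta^3 = (a^3 - y) * f'(z^3)
    delta3 = [(a - t) * sigmoid_prime(z) for a, t, z in zip(a3, y, z3)]
    # Step 2: dJ/dw^2 = delta^3 (a^2)^T
    grad_w2 = outer(delta3, a2)
    # Step 3: delta^2 = (w^2)^T delta^3 * f'(z^2), elementwise product
    w2_t = [list(col) for col in zip(*w2)]
    delta2 = [v * sigmoid_prime(z) for v, z in zip(matvec(w2_t, delta3), z2)]
    # Step 4: dJ/dw^1 = delta^2 (a^1)^T, where a^1 is the input x
    grad_w1 = outer(delta2, x)
    return grad_w1, grad_w2

def numeric_grad_w1(w1, w2, b1, b2, x, y, i, j, eps=1e-6):
    """Central-difference estimate of dJ/dw^1_{ij} (0-based i, j)."""
    plus = [row[:] for row in w1]
    minus = [row[:] for row in w1]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (loss(plus, w2, b1, b2, x, y)
            - loss(minus, w2, b1, b2, x, y)) / (2.0 * eps)

# Hypothetical parameters and a single training example.
w1 = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
w2 = [[0.7, -0.8], [0.9, 1.0]]
b1, b2 = 0.1, -0.1
x, y = [1.0, 0.5, -1.0], [1.0, 0.0]
grad_w1, grad_w2 = backward(w1, w2, b1, b2, x, y)
print(grad_w1)
print(grad_w2)
```

Note that `delta3` is computed once and reused for both `grad_w2` and `delta2`. That reuse is the saving over expanding every path of the chain rule separately for each weight.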

4. Summary

$\begin{array}{ll}\frac{\partial J}{\partial {w}^{l}}={\delta }^{l+1}\cdot {\left({a}^{l}\right)}^{T}& \qquad \text{(13)}\\ {\delta }^{l}={\left({w}^{l}\right)}^{T}\cdot {\delta }^{l+1}\ast {f}^{\prime }\left({z}^{l}\right)& \qquad \text{(14)}\\ {\delta }_{i}^{L}=\frac{\partial J}{\partial {z}_{i}^{L}}=\frac{\partial J}{\partial {a}_{i}^{L}}\cdot \frac{\partial {a}_{i}^{L}}{\partial {z}_{i}^{L}}=\frac{\partial J}{\partial {a}_{i}^{L}}\cdot \frac{\partial f\left({z}_{i}^{L}\right)}{\partial {z}_{i}^{L}}=\frac{\partial J}{\partial {a}_{i}^{L}}\cdot {f}^{\prime }\left({z}_{i}^{L}\right)& \qquad \text{(15)}\\ \frac{\partial J}{\partial {b}^{l}}={\delta }^{l+1}& \qquad \text{(16)}\end{array}$
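Equation (16) reads most naturally with ${b}^{l}$ as a vector holding one bias per neuron of layer $l+1$. If instead a single scalar bias is shared by the whole layer, as in the expansion of ${z}^{2}$ in section 1, the chain rule sums over the neurons and gives $\frac{\partial J}{\partial {b}^{l}}=\sum _{i}{\delta }_{i}^{l+1}$. The sketch below checks both readings numerically on a single sigmoid output layer. The activation, the helper names, and the values are assumptions made for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pre_activation(w, b, a):
    """z_i = sum_j w_ij a_j + b_i, with one bias b_i per neuron."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(w, b)]

def loss(w, b, a, y):
    z = pre_activation(w, b, a)
    return 0.5 * sum((sigmoid(z_i) - y_i) ** 2 for z_i, y_i in zip(z, y))

def output_delta(w, b, a, y):
    """delta_i = (f(z_i) - y_i) * f'(z_i) for a sigmoid output layer."""
    z = pre_activation(w, b, a)
    return [(sigmoid(z_i) - y_i) * sigmoid(z_i) * (1.0 - sigmoid(z_i))
            for z_i, y_i in zip(z, y)]

def numeric_bias_grad(w, b, a, y, shift, eps=1e-6):
    """Central difference of J along the bias direction `shift`."""
    plus = [b_i + eps * s for b_i, s in zip(b, shift)]
    minus = [b_i - eps * s for b_i, s in zip(b, shift)]
    return (loss(w, plus, a, y) - loss(w, minus, a, y)) / (2.0 * eps)

# Hypothetical values for a layer with two inputs and two outputs.
w = [[0.7, -0.8], [0.9, 1.0]]
b = [0.1, -0.3]
a = [0.6, 0.4]
y = [1.0, 0.0]

delta = output_delta(w, b, a, y)
# One bias per neuron: move each component separately.
per_neuron = [numeric_bias_grad(w, b, a, y, [1, 0]),
              numeric_bias_grad(w, b, a, y, [0, 1])]
# Shared scalar bias: move all components together.
shared = numeric_bias_grad(w, b, a, y, [1, 1])
print(delta, per_neuron, shared)
```

The per-neuron estimates should match `delta` component by component, and the shared-bias estimate should match `sum(delta)`.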