Consider the CAPTCHA problem:
To see why speech is so hard, just look at it:
This is a visual representation of that sound:
Now, if you know a bit about sound, you know that much of this information is really just overlapping frequencies and volumes, which can be extracted and turned into a form that is easier to work with.
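As a minimal sketch of that extraction step (assuming numpy is available), a short-time Fourier transform turns the raw waveform into per-window frequency magnitudes, i.e. a rough spectrogram; the window size, hop, and the 440/880 Hz toy tones below are illustrative choices, not anything from the original text.

```python
import numpy as np

def spectrogram(signal, sample_rate, window_size=1024, hop=512):
    """Split a 1-D audio signal into overlapping windows and take the
    magnitude of the FFT of each window (a simple short-time Fourier transform)."""
    window = np.hanning(window_size)
    frames = []
    for start in range(0, len(signal) - window_size, hop):
        frame = signal[start:start + window_size] * window
        # rfft gives the amplitudes of the non-negative frequency bins
        frames.append(np.abs(np.fft.rfft(frame)))
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    return freqs, np.array(frames)

# toy usage: one second of a 440 Hz tone mixed with a quieter 880 Hz tone
rate = 16000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
freqs, spec = spectrogram(tone, rate)
print(spec.shape)  # (number of windows, number of frequency bins)
```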
Unfortunately, humans can understand sounds with different pitches, speeds, and accents, so hand-written rules over those frequencies do not get very far.
Not only that, but we humans are able to understand many different languages by learning them, so we already know that a learnable model of reasonable complexity can handle the problem.
\(y = \sigma(\bar{a} \cdot \bar{x} + b)\)
\(\text{hidden} = \sigma(W_1 \cdot \text{input} + b_1)\)
\(\text{output} = \sigma(W_2 \cdot \text{hidden} + b_2)\)
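As a rough illustration of these two formulas (assuming numpy and arbitrarily chosen layer sizes of 3 inputs, 4 hidden units, and 2 outputs), the forward pass is just two matrix multiplications wrapped in the sigmoid:

```python
import numpy as np

def sigma(z):
    # the logistic sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input -> hidden
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # hidden -> output

x = np.array([0.2, -0.5, 1.0])
hidden = sigma(W1 @ x + b1)       # hidden = sigma(W1 . input + b1)
output = sigma(W2 @ hidden + b2)  # output = sigma(W2 . hidden + b2)
print(output)
```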
Learning representations by back-propagating errors
Bad example of backpropagation explanation
A slightly better backpropagation explanation
Yet another backpropagation algorithm
Formulas
\[\text{lin-hidden}_i = \sum_j W^1_{ij} \cdot \text{input}_j + b^1_i\]
\[\text{hidden}_i = \sigma\left(\text{lin-hidden}_i\right)\]
\[\text{lin-out}_i = \sum_j W^2_{ij} \cdot \text{hidden}_j + b^2_i\]
\[\text{output}_i = \sigma\left(\text{lin-out}_i\right)\]
\[\text{cost} = \frac{1}{2}\sum_i (\text{output}_i - \text{actual}_i)^2\]
\[\text{outerr}_i = (\text{output}_i - \text{actual}_i)\,\sigma^\prime\left(\text{lin-out}_i\right)\]
\[\text{hiderr}_i = \sigma^\prime\left(\text{lin-hidden}_i\right) \left(\sum_j W^2_{ji} \cdot \text{outerr}_j\right)\]
\[\frac{\partial \text{cost}}{\partial W^2_{ij}} = \text{outerr}_i \cdot \text{hidden}_j\]
\[\frac{\partial \text{cost}}{\partial W^1_{ij}} = \text{hiderr}_i \cdot \text{input}_j\]

RMSprop keeps a running average of the squared gradients and scales each update by it:

\[E[g^2]_t = 0.9\, E[g^2]_{t-1} + 0.1\, g^2_t\]
\[\theta_{t+1} = \theta_t - \frac{\mu}{\sqrt{E[g^2]_t + \epsilon}}\, g_t\]
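To make the symbols concrete, here is a minimal sketch in numpy of one forward and backward pass using these formulas, plus an RMSprop-style step. The layer sizes, the `forward_backward` and `rmsprop_step` names, and the learning-rate value are illustrative assumptions, not code from the original.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    s = sigma(z)
    return s * (1.0 - s)

def forward_backward(x, actual, W1, b1, W2, b2):
    """One forward pass and one backward pass for the two-layer network."""
    lin_hidden = W1 @ x + b1
    hidden = sigma(lin_hidden)
    lin_out = W2 @ hidden + b2
    output = sigma(lin_out)

    cost = 0.5 * np.sum((output - actual) ** 2)

    # error terms from the formulas above
    outerr = (output - actual) * sigma_prime(lin_out)
    hiderr = sigma_prime(lin_hidden) * (W2.T @ outerr)

    # gradients: dcost/dW2[i, j] = outerr[i] * hidden[j], and likewise for W1;
    # the bias gradients follow the same pattern with an implicit input of 1
    dW2, db2 = np.outer(outerr, hidden), outerr
    dW1, db1 = np.outer(hiderr, x), hiderr
    return cost, (dW1, db1, dW2, db2)

def rmsprop_step(theta, g, Eg2, mu=0.001, eps=1e-8):
    """RMSprop-style update for one parameter array theta with gradient g."""
    Eg2 = 0.9 * Eg2 + 0.1 * g ** 2
    theta = theta - mu / np.sqrt(Eg2 + eps) * g
    return theta, Eg2

# toy usage with random weights
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
x, actual = np.array([0.2, -0.5, 1.0]), np.array([1.0, 0.0])

cost, (dW1, db1, dW2, db2) = forward_backward(x, actual, W1, b1, W2, b2)
Eg2_W1 = np.zeros_like(W1)
W1, Eg2_W1 = rmsprop_step(W1, dW1, Eg2_W1)
print(cost)
```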
Adding a regularization term to the cost function
\[\text{cost}(y,a) = \text{old-cost}(y,a) + \frac{\lambda}{2n} \sum_{w \in \text{weights}} w^2\]
http://colah.github.io/
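As a hedged sketch of what that term looks like in code (assuming numpy weight arrays, and with illustrative values for the regularization strength and the training-set size), the extra cost also adds a simple extra term to each weight gradient:

```python
import numpy as np

# illustrative values: lam is the regularization strength, n the training-set size
lam, n = 0.1, 1000

def regularized_cost(old_cost, weights):
    """cost = old-cost + (lam / 2n) * sum of all squared weights."""
    return old_cost + (lam / (2 * n)) * sum(np.sum(w ** 2) for w in weights)

def regularized_grad(old_grad, w):
    """The regularization term contributes (lam / n) * w to each weight gradient."""
    return old_grad + (lam / n) * w
```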
https://en.wikipedia.org/wiki/Activation_function
Diagram
Deep learning and ANNs are not quite the same: deep learning implies backpropagation, while the term ANN makes no mention of that idea.