Comments:
great explanation! thank you for sharing
This covers no explanation of vanishing gradients, which is the problem in deeper nets, where the gradients struggle to propagate back towards the starting layers of the network! Learning slows down so much that the model barely learns at all. Important ideas such as skip connections, which cache an earlier layer's activations and carry them forward, are necessary to prevent this problem; they are widely used in CNNs to avoid information loss. A sketch of the idea is below.
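A minimal NumPy sketch (my own illustration, not from the video; the shapes and weights are made up) of a skip connection: the block's input is added back to its output, so the gradient has a path that bypasses the intermediate layers entirely.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    # Plain two-layer transformation...
    h = relu(W1 @ x)
    out = W2 @ h
    # ...plus the skip connection: the input x is carried forward unchanged
    # and added back, giving gradients a direct route around the block.
    return relu(out + x)

# Hypothetical shapes, just to show the call
dim = 16
x = np.random.randn(dim)
W1 = np.random.randn(dim, dim) / np.sqrt(dim)
W2 = np.random.randn(dim, dim) / np.sqrt(dim)
print(residual_block(x, W1, W2).shape)  # (16,)
```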
thank you so much
Thank you!
Thanks for delivering the knowledge in such simple terms, sir.
thank you
Thank you Andrew Ng, amazing explanation as usual!
Hi, deepLearningAi.
Thank you so much for these wonderful videos. I am sure they have changed a lot of lives.
Imagine you use sigmoid as the activation function, whose derivative is always greater than 0 and less than 1 (at most 0.25, in fact). During backprop you pass the derivative from back to front, which means multiplying by a number less than 1 many times. If your network is deep enough, the gradient in the first few layers becomes extremely small (almost zero), and eventually those neurons stop learning.
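To see the effect numerically, here is a tiny sketch (my own, not from the video) that multiplies the sigmoid derivative across 50 layers for some made-up pre-activations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # always in (0, 0.25]

np.random.seed(0)
z_per_layer = np.random.randn(50)  # one hypothetical pre-activation per layer

grad = 1.0  # gradient arriving from the loss
for z in z_per_layer:
    grad *= sigmoid_prime(z)  # each layer multiplies in a factor of at most 0.25

print(grad)  # vanishingly small: the earliest layers receive almost no signal
```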
The videos here require you to watch each and every one in a step-wise fashion. People arguing and asking too many irrelevant questions are people who did not watch the other videos LOL, so shut up and learn! IDIOTS
Great. But I need to watch it again.
For a 6-minute video this is a concise and spot-on explanation. Those who were expecting intricate, complex explanations, please refer to some books; don't waste time here.
This is a bad explanation; it completely breaks down if you use a sigmoid activation function...
He completely skips explaining what he means by z and l.
I feel like this explanation is a bit oversimplified. Also, what happens when the weight matrices are not multiples of the identity matrix?
This is indeed not really about the gradient, but more about the activations. Also, it assumes that W will be an identity matrix... which is a big assumption.
I think for the gradient issue you have to remember that the gradient for each layer is basically the inputs of that layer, times whatever the gradient was up to that layer. If you have sigmoid/tanh activations, those inputs will always be fractions. This might not be a big problem for the last layers, but as you backpropagate further and further back, always multiplying by a fraction, you get smaller and smaller gradients, which makes it harder and harder for the weights of those layers to learn.
Similarly, if your activation function can take larger values (say ReLU), you run the risk of your gradients becoming bigger and bigger ("exploding") as you backpropagate.
The vanishing/exploding of activations is not the same thing as vanishing/exploding gradients; this is the part that is not well explained in this video.
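To make the gradient side of this concrete, here is a toy sketch (my own, with made-up depths and weight scales, not from the video): it backpropagates a unit gradient through a deep stack and prints the gradient norm reaching the first layer for a sigmoid net versus a ReLU net with larger weights.

```python
import numpy as np

np.random.seed(1)

def first_layer_grad_norm(depth=30, width=64, activation="sigmoid", weight_scale=1.0):
    # Forward pass through a stack of dense layers, caching weights and pre-activations.
    x = np.random.randn(width)
    Ws, zs, a = [], [], x
    for _ in range(depth):
        W = weight_scale * np.random.randn(width, width) / np.sqrt(width)
        z = W @ a
        a = 1.0 / (1.0 + np.exp(-z)) if activation == "sigmoid" else np.maximum(z, 0.0)
        Ws.append(W)
        zs.append(z)

    # Backward pass: start with a unit gradient at the output and walk back to layer 1.
    grad = np.ones(width)
    for W, z in zip(reversed(Ws), reversed(zs)):
        if activation == "sigmoid":
            s = 1.0 / (1.0 + np.exp(-z))
            grad = (W.T @ grad) * s * (1.0 - s)   # local derivative is at most 0.25
        else:
            grad = (W.T @ grad) * (z > 0)         # ReLU derivative is 0 or 1
    return np.linalg.norm(grad)

print("sigmoid, modest weights:", first_layer_grad_norm(activation="sigmoid"))
print("ReLU, larger weights:   ", first_layer_grad_norm(activation="relu", weight_scale=3.0))
# The first norm collapses towards zero (vanishing); the second blows up (exploding).
```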
Mr. Professor, you speak so fast that I just cannot keep up with you. :)
TBH this explanation is very sloppy.
What would be the effect of vanishing/exploding gradients? How do you know that the problem occurring in your network is gradient-related?