Vanishing/Exploding Gradients (C2W1L10)

DeepLearningAI

6 years ago

121,347 views

Comments:

Ay Ay - 01.10.2023 04:20

great explanation! thank you for sharing

Prajwal Bharadwaj - 12.09.2023 04:00

This doesn't really explain vanishing gradients, which occur in deeper nets where the gradients struggle to propagate back toward the early layers of the network. Learning is reduced and becomes so slow that the model barely learns at all. Ideas such as skip connections (which carry a layer's input forward) are essential for preventing this problem; they are widely used in CNNs to avoid losing information.
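
A minimal numpy sketch of the skip-connection idea mentioned above (illustrative only; the layer sizes, initialization, and function names are assumptions, not taken from the video):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    # Ordinary two-layer transformation.
    h = relu(W1 @ x)
    out = W2 @ h
    # Skip connection: add the block's input back onto its output.
    # During backprop this "+ x" term gives the gradient a short path
    # around the block, which helps it reach earlier layers intact.
    return relu(out + x)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1 = rng.normal(scale=0.1, size=(4, 4))
W2 = rng.normal(scale=0.1, size=(4, 4))
print(residual_block(x, W1, W2))
```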

veysel aytekin - 22.01.2023 01:57

thank you so much

Ahmed B - 15.01.2023 15:01

Thank you !

Easerr - 14.01.2023 13:34

thanks for the knowledge being delivered in such simple terms sir
thank you

Maitha - 23.10.2021 10:30

Thank you Andrew Ng, amazing explanation as usual!

Abdul Mukit - 30.07.2021 23:04

Hi, deepLearningAi.
Thank you so much for these wonderful videos. I am sure these changed a lot of lives.

Ray Yam - 19.03.2021 14:12

Imagine you use sigmoid as the activation function, whose derivative is always less than 1 and greater than 0. During backprop, you need to pass the derivative from back to front, which involves multiplying by a number less than 1 many times. If your network is deep enough, the gradients in the first few layers become extremely small (almost zero), and eventually those neurons stop learning.
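
A tiny numeric sketch of the effect described in this comment (assuming, for illustration, a 50-layer chain and ignoring the weight factors so only the activation's derivative is visible):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # always in (0, 0.25]

# Backprop multiplies one derivative factor per layer on the way back.
grad = 1.0
for _ in range(50):
    grad *= sigmoid_prime(0.0)  # 0.25 is the *largest* the derivative can be

print(grad)  # ~7.9e-31 -- effectively zero after 50 layers
```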

X X - 10.02.2021 07:49

The videos here require you to watch each and every one in order. People arguing and asking too many irrelevant questions are people who did not watch the other videos, LOL. So shut up and learn! IDIOTS

Sandipan Sarkar - 24.12.2020 17:02

Great. But I need to watch it again.

Tamoor Khan - 10.11.2020 15:33

For a 6-minute video, this is a concise and spot-on explanation. Those who were expecting some intricate, complex explanation, please refer to some books; don't waste time here.

liquid - 20.05.2020 09:48

This is a bad explanation; it completely breaks down if you use a sigmoid activation function...

bubbles grappling - 15.05.2020 20:00

Completely disregards explaining what he means by z and l.

Rizvan ahmed Rafsan - 28.11.2019 17:15

I feel like this explanation is a bit oversimplified. Also, what happens when the weight matrices are not just multiples of the identity matrix?

D. Refaeli - 21.09.2019 17:53

This is indeed not really about the gradient, but more about the activations. It also assumes that W will be an identity matrix... which is a big assumption.
I think for the gradient issue, you have to remember that the gradient for each layer is basically the inputs of that layer times whatever the gradient was up to that layer. If you have sigmoid/tanh activations, the inputs will always be a fraction. This might not be a big problem for the last layers, but as you backpropagate further and further, always multiplying by a fraction, you get smaller and smaller gradients, which makes it harder and harder for the weights of those layers to learn.
Similarly, if your activation function can take larger values (say ReLU), you run the risk of your gradients becoming bigger and bigger ("exploding") as you backpropagate.
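
A quick numeric sketch of that back-to-front multiplication (linear layers only, with a hypothetical depth, width, and Gaussian weight scale; the point is just that the size of the per-layer factor decides whether the gradient vanishes or explodes):

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 64

def final_grad_norm(weight_scale):
    # Push an upstream gradient back through `depth` linear layers,
    # multiplying by each layer's Jacobian (W transposed) in turn.
    grad = rng.normal(size=width)
    for _ in range(depth):
        W = rng.normal(scale=weight_scale, size=(width, width))
        grad = W.T @ grad
    return np.linalg.norm(grad)

print("small weights:", final_grad_norm(0.05))  # shrinks toward 0 (vanishing)
print("large weights:", final_grad_norm(0.50))  # blows up (exploding)
```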

TonyX - 23.09.2018 23:05

The vanishing/exploding of activations is not the same as vanishing/exploding gradients; this is the part not well explained in this video.

张开顺 - 09.08.2018 10:18

Mr. Professor, you speak so fast that I just can not catch up with you. :)

Abhijeet Chauhan - 18.06.2018 21:46

TBH this explanation is very sloppy.

Jesper Henriksen - 01.05.2018 15:35

What would be the effect of vanishing/exploding gradients? How do you know that the problem that is occurring in your network is gradient related?
