Vision Transformer for Image Classification

Shusen Wang

3 years ago

115,059 views

Comments:

M Mazher - 08.09.2023 13:00

Excellent explanation 👌

Raj Gothi - 05.09.2023 14:22

You have explained ViT in simple words. Thanks

Carlos Alejo - 09.07.2023 06:32

Nice video!! Just a question: what is the argument behind getting rid of the vectors c1 to cn and keeping only c0? Thanks
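
A note on this frequently asked point: in ViT, only the encoder output c0 (the one for the [CLS] token) is passed to the classification head. Because self-attention mixes every position with every other, c0 can aggregate information from all patches, so c1...cn are not needed for classification. A minimal numpy sketch with illustrative sizes (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

batch, num_patches, dim, num_classes = 2, 9, 16, 10  # illustrative sizes

# Encoder outputs: c0 (for the [CLS] token) followed by c1..cn (one per patch).
encoder_out = rng.normal(size=(batch, num_patches + 1, dim))

# Only c0 is fed to the classification head; c1..cn are simply ignored.
cls_out = encoder_out[:, 0, :]           # shape (batch, dim)
W = rng.normal(size=(dim, num_classes))  # stand-in for the classifier weights
logits = cls_out @ W                     # shape (batch, num_classes)
```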

Sam S - 29.03.2023 22:56

This is a weird question, but what if we made the transformer try to learn the positional encoding by itself? We give it everything but the positional encoding and we tell it to find the positional encoding. What would it learn, and how would that end up helping us? I don't know if this thought will go anywhere, but I wanted to throw it out here.
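
This is in fact close to what ViT does: its position embeddings are not fixed sinusoids but trainable parameters, randomly initialized and updated by backprop like any other weight, so the model learns them from data. A minimal numpy sketch with illustrative sizes (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

num_patches, dim = 9, 16  # e.g. a 3x3 grid of patches (illustrative sizes)
patch_embeddings = rng.normal(size=(num_patches, dim))

# In ViT the positional encoding is a trainable parameter table, initialized
# randomly; during training, gradients update it like any other weight.
pos_embedding = rng.normal(scale=0.02, size=(num_patches, dim))

# Positions are injected by simple addition before the encoder.
tokens = patch_embeddings + pos_embedding  # shape (9, 16)
```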

Nehal Kalita - 17.03.2023 01:13

Nicely explained. Appreciate your efforts.

Hongkyu Lee - 18.02.2023 18:49

Thank you for the clear explanation!!☺

Soumyajit Datta - 14.02.2023 16:16

Thank you. Best ViT video I found.

Y - 11.02.2023 18:28

In the job market, do data scientists use transformers?

kin.1997 - 09.02.2023 13:59

amazing precise explanation

熊老闆i - 06.02.2023 08:40

Good video, what a splendid presentation. Wang Shusen is the GOAT ("yyds").

顾小杰 - 09.01.2023 10:57

👏

Arash Mehrabi - 27.12.2022 21:05

Thank you for your Attention Models playlist. Well explained.

Sheikh Shafayat - 07.11.2022 16:47

Man, you made my day! These lectures were golden. I hope you continue to make more of these

Mahmoud Tarek - 04.11.2022 20:28

great

Drakehinst - 12.10.2022 15:45

These are some of the best, hands-on and simple explanations I've seen in a while on a new CS method. Straight to the point with no superfluous details, and at a pace that let me consider and visualize each step in my mind without having to constantly pause or rewind the video. Thanks a lot for your amazing work! :)

Mitravind Pattnaik - 06.09.2022 18:40

Can't stress enough how easy to understand you made it.

Muhammad Faseeh - 02.09.2022 22:54

Awesome Explanation.
Thank you

Ervin Peretz - 28.08.2022 03:21

This is a great explanation video.
One nit: you are misusing the term 'dimension'. If a classification vector is linear with 8 values, that's not '8-dimensional' -- it is a 1-dimensional vector with 8 values.

jidd32 - 17.08.2022 21:24

Brilliant. Thanks a million

தமிழன் டா - 06.08.2022 22:21

Why does the transformer require so many images to train? And why does ResNet not keep getting better with more training data, the way ViT does?

Nova - 24.07.2022 08:22

If we ignore outputs c1 ... cn, what do c1 ... cn represent then?

Deep Learn - 07.07.2022 22:50

Very good explanation
subscribed!

Yuan - 25.06.2022 10:40

This English leaves me speechless.

Sea Kan - 17.06.2022 17:08

Actually, I think it would be better if the uploader spoke Chinese 🥰🤣

ThePresistence - 10.06.2022 02:23

15 minutes of heaven 🌿. Thanks a lot, understood it clearly!

Amine Sehaba - 09.06.2022 14:05

Thank you so much for this amazing presentation. You have a very clear explanation, I have learnt so much. I will definitely watch your Attention models playlist.

Mariam Waleed - 22.05.2022 05:30

Really great explanation, thank you.

ZEWEI CHU - 18.04.2022 10:56

great video!

Random person - 04.04.2022 01:40

Not All Heroes Wear Capes <3

TallWaters - 31.03.2022 04:02

Brilliant explanation, thank you.

Valentin Fontanger - 26.03.2022 11:35

Amazing, I am in a rush to implement a vision transformer as an assignment, and this saved me so much time!

Medo Med - 01.03.2022 01:43

Great explanation

Saeed Ataei - 25.02.2022 21:07

Great video, thanks. Could you please explain Swin Transformer too?

Tianbao Xie - 06.02.2022 17:45

Very clear, thanks for your work.

Ahmed Ismail - 30.01.2022 06:42

The simplest and most interesting explanation, many thanks. I am asking about object detection models; did you explain them before?

Sudhakar Tummala - 28.01.2022 15:47

Wonderful talk

ogsconnect - 24.01.2022 21:11

Good job! Thanks

Lion Huang - 21.12.2021 18:44

Very clear, thanks for your work.

M E - 01.12.2021 14:15

Really good, thx.

Niels Ohlsen - 28.10.2021 01:48

Very nice job, Shusen, thanks!

appifyers - 25.10.2021 15:39

Great great great

Mona Jalal - 08.10.2021 20:51

This was a great video. Thanks for your time producing great content.

Ansh Arora - 07.10.2021 11:13

Great explanation :)

PeaceGTV - 03.10.2021 07:15

The concept has similarities to TCP protocol in terms of segmentation and positional encoding. 😅😅😅

PeaceGTV - 03.10.2021 07:14

This reminds me of Encarta encyclopedia clips when I was a kid lol! Good job mate!

Shams Arfeen - 03.09.2021 21:54

If you remove the positional encoding step, the whole thing is almost equivalent to a CNN, right?
I mean, those dense layers act just like the filters of a CNN.
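
On this comparison: the patch-embedding step really is a convolution in disguise. Splitting the image into p×p patches and applying one shared dense layer is equivalent to a conv layer whose kernel size and stride both equal p (the rest of the encoder, however, is attention, not convolution). A minimal numpy sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

img = rng.normal(size=(32, 32, 3))  # toy image (illustrative size)
p = 8                               # patch size
dim = 16                            # embedding dimension
W = rng.normal(size=(p * p * 3, dim))  # the shared "dense layer" weights

# Cut the image into non-overlapping p x p patches and flatten each one;
# applying the same W to every patch is exactly a conv with
# kernel_size = stride = p, i.e. one shared filter bank per patch.
patches = img.reshape(32 // p, p, 32 // p, p, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * 3)
embeddings = patches @ W            # shape (16, 16): 4x4 patches, dim 16
```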

Derek Chia - 26.08.2021 20:08

Thank you, your video is way underrated. Keep it up!

Dung Pham - 09.08.2021 15:40

Amazing video. It helped me really understand vision transformers. Thanks a lot. But I have a question: why do we only use the CLS token for the classifier?
