Vision Transformer for Image Classification

Shusen Wang

3 years ago

115,059 views

Comments:

M Mazher - 08.09.2023 13:00

Excellent explanation 👌

Raj Gothi - 05.09.2023 14:22

You have explained ViT in simple words. Thanks

Carlos Alejo - 09.07.2023 06:32

Nice video!! Just a question: what is the argument behind getting rid of the vectors c1 to cn and keeping only c0? Thanks
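
A note on this frequently asked point: in ViT, only the encoder output c0 (the one for the [CLS] token) is passed to the classification head. Because self-attention mixes every position with every other, c0 can aggregate information from all patches, so c1...cn are not needed for classification. A minimal numpy sketch with illustrative sizes (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

batch, num_patches, dim, num_classes = 2, 9, 16, 10  # illustrative sizes

# Encoder outputs: c0 (for the [CLS] token) followed by c1..cn (one per patch).
encoder_out = rng.normal(size=(batch, num_patches + 1, dim))

# Only c0 is fed to the classification head; c1..cn are simply ignored.
cls_out = encoder_out[:, 0, :]           # shape (batch, dim)
W = rng.normal(size=(dim, num_classes))  # stand-in for the classifier weights
logits = cls_out @ W                     # shape (batch, num_classes)
```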

Sam S - 29.03.2023 22:56

This is a weird question, but what if we made the transformer try to learn the positional encoding by itself? We give it everything but the positional encoding and we tell it to find the positional encoding. What would it learn, and how would that end up helping us? I don't know if this thought will go anywhere, but I wanted to throw it out here.
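
This is in fact close to what ViT does: its position embeddings are not fixed sinusoids but trainable parameters, randomly initialized and updated by backprop like any other weight, so the model learns them from data. A minimal numpy sketch with illustrative sizes (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

num_patches, dim = 9, 16  # e.g. a 3x3 grid of patches (illustrative sizes)
patch_embeddings = rng.normal(size=(num_patches, dim))

# In ViT the positional encoding is a trainable parameter table, initialized
# randomly; during training, gradients update it like any other weight.
pos_embedding = rng.normal(scale=0.02, size=(num_patches, dim))

# Positions are injected by simple addition before the encoder.
tokens = patch_embeddings + pos_embedding  # shape (9, 16)
```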

Nehal Kalita - 17.03.2023 01:13

Nicely explained. Appreciate your efforts.

Hongkyu Lee - 18.02.2023 18:49

Thank you for the clear explanation!!☺

Soumyajit Datta - 14.02.2023 16:16

Thank you. Best ViT video I found.

Y - 11.02.2023 18:28

In the job market, do data scientists use transformers?

kin.1997 - 09.02.2023 13:59

amazing precise explanation

熊老闆i - 06.02.2023 08:40

Good video, what a splendid presentation. Wang Shusen is the GOAT ("yyds").

顾小杰 - 09.01.2023 10:57

👏

Arash Mehrabi - 27.12.2022 21:05

Thank you for your Attention Models playlist. Well explained.

Sheikh Shafayat - 07.11.2022 16:47

Man, you made my day! These lectures were golden. I hope you continue to make more of these

Mahmoud Tarek - 04.11.2022 20:28

great

Drakehinst - 12.10.2022 15:45

These are some of the best, hands-on and simple explanations I've seen in a while on a new CS method. Straight to the point with no superfluous details, and at a pace that let me consider and visualize each step in my mind without having to constantly pause or rewind the video. Thanks a lot for your amazing work! :)

Mitravind Pattnaik - 06.09.2022 18:40

Can't stress enough how easy to understand you made it.

Muhammad Faseeh - 02.09.2022 22:54

Awesome Explanation.
Thank you

Ervin Peretz - 28.08.2022 03:21

This is a great explanation video.
One nit: you are misusing the term 'dimension'. If a classification vector is linear with 8 values, that's not '8-dimensional' -- it is a 1-dimensional vector with 8 values.

jidd32 - 17.08.2022 21:24

Brilliant. Thanks a million

தமிழன் டா - 06.08.2022 22:21

Why does the transformer require so many images to train? And why does ResNet not keep getting better with more training data, the way ViT does?

Nova - 24.07.2022 08:22

If we ignore outputs c1 ... cn, what do c1 ... cn represent then?

Deep Learn - 07.07.2022 22:50

Very good explanation
subscribed!

Yuan - 25.06.2022 10:40

This English leaves me speechless.

Sea Kan - 17.06.2022 17:08

Actually, I think it would be better if the uploader spoke Chinese 🥰🤣

ThePresistence - 10.06.2022 02:23

15 minutes of heaven 🌿. Thanks a lot, understood it clearly!

Amine Sehaba - 09.06.2022 14:05

Thank you so much for this amazing presentation. You have a very clear explanation, I have learnt so much. I will definitely watch your Attention models playlist.

Mariam Waleed - 22.05.2022 05:30

Really great explanation, thank you.

ZEWEI CHU - 18.04.2022 10:56

great video!

Random person - 04.04.2022 01:40

Not All Heroes Wear Capes <3

TallWaters - 31.03.2022 04:02

Brilliant explanation, thank you.

Valentin Fontanger - 26.03.2022 11:35

Amazing, I am in a rush to implement a vision transformer as an assignment, and this saved me so much time!

Medo Med - 01.03.2022 01:43

Great explanation

Saeed Ataei - 25.02.2022 21:07

Great video, thanks. Could you please explain Swin Transformer too?

Tianbao Xie - 06.02.2022 17:45

Very clear, thanks for your work.

Ahmed Ismail - 30.01.2022 06:42

The simplest and most interesting explanation, many thanks. I am asking about object detection models; did you explain them before?

Sudhakar Tummala - 28.01.2022 15:47

Wonderful talk

ogsconnect - 24.01.2022 21:11

Good job! Thanks

Lion Huang - 21.12.2021 18:44

Very clear, thanks for your work.

M E - 01.12.2021 14:15

Really good, thx.

Niels Ohlsen - 28.10.2021 01:48

Very nice job, Shusen, thanks!

appifyers - 25.10.2021 15:39

Great great great

Mona Jalal - 08.10.2021 20:51

This was a great video. Thanks for your time producing great content.

Ansh Arora - 07.10.2021 11:13

Great explanation :)

PeaceGTV - 03.10.2021 07:15

The concept has similarities to TCP protocol in terms of segmentation and positional encoding. 😅😅😅

PeaceGTV - 03.10.2021 07:14

This reminds me of Encarta encyclopedia clips when I was a kid lol! Good job mate!

Shams Arfeen - 03.09.2021 21:54

If you remove the positional encoding step, the whole thing is almost equivalent to a CNN, right?
I mean, those dense layers act just like the filters of a CNN.
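
On this comparison: the patch-embedding step really is a convolution in disguise. Splitting the image into p×p patches and applying one shared dense layer is equivalent to a conv layer whose kernel size and stride both equal p (the rest of the encoder, however, is attention, not convolution). A minimal numpy sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

img = rng.normal(size=(32, 32, 3))  # toy image (illustrative size)
p = 8                               # patch size
dim = 16                            # embedding dimension
W = rng.normal(size=(p * p * 3, dim))  # the shared "dense layer" weights

# Cut the image into non-overlapping p x p patches and flatten each one;
# applying the same W to every patch is exactly a conv with
# kernel_size = stride = p, i.e. one shared filter bank per patch.
patches = img.reshape(32 // p, p, 32 // p, p, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * 3)
embeddings = patches @ W            # shape (16, 16): 4x4 patches, dim 16
```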

Derek Chia - 26.08.2021 20:08

Thank you, your video is way underrated. Keep it up!

Dung Pham - 09.08.2021 15:40

Amazing video. It helped me really understand vision transformers. Thanks a lot. But I have a question: why do we only use the CLS token for the classifier?
