Comments:
You are an inspiration, James
This is by far the best explanation of the paper that I could find. Thanks a lot!
This video is really helpful. Thank you!
Well, Bag of Words and Bag of Visual Words WERE a merger of NLP and Computer Vision, back in the day (2010s)
Thanks a lot for this! Amazing, amazing explanation!
Hey, thanks a lot. I come from TensorFlow, so can you please answer: does this train the whole ViT model on our dataset, or freeze the pretrained ViT part and train only the classification head (like trainable=False in TF)?
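A minimal PyTorch sketch of the "freeze the backbone, train only the head" pattern the question asks about. The module is a toy stand-in, not the real Hugging Face ViT; with `ViTForImageClassification` you would typically loop over `model.vit.parameters()` in the same way.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained backbone plus a classification head.
# Names and sizes are illustrative, not the Hugging Face ViT attributes.
class TinyViTClassifier(nn.Module):
    def __init__(self, hidden=32, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(48, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x):
        return self.classifier(self.backbone(x))

model = TinyViTClassifier()

# PyTorch equivalent of TF's trainable=False: stop gradients for the backbone.
for p in model.backbone.parameters():
    p.requires_grad = False

# Hand only the still-trainable parameters (the head) to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
```

By default, fine-tuning recipes train the whole model; freezing as above is an opt-in choice you make before building the optimizer.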
Very good introductory video. Thanks for sharing.
Really enjoyed every bit. Trying to set up the transformer for an audio regression task; the ViT has shown amazing performance in classification.
Thank you for the effort you're putting into your explanations. :)
Great video! I've watched quite a few videos and read papers about Transformers, but your video really made me understand the concept.
No fun using the Hugging Face Transformers library. You should have explained vision transformers using a more basic implementation rather than a high-level library.
James is the top G in deep learning
It currently gives the error 'No module named 'datasets''. Does anybody have a fix?
Is there anything similar to word embeddings? Or do you simply take your pixel data as patches and run it through a dense layer to get projections?
Thank you, sir
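For the missing-module error above, the usual fix is installing Hugging Face's `datasets` package into the environment the notebook runs in (assuming a standard pip setup):

```shell
pip install datasets
```

If you're in a Jupyter notebook, run it as `!pip install datasets` and restart the kernel afterwards.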
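On the embeddings question above: yes, ViT's analogue of word embeddings is the patch embedding — the image is cut into fixed-size patches, each patch is flattened, and a learned linear projection maps it to the model dimension. A minimal NumPy sketch (the sizes and the random projection matrix are illustrative; in the real model the projection is learned):

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 8; C = 3; P = 4; D = 16          # 8x8x3 image, 4x4 patches, embed dim 16
image = rng.standard_normal((H, W, C))

# Split into non-overlapping PxP patches and flatten each one.
patches = image.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(-1, P * P * C)  # (num_patches, P*P*C) -> (4, 48)

# Learned linear projection (random here, for illustration only).
W_proj = rng.standard_normal((P * P * C, D))
embeddings = patches @ W_proj             # (num_patches, D) -> (4, 16)

print(patches.shape, embeddings.shape)
```

Positional embeddings are then added to these patch embeddings before the encoder, just as in NLP transformers.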
Great video man cheers! Do you have a video about using a dataset made up of your own images on the vision transformer?
Thanks a lot for the video. I can't find any precise explanation of the function of the self-attention layer and the MLP layer in the encoder modules. Could you maybe add some information about that?
Nice one! This content desperately needs "Promo SM"!
Another great video of yours. So clear and clarifying. Thx!
Great explanation, unique on YT. Thanks!
Great stuff!
In this video, there is no explanation of the output of a vision transformer. In NLP transformers, the output is a probability distribution over the vocab, but in vision transformers I guess it is over a codebook. What this codebook is and how it is aligned to the input image is not clear. Thanks a lot for this video, but it is incomplete.
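On the output question above: for classification there is no codebook — the standard ViT prepends a learnable [CLS] token, and after the encoder, that token's final hidden state is passed through a linear head, giving a probability distribution over image classes rather than over a vocabulary. A NumPy sketch of that last step (shapes and weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

num_tokens, hidden, num_classes = 5, 16, 10    # [CLS] + 4 patch tokens
encoder_output = rng.standard_normal((num_tokens, hidden))

cls_state = encoder_output[0]                  # final hidden state of the [CLS] token

# Linear classification head followed by a softmax over classes.
W_head = rng.standard_normal((hidden, num_classes))
b_head = np.zeros(num_classes)
logits = cls_state @ W_head + b_head
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(probs.shape, probs.sum())                # (10,) and ~1.0
```

Codebooks over image tokens do appear in generative models (e.g. VQ-VAE-style approaches), but the ViT classifier in the video doesn't use one.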
Just discovered your channel, great stuff! Thanks man! Great explanation and visualisation
The clarity of your discourse is unmatched, and it's always a pleasure to follow your videos. Kind of a side effect of your passion for the domain!?
Incredible content! Thx James!
Thank you so much for this video :)