Comments:
Excellent explanation 👌
You have explained ViT in simple words. Thanks
Nice video!! Just a question: what is the argument behind getting rid of the vectors c1 to cn and keeping only c0? Thanks
This is a weird question, but what if we made the transformer learn the positional encoding by itself? We give it everything but the positional encoding and tell it to find the positional encoding on its own. What would it learn, and how would that end up helping us? I don't know if this thought will go anywhere; I just wanted to throw it out here.
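For what it's worth, this is close to what ViT actually does: its positional encodings are not a fixed sinusoidal table but free parameters, initialized randomly and learned by backprop along with the rest of the network. A minimal NumPy sketch of the input side (shapes and values are purely illustrative, with random numbers standing in for real patch projections):

```python
import numpy as np

# Illustrative shapes: 9 image patches plus one [CLS] token, embedding dim 8.
num_tokens, dim = 10, 8
rng = np.random.default_rng(0)

# Patch vectors after the linear projection (stand-in values here).
patch_embeddings = rng.normal(size=(num_tokens, dim))

# ViT's positional encoding is just a table of trainable parameters:
# initialized near zero and updated by gradient descent like any weight.
pos_embed = rng.normal(scale=0.02, size=(num_tokens, dim))

# What the transformer encoder actually consumes.
transformer_input = patch_embeddings + pos_embed
print(transformer_input.shape)  # prints (10, 8)
```

Since the table is learned, the model discovers whatever encoding of patch order is useful for the task, which is exactly the experiment the comment proposes.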
Nicely explained. Appreciate your efforts.
Thank you for the clear explanation!!☺
Thank you. Best ViT video I found.
In the job market, do data scientists use transformers?
Amazing, precise explanation
Good video, what a splendid presentation. Wang Shusen is the GOAT.
👏
Thank you for your Attention Models playlist. Well explained.
Man, you made my day! These lectures were golden. I hope you continue to make more of these
Great
These are some of the best, hands-on and simple explanations I've seen in a while on a new CS method. Straight to the point with no superfluous details, and at a pace that let me consider and visualize each step in my mind without having to constantly pause or rewind the video. Thanks a lot for your amazing work! :)
Can't stress enough how easy to understand you made it
Awesome explanation.
Thank you
This is a great explanation video.
One nit: you are misusing the term 'dimension'. If a classification vector is linear with 8 values, that's not '8-dimensional' -- it is a 1-dimensional vector with 8 values.
Brilliant. Thanks a million
WHY does the transformer require so many images to train?? And why does ResNet not keep getting better with more training, versus ViT?
If we ignore the outputs c1 ... cn, what do c1 ... cn represent then?
Very good explanation
subscribed!
This English leaves me speechless
Actually, I think it would be better if the uploader spoke Chinese 🥰🤣
15 minutes of heaven 🌿. Thanks a lot, understood clearly!
Thank you so much for this amazing presentation. You have a very clear explanation, I have learnt so much. I will definitely watch your Attention models playlist.
Really great explanation, thank you
Great video!
Not All Heroes Wear Capes <3
Brilliant explanation, thank you.
Amazing, I am in a rush to implement a vision transformer as an assignment, and this saved me so much time!
Great explanation
Great video, thanks. Could you please explain the Swin Transformer too?
Very clear, thanks for your work.
The simplest and most interesting explanation, many thanks. I am asking about object detection models: did you explain them before?
Wonderful talk
Good job! Thanks
Very clear, thanks for your work.
Really good, thx.
Very nice job, Shusen, thanks!
Great great great
This was a great video. Thanks for your time producing great content.
Great explanation :)
The concept has similarities to the TCP protocol in terms of segmentation and positional encoding. 😅😅😅
This reminds me of Encarta encyclopedia clips from when I was a kid, lol! Good job mate!
If you remove the positional encoding step, the whole thing is almost equivalent to a CNN, right?
I mean those dense layers are just like the filters of a CNN.
Thank you, your video is way underrated. Keep it up!
Amazing video. It helped me really understand vision transformers. Thanks a lot. But I have a question: why do we only use the [CLS] token for the classifier?
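On the [CLS] question asked a few times above: through self-attention, the output c0 can aggregate information from every patch, so one vector is enough to summarize the whole image for a single label; c1 ... cn are per-patch outputs that classification simply doesn't need. A rough NumPy sketch of that last step (random numbers stand in for a trained encoder, and the linear head is illustrative):

```python
import numpy as np

num_tokens, dim, num_classes = 10, 8, 3
rng = np.random.default_rng(1)

# Stand-in for the encoder outputs c0..c9 (c0 is the [CLS] position).
encoder_out = rng.normal(size=(num_tokens, dim))

# Only c0 goes to the classifier; self-attention already let it mix
# information from every patch, so c1..c9 would be redundant here.
cls_vector = encoder_out[0]

W = rng.normal(size=(dim, num_classes))        # illustrative linear head
logits = cls_vector @ W
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over 3 classes
print(probs.shape)  # prints (3,)
```

The per-patch outputs c1 ... cn are not useless in general; dense tasks like segmentation or detection do read them, but a single whole-image label only needs the one aggregated vector.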