Comments:
I just subbed to this channel
Genius
This seems really cool. Good work, you're on the right path.
Jesus christ now THIS is wide format.
I get you want more screen real estate, but anyone watching this on a 1080p 16:9 screen will have a hard time reading. Even my wide format cellphone is not wide enough. You should really use an aspect ratio that's in between ultra wide and 16:9, kinda like how LTT does it. That way neither PC nor mobile devices get large black bars and it's still a lot of screen real estate.
Awesome video! Thank you so much for sharing this!
"proprietary codebase"
And that is the problem.
proprietary codebase.
I'd much rather depend on a 16ms open implementation than a 2ms closed one.
where did you get the FLOPS throughput info?
Very well done! I make a living optimizing BLAS routines, this will probably become my default “what do you do” to send people
Great work, and it's true, optimizing the code by splitting up commands in assembler is a pain. Saw videos on that too.
Shoutouts to CinemaScope. :3
This is so beautiful I nearly cried, and at the end when you revealed that the library had a 1000x optimisation I died and came back to life a better person.
Wait. What is your CPU? How do you know how many GFLOPS your CPU has?
Please, if you mention MKL, do mention that this is a non-portable Intel-only library!
It won't work at all on any non-x86 compatible architecture, such as ARM, and has poor performance on AMD CPUs.
There are some open-source portable alternatives to MKL (BLIS, libflame).
Don't trade portability for performance on a single CPU family!
probably one of the best programming videos I have ever seen, as a more senior developer there is a lack of content on this level of production quality when explaining complex ideas
Visualizations were freaking amazing! Loved those! could you make a video of your process for editing these videos?
this is a well done video and explains the idea of leveraging hardware and machine code to optimize cache lookups super well. But I also want to shout out what may be the best animation of a matrix dot product I’ve ever seen. this feels like the first time I watched a video that got me to understand what monads are
Came here from primeagen! :D
Very good. I haven't thought about optimization for a long time since I was doing Assembly in the 90's. This brings back memories and the feeling. Very good!
came here too from prime
Love the video,
Where have you been mate until now?😂
Cool videos. Interesting information and great graphics/animations. Thanks!
This has so fucking deeply re-ignited my passion for computer science. Gosh I am on fire right now
hats off to the effort and thanks to @primetime for showing us this gem
This is so well explained and the animations are SO good!
Also it would be interesting to see how clang performs compared to gcc.
from Prime
man what a great video
this dude's channel bout to get huge now
I saw your video on ThePrimeTime, and your video is epic. Very well explained for such high-level topics.
I love this. As a Snr.Sweng, looking to learn more about these god tier optimisations, where can I start? 😮😮
great video
I love how in depth this video makes me feel smart. I know that only a few people could make sense of such content like this. But, you make it feel like even more people can get close to it.
At this level, C++ is more opaque than heavily macroed assembly.
would it even be possible in higher level languages such as JavaScript and Python?
Great videos, you certainly know what you are talking about and you can share it while keeping it interesting, keep it up, you deserve more subscribers
So, "Adding Nested Loops" was not the trick :/
Once you reminded me of the context of the last video, I believe i know where this is going
this video is w i d e
Memory access is now the bottleneck for the CPU
Imagine how much underlying low level mechanism was hidden from programmers.
They aren’t supposed to know if they don’t have to optimize to the limit.
Beautiful.
Great video!
Please tell me the next one will only be a month or two. I love learning this level of optimization and it is conveyed so well. I will openly admit I’m being selfish cause I don’t wanna wait XD, though I do understand if that can’t be the case, hardly got a lick of free time with my own courses as well
just amazing
It's a great video for optimizing a matrix multiplication from a purely technical/computing science standpoint. But I think one improvement you could still make is to multiply the matrices with something like the Strassen algorithm (nowadays there are even faster algorithms). A matrix multiplication doesn't need to have a cubic n^3 runtime. You can actually do matrix multiplications in slightly less, for example about n^2.8, which should give significant improvements when multiplying big matrices
Huh? 😐
GEMM is the perfect example to demonstrate these concepts. Wonderful video my friend. You earned a subscriber