Adding Nested Loops Makes this Algorithm 120x FASTER?

DepthBuffer

9 months ago

123,575 views

Comments:

Aee Bee Cee - 09.11.2023 07:11

i just subed this channel

AREA CREW BMX - 07.11.2023 21:21

Genius

L V - 07.11.2023 01:09

This seems really cool. Good work, you're on the right path.

3-Valdion Dreemur - 05.11.2023 17:52

Jesus christ now THIS is wide format.

I get you want more screen real estate, but anyone watching this on a 1080p 16:9 screen will have a hard time reading. Even on my wide-format cellphone it's not wide enough. You should really use an aspect ratio that's in between ultrawide and 16:9, kinda like how LTT does it. That way neither PC nor mobile devices get large black bars and it's still a lot of screen real estate.

Dan H - 05.11.2023 05:45

Awesome video! Thank you so much for sharing this!

khatdubell - 03.11.2023 20:32

"proprietary codebase"

And that is the problem.
proprietary codebase.

I'd much rather depend on a 16ms open implementation than a 2ms closed one.

anon - 02.11.2023 17:57

where did you get the FLOPS throughput info?

FCLC - 02.11.2023 01:39

Very well done! I make a living optimizing BLAS routines, this will probably become my default “what do you do” to send people

Malte133 - 01.11.2023 23:11

Great work, and it's true: optimizing the code by splitting up commands in assembler is a pain. Saw videos on that too.

Colonthree Enterprises - 01.11.2023 01:54

Shoutouts to CinemaScope. :3

Lime Tree - 31.10.2023 14:04

This is so beautiful I nearly cried, and at the end when you revealed that the library had a 1000x optimisation I died and came back to life a better person.

SamTheFriendlyGuy - 30.10.2023 20:44

Wait. What is your CPU? How do you know how many GFLOPS your CPU has?

Vivian Brégier - 30.10.2023 10:24

Please, if you mention MKL, do mention that this is a non-portable, Intel-only library!
It won't work at all on any non-x86-compatible architecture, such as ARM, and has poor performance on AMD CPUs.

There are some open-source portable alternatives to MKL (blis, libflame).
Don't trade portability for performance on a single CPU family!

Josh Pauline - 30.10.2023 04:27

probably one of the best programming videos I have ever seen, as a more senior developer there is a lack of content on this level of production quality when explaining complex ideas

Tansan Astro - 29.10.2023 08:35

Visualizations were freaking amazing! Loved those! could you make a video of your process for editing these videos?

Eric GT - 29.10.2023 07:24

this is a well done video and explains the idea of leveraging hardware and machine code to optimize cache lookups super well. But I also want to shout out what may be the best animation of a matrix dot product I’ve ever seen. this feels like the first time I watched a video that got me to understand what monads are

Eric Raio - 29.10.2023 04:44

Came here from primeagen! :D

Nestor Jaba-an - 29.10.2023 01:56

Very good. I haven't thought about optimization for a long time since I was doing Assembly in the 90's. This brings back memories and the feeling. Very good!

M - 28.10.2023 16:30

came here from prime

Hadi Ariakia - 28.10.2023 03:20

Love the video,
Where have you been mate until now?😂

O L - 28.10.2023 02:08

Cool videos. Interesting information and great graphics/animations. Thanks!

BreadMan - 27.10.2023 23:37

This has so fucking deeply re-ignited my passion for computer science. Gosh I am on fire right now

flightman - 27.10.2023 20:56

hats off to the effort and thanks to @primetime for showing us this gem

xdevs23 - 27.10.2023 20:33

This is so well explained and the animations are SO good!
Also it would be interesting to see how clang performs compared to gcc.

Dali Codes - 27.10.2023 20:22

from Prime

frango molhado - 27.10.2023 19:57

man what a great video

Markets & Moto - 27.10.2023 19:23

this dude's channel bout to get huge now

Royer Adames - 27.10.2023 18:43

I saw your video on ThePrimeTime, and it is epic. Very well explained for such high-level topics.

dirty kebab - 27.10.2023 18:20

I love this. As a Snr.Sweng, looking to learn more about these god tier optimisations, where can I start? 😮😮

insuna - 27.10.2023 16:08

great video

Czef Czephatus Ch. - 25.10.2023 22:55

I love how in depth this video makes me feel smart. I know that only a few people could make sense of such content like this. But, you make it feel like even more people can get close to it.

James Salsman - 24.10.2023 01:03

At this level, C++ is more opaque than heavily macroed assembly.

Діма Красько - 22.10.2023 18:30

Would it even be possible in higher-level languages such as JavaScript and Python?

Notrum666 - 21.10.2023 05:55

Great videos, you certainly know what you are talking about and you can share it while keeping it interesting, keep it up, you deserve more subscribers

Kyrelel - 21.10.2023 05:48

So, "Adding Nested Loops" was not the trick :/

matt81093 - 20.10.2023 07:47

Once you reminded me of the context of the last video, I believe i know where this is going

Somniad - 17.10.2023 21:21

this video is w i d e

Sorry I'm Late - 17.10.2023 10:06

Memory access is now the bottleneck for the CPU

Sorry I'm Late - 17.10.2023 10:05

Imagine how much underlying low level mechanism was hidden from programmers.

They aren’t supposed to know if they don’t have to optimize to the limit.

axe863 - 17.10.2023 02:35

Beautiful.

Daniel - 13.10.2023 20:24

Great video!

SaltyOwl - 12.10.2023 05:07

Please tell me the next one will only be a month or two. I love learning this level of optimization and it is conveyed so well. I will openly admit I’m being selfish cause I don’t wanna wait XD, though I do understand if that can’t be the case, hardly got a lick of free time with my own courses as well

UJJWAL AGGARWAL - 12.10.2023 02:48

just amazing

Virel - 10.10.2023 18:08

It's a great video for optimizing matrix multiplication from a purely technical/computing science standpoint. But I think one improvement you could still make is to multiply the matrices with something like the Strassen algorithm (nowadays there are even faster algorithms). Matrix multiplication doesn't need to have a cubic (n^3) runtime. You can actually do matrix multiplication in slightly less, for example n^2.8, which should give significant improvements when multiplying big matrices.
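
For readers unfamiliar with the Strassen idea mentioned in this comment, here is a minimal pure-Python sketch. It is only an illustration, not anything from the video: the function names and the `cutoff` parameter are made up for this example, and it assumes square matrices whose size is a power of two. The point is that each level of recursion uses 7 half-size multiplications instead of 8, which gives roughly n^log2(7), about n^2.807, multiplications instead of n^3.

```python
def naive_matmul(a, b):
    """Textbook triple-loop product of two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_quadrants(m):
    """Return the four half-size blocks (top-left, top-right, bottom-left, bottom-right)."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b, cutoff=2):
    """Strassen product; assumes square inputs whose size is a power of two.

    `cutoff` (an illustrative parameter) is the size at or below which
    the sketch falls back to the naive product.
    """
    n = len(a)
    if n <= cutoff:
        return naive_matmul(a, b)
    a11, a12, a21, a22 = split_quadrants(a)
    b11, b12, b21, b22 = split_quadrants(b)
    # Seven recursive products instead of the eight a blocked naive product needs.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), cutoff)
    m2 = strassen(mat_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, mat_sub(b12, b22), cutoff)
    m4 = strassen(a22, mat_sub(b21, b11), cutoff)
    m5 = strassen(mat_add(a11, a12), b22, cutoff)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), cutoff)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), cutoff)
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    # Stitch the four result blocks back into one matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

In practice the extra additions and temporary matrices mean this tends to pay off only for fairly large matrices, so real implementations usually switch to a tuned conventional kernel below some size; the best cutoff depends on the machine.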
