Adding Nested Loops Makes this Algorithm 120x FASTER?

DepthBuffer

9 months ago

123,575 views


Comments:

Aee Bee Cee - 09.11.2023 07:11

I just subbed to this channel

AREA CREW BMX - 07.11.2023 21:21

Genius

L V - 07.11.2023 01:09

This seems really cool. Good work, you're on the right path.

3-Valdion Dreemur - 05.11.2023 17:52

Jesus christ now THIS is wide format.

I get that you want more screen real estate, but anyone watching this on a 1080p, 16:9 screen will have a hard time reading. Even my wide-format cellphone isn't wide enough. You should really use an aspect ratio that's in between ultra-wide and 16:9, kinda like how LTT does it. That way neither PC nor mobile devices get large black bars, and it's still a lot of screen real estate.

Dan H - 05.11.2023 05:45

Awesome video! Thank you so much for sharing this!

khatdubell - 03.11.2023 20:32

"proprietary codebase"

And that is the problem.
proprietary codebase.

I'd much rather depend on a 16ms open implementation than a 2ms closed one.

anon - 02.11.2023 17:57

where did you get the FLOPS throughput info?

FCLC - 02.11.2023 01:39

Very well done! I make a living optimizing BLAS routines, this will probably become my default “what do you do” to send people

Malte133 - 01.11.2023 23:11

Great work, and it's true: optimizing code by splitting up commands in assembler is a pain. I saw videos on that too.

Colonthree Enterprises - 01.11.2023 01:54

Shoutouts to CinemaScope. :3

Lime Tree - 31.10.2023 14:04

This is so beautiful I nearly cried, and at the end, when you revealed that the library had a 1000x optimisation, I died and came back to life a better person.

SamTheFriendlyGuy - 30.10.2023 20:44

Wait. What is your CPU? How do you know how many GFLOPS your CPU has?

Vivian Brégier - 30.10.2023 10:24

Please, if you mention MKL, do mention that it is a non-portable, Intel-only library!
It won't work at all on any non-x86-compatible architecture, such as ARM, and it performs poorly on AMD CPUs.

There are open-source, portable alternatives to MKL (BLIS, libflame).
Don't trade portability for performance on a single CPU family!

Josh Pauline - 30.10.2023 04:27

Probably one of the best programming videos I have ever seen. As a more senior developer, I find there is a lack of content with this level of production quality when explaining complex ideas.

Tansan Astro - 29.10.2023 08:35

Visualizations were freaking amazing! Loved those! could you make a video of your process for editing these videos?

Eric GT - 29.10.2023 07:24

this is a well done video and explains the idea of leveraging hardware and machine code to optimize cache lookups super well. But I also want to shout out what may be the best animation of a matrix dot product I’ve ever seen. this feels like the first time I watched a video that got me to understand what monads are

Eric Raio - 29.10.2023 04:44

Came here from primeagen! :D

Nestor Jaba-an - 29.10.2023 01:56

Very good. I haven't thought about optimization for a long time since I was doing Assembly in the 90's. This brings back memories and the feeling. Very good!

M - 28.10.2023 16:30

came here too, from Prime

Hadi Ariakia - 28.10.2023 03:20

Love the video,
Where have you been mate until now?😂

O L - 28.10.2023 02:08

Cool videos. Interesting information and great graphics/animations. Thanks!

BreadMan - 27.10.2023 23:37

This has so fucking deeply re-ignited my passion for computer science. Gosh I am on fire right now

flightman - 27.10.2023 20:56

hats off to the effort and thanks to @primetime for showing us this gem

xdevs23 - 27.10.2023 20:33

This is so well explained and the animations are SO good!
Also it would be interesting to see how clang performs compared to gcc.

Dali Codes - 27.10.2023 20:22

from Prime

frango molhado - 27.10.2023 19:57

man what a great video

Markets & Moto - 27.10.2023 19:23

this dude's channel bout to get huge now

Royer Adames - 27.10.2023 18:43

I saw your video on ThePrimeTime, and it is epic. Very well explained for such high-level topics.

dirty kebab - 27.10.2023 18:20

I love this. As a Snr.Sweng, looking to learn more about these god tier optimisations, where can I start? 😮😮

insuna - 27.10.2023 16:08

great video

Czef Czephatus Ch. - 25.10.2023 22:55

I love how in depth this video makes me feel smart. I know that only a few people could make sense of such content like this. But, you make it feel like even more people can get close to it.

James Salsman - 24.10.2023 01:03

At this level, C++ is more opaque than heavily macroed assembly.

Діма Красько - 22.10.2023 18:30

Would it even be possible in higher-level languages such as JavaScript and Python?

Notrum666 - 21.10.2023 05:55

Great videos, you certainly know what you are talking about and you can share it while keeping it interesting, keep it up, you deserve more subscribers

Kyrelel - 21.10.2023 05:48

So, "Adding Nested Loops" was not the trick :/

matt81093 - 20.10.2023 07:47

Once you reminded me of the context of the last video, I believe I know where this is going

Somniad - 17.10.2023 21:21

this video is w i d e

Sorry I'm Late - 17.10.2023 10:06

Memory access is now the bottleneck for the CPU

Sorry I'm Late - 17.10.2023 10:05

Imagine how much underlying low level mechanism was hidden from programmers.

They aren’t supposed to know if they don’t have to optimize to the limit.

axe863 - 17.10.2023 02:35

Beautiful.

Daniel - 13.10.2023 20:24

Great video!

SaltyOwl - 12.10.2023 05:07

Please tell me the next one will only be a month or two. I love learning this level of optimization and it is conveyed so well. I will openly admit I’m being selfish cause I don’t wanna wait XD, though I do understand if that can’t be the case, hardly got a lick of free time with my own courses as well

UJJWAL AGGARWAL - 12.10.2023 02:48

just amazing

Virel - 10.10.2023 18:08

It's a great video for optimizing matrix multiplication from a purely technical/computing science standpoint. But one improvement you could still make is to multiply the matrices with something like the Strassen algorithm (nowadays there are even faster algorithms). Matrix multiplication doesn't need to have a cubic (n^3) runtime: you can actually do it in slightly less, for example about n^2.8, which should give significant improvements when multiplying big matrices.
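
The Strassen idea mentioned in this comment can be sketched roughly as follows. This is only an illustration and is not taken from the video: the names `naive_matmul`, `strassen` and the `cutoff` parameter are made up for this sketch, and it assumes square matrices whose size is a power of two. The point is that each recursion level uses 7 block multiplications instead of 8, which is where the exponent of about 2.81 (log2 of 7) comes from.

```python
def naive_matmul(a, b):
    """Textbook cubic-time product of two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def _add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def _sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def _split(m):
    """Split a square matrix into its four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b, cutoff=2):
    """Strassen product of two n x n matrices, n a power of two (assumption)."""
    n = len(a)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("this sketch only handles power-of-two sizes")
    # Below the cutoff, fall back to the plain cubic algorithm.
    if n <= max(cutoff, 1):
        return naive_matmul(a, b)

    a11, a12, a21, a22 = _split(a)
    b11, b12, b21, b22 = _split(b)

    # Seven recursive products instead of the eight a naive block scheme needs.
    m1 = strassen(_add(a11, a22), _add(b11, b22), cutoff)
    m2 = strassen(_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, _sub(b12, b22), cutoff)
    m4 = strassen(a22, _sub(b21, b11), cutoff)
    m5 = strassen(_add(a11, a12), b22, cutoff)
    m6 = strassen(_sub(a21, a11), _add(b11, b12), cutoff)
    m7 = strassen(_sub(a12, a22), _add(b21, b22), cutoff)

    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)

    # Stitch the four quadrants back into one matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]], cutoff=1))
# prints [[19, 22], [43, 50]]
```

The extra additions and temporary matrices mean the recursion typically only pays off for large matrices, which is why a cutoff to the plain algorithm is used here.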

djtomoy - 10.10.2023 09:40

Huh? 😐

Amad Zarak - 09.10.2023 19:39

GEMM is the perfect example to demonstrate these concepts. Wonderful video my friend. You earned a subscriber
