System Design: Why is Kafka fast?

ByteByteGo

1 year ago

1,079,064 views


Comments:

ByteByteGo - 15.07.2022 01:59

Subscribe and Kafka will say thank you :)

Sawyer Burnett - 26.09.2023 18:19

totally lost me at the no-copy discussion :(

oefzdegoeggl - 26.09.2023 12:08

Well, good video, but this is nothing new: the sendfile() syscall has been around for a long time, and many people used it long before Kafka. DMA isn't new either; it has been there since the very first IBM PC machines.

睡不醒的小麦 - 26.09.2023 10:02

Useful information.

Jianhua Yan - 22.09.2023 20:49

Each Kafka broker can hold data from multiple topic partitions. Reading data from the same broker still requires moving the disk head between partitions. Will that reduce the benefit of sequential I/O?

Victor Hazbun - 22.09.2023 18:47

More videos like this! ❤

Turkhan Badalov - 03.09.2023 15:25

Thank you! Such a great delivery and explanation. Particularly, great choice of aspects to share.

Nhan Le - 03.09.2023 14:09

Thank you so much

smoideen - 25.08.2023 10:36

This was a clear and concise presentation. Thank you so much 👍

kareka009 - 15.08.2023 18:27

Great videos... they would be even better if they were a tad slower

Kiters Refuge - 13.08.2023 19:41

Thank you! Excellent.

Dot Dager - 09.08.2023 16:29

First time I actually WANT to subscribe to a newsletter.

mehdi - 05.08.2023 17:41

useful <3

GreyDeathVaccine - 31.07.2023 13:53

Superb content

kiwi - 30.07.2023 11:09

Uhh I thought this was a final fantasy video

sultown - 23.07.2023 21:13

is there a risk of Kafka accessing other areas of the memory cache in which system calls could send wrong/private data?

Luiz Adolphs - 22.07.2023 18:46

Awesome video!!!!!! How are those animations made? In After Effects??

Kshitij Mali - 19.07.2023 17:41

Hi sir, SRIO is even faster than DMA

Llλmbd̰ṵh̰ - 18.07.2023 10:44

While sequential access can be efficient for certain tasks, it also has several downsides:

Slow Access for Individual Records: If you need a specific record in the middle or at the end of a sequentially accessed file or data structure and there is no index mapping records to positions, you have to scan through all preceding records. This can be very inefficient and time-consuming, particularly for large datasets.

Inefficient Updates and Deletions: If a record in a sequentially accessed file needs to be updated or deleted, you often have to rewrite the entire file, or at least all the data following that record, which can be very slow and inefficient.

Inefficient for Concurrent Access: In situations where multiple users or processes need to access data concurrently, sequential access can be very inefficient and may even lead to data corruption if not handled correctly.

Lack of Flexibility: Sequential access doesn't allow for as much flexibility in terms of data access patterns. You are essentially restricted to accessing data in the order it was written.

Space Inefficiency: Sequential files can become space inefficient over time. If records are deleted, the space they occupied often cannot be reused, leading to wasted space.

Data Structure Overhead: In certain data structures optimized for sequential access, such as linked lists, there can be significant overhead in terms of additional pointers or other structural information that needs to be stored along with the actual data.

Sequential access is particularly useful and efficient in certain scenarios, including:

Data Streaming: When data is being streamed from one point to another, such as in audio or video streaming services, sequential access is ideal. Data is read in the order it arrives, and there's usually no need to skip forward or backward.

Log Files: Log files are typically written and read in a sequential manner. The most recent events are appended to the end of the log, and when reviewing the logs, it's often most useful to read events in the order they occurred.

Backup and Restore Operations: When performing backup operations or restoring data from backups, the data can be processed sequentially. The backup process involves reading all data from a source and writing it to a backup medium, while restore operations read the data from the backup medium and write it back to the source or a new location.

Batch Processing: In scenarios where large volumes of data need to be processed in one go, such as overnight processing of transactions, sequential access can be used efficiently.

Data Warehousing and Data Mining: In data warehousing and mining operations where huge volumes of data are processed, sequential access is often used.

Sequential Read/Write Media: For certain types of media, such as magnetic tapes, sequential access is the only viable method. You read from or write to the tape in a linear fashion, from one end to the other.
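To make the log-file case concrete, here is a minimal sketch of an append-only log in Python. It is a toy, not how Kafka stores data: the `AppendOnlyLog` class, the 4-byte length prefix, and the in-memory position list are all assumptions made for illustration. It shows that writes only ever go to the end of the file, and that a small index can offset the "slow access for individual records" downside listed above.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Toy append-only log (hypothetical): records are only written at the end."""

    _HEADER = struct.Struct(">I")  # assumed format: 4-byte big-endian length prefix

    def __init__(self, path):
        # "a+b" creates the file if needed and forces every write to the end.
        self._file = open(path, "a+b")
        self._positions = []  # byte position of each record, indexed by offset
        self._rebuild_index()

    def _rebuild_index(self):
        # One sequential pass over the file to find where each record starts.
        self._file.seek(0)
        while True:
            pos = self._file.tell()
            header = self._file.read(self._HEADER.size)
            if len(header) < self._HEADER.size:
                break
            (length,) = self._HEADER.unpack(header)
            self._file.seek(length, os.SEEK_CUR)  # skip the payload
            self._positions.append(pos)

    def append(self, payload: bytes) -> int:
        """Write one record at the end and return its logical offset."""
        self._file.seek(0, os.SEEK_END)
        pos = self._file.tell()
        self._file.write(self._HEADER.pack(len(payload)) + payload)
        self._file.flush()
        self._positions.append(pos)
        return len(self._positions) - 1

    def read(self, offset: int) -> bytes:
        """Jump straight to a record via the index instead of scanning."""
        self._file.seek(self._positions[offset])
        (length,) = self._HEADER.unpack(self._file.read(self._HEADER.size))
        return self._file.read(length)

    def close(self):
        self._file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "segment.log")
    log = AppendOnlyLog(path)
    first = log.append(b"order created")
    second = log.append(b"order paid")
    print(first, second, log.read(second))
    log.close()
```

Existing records are never updated or deleted in place here, which is exactly the trade-off the downsides above describe: the write path stays sequential, and anything like deletion has to be handled by rewriting or dropping whole files.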


Zero copy is a technique that reduces CPU usage and speeds up data transfer by eliminating unnecessary copies between kernel space and user space during network or file I/O. Instead of being read into an application buffer and then written back out, the data goes from the kernel's disk buffer cache (the page cache) to the network buffer without passing through user space.
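As a rough illustration of the idea, the sketch below sends a file over a connected socket with Python's `socket.sendfile()`, which uses the `sendfile()` syscall (via `os.sendfile()`) where the platform supports it and falls back to ordinary read/send calls elsewhere. The helper names `send_file_zero_copy` and `demo`, the temporary file standing in for a log segment, and the local socket pair are assumptions for the example; this is not Kafka's own code.

```python
import os
import socket
import tempfile
import threading


def send_file_zero_copy(path, sock):
    """Send a whole file over a connected stream socket.

    Where os.sendfile() is available, the bytes are not copied into a
    user-space buffer by this process; the kernel moves them to the socket.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)  # returns the number of bytes sent


def demo(payload=b"kafka-log-segment " * 1000):
    # Hypothetical stand-in for a log segment file on disk.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name

    sender, receiver = socket.socketpair()
    received = bytearray()

    def drain():
        # Plays the role of the consumer on the other end of the connection.
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    try:
        sent = send_file_zero_copy(path, sender)
    finally:
        sender.close()  # closing the sender signals end-of-stream to the reader
        reader.join()
        receiver.close()
        os.unlink(path)
    return sent, bytes(received)


if __name__ == "__main__":
    sent, data = demo()
    print(sent, len(data))
```

The sending side never touches the file contents, which is also why this path only works when the data can go out exactly as it is stored on disk.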

Pros:

Increased Efficiency: Zero-copy can significantly speed up data transfer rates because it removes the overhead of copying data between user and kernel space.

Reduced CPU Usage: As there's no need to copy data, zero-copy methods can reduce CPU usage, freeing up resources for other tasks.

Reduced Memory Usage: Zero-copy techniques can lead to less memory usage because they avoid creating extra copies of data in memory.

Lower Latency: By avoiding the overhead of data copying, zero-copy can lead to lower latency in network communication or file I/O operations.

Cons:

Complexity: Implementing zero-copy can be complex and may require a deep understanding of the operating system and network interfaces. This can increase development time and potentially introduce more bugs.

Data Security: Some zero-copy approaches (for example, memory-mapping a file) expose kernel buffers directly to user space, which could lead to security vulnerabilities if access is not managed correctly. With sendfile(), by contrast, the data never enters user space at all.

Buffer Availability: Zero-copy can lead to buffers being locked for longer periods, as the same buffer is used for reading data from the disk and sending it over the network. This could potentially impact other tasks that need to use these buffers.

Non-Contiguous Memory Issues: If data is stored non-contiguously in memory, zero-copy can be challenging to implement effectively.

The decision to use zero-copy would largely depend on the specific needs of the system and whether the benefits of increased data transfer speed, reduced CPU usage, and lower memory footprint outweigh the increased complexity and potential risks.

Arpit Jain - 02.07.2023 09:46

Amazing insight. Zero copy was new to me 😊

Sandeep Raj Betanapalli - 02.07.2023 07:01

Very well explained. Thank you.

MSH - 18.06.2023 09:17

Thank you so much!!! I had this question in my mind and you explained it in a very easy way!!!

f - 14.06.2023 17:43

Kafka is not fast, man.

Absolutiona - 10.06.2023 00:55

🙄 NATS and RabbitMQ have high speed benchmark which is already available in the market.

Chukwudozie Adigwe - 09.06.2023 08:02

Very insightful. The diagrams made me understand the concepts

Eren Tasdemir - 05.06.2023 14:53

The video title should be: How to answer a complex question in 5 mins.

yunrui li - 29.05.2023 21:10

Is HDD better than SSD?

Dimitrios Pantelakis - 25.05.2023 09:32

x1.25 speed, you’re welcome

Viral vlogs - 23.05.2023 21:23

Hey Byte, could you please tell me what you're using for the presentation?

I N Saikishore Seelamsetty - 18.05.2023 07:12

Exemplary Illustration on Kafka

Adriano Souza - 16.05.2023 16:04

Thanks for sharing!

primary_channel - 15.05.2023 04:37

Really great

Venu Koka - 15.05.2023 03:30

Very elegantly done. I wonder what animation tool they are using ??

Ahmed Amnk - 13.05.2023 21:12

One of the best channels; I came to know you from LinkedIn 😅

Durgesh Kshirsagar - 13.05.2023 16:35

Kafka is not fast. I just dropped one poc.

Scott Neibarger - 06.05.2023 02:07

Your channel is excellent.

Avastar_Bin - 02.05.2023 18:11

If that improves performance, why aren't all the other DB engines using this?

Prathiba Vijayasekaran - 26.04.2023 06:03

Very simple with good animation to explain things clearly. Keep publishing these kinds of useful videos.

Does Not Exist will Create in Future - 25.04.2023 19:33

Awesome tutorial

J C - 24.04.2023 11:17

Doesn't the kernel's memory management resolve the physical addresses to random free space on disk even if the logical address is sequential? Can anyone correct me if I'm wrong?

Elton Melo - 22.04.2023 13:59

Hi Alex, what software do you use to create those animations? Thanks!

thalathoti tharun prabhakar - 22.04.2023 09:05

Thank you for the wonderful explanation of Kafka's abilities.

Deepak Poojari - 18.04.2023 11:49

toooo good whoever has made it. 👍

Sharat Chandra - 15.04.2023 03:45

If Kafka uses direct memory access to get data to the network buffer, what system acts as the memory master to decide who gets access to the OS buffer at what time?

Rocky Stallion - 14.04.2023 23:50

Your explanation is lucid and to the point. Thanks for the video. Keep up the good work! Wish you the best of luck.

Lcch - 09.04.2023 19:03

Amazing work guys! I'm subscribed to any newsletter and video you make, and it's worth it. Congratulations team 👏👏👏

Vivek Sharma - 08.04.2023 09:25

Simplicity is best. Your USP is the focus on contextual knowledge in byte-size videos with great graphics. Very valuable.

Paleoanthropologist - 01.04.2023 11:33

wocao

Charlotte Dsouza - 17.03.2023 17:17

Amazing explanation!
