K-means Cluster Analysis With Excel - A Tutorial

3 years ago

51,946 views

Comments:

Mika Stamaria - 31.10.2023 17:20

Hello! May I ask how the end results can be put into a graph to visualize the actual result of the clustering? I have tried searching online, but I can't seem to find an approach that aligns with this method.

Still, a very informative tutorial on the groundwork for understanding the foundations. :)

Jose Olivar - 21.08.2023 23:49

Hello, great video! Do you have a video showing how you would do this using R?

Paul Acito - 22.06.2023 20:53

Just fantastic explanations and approach. Thanks for sharing this.

Kris Simpson - 17.05.2023 20:46

You mentioned that for categorical data there are better alternatives for cluster analysis - what would you recommend please?

RAINA SHRIVASTAVA - 16.05.2023 15:28

Hey, I have a media data set in which the rows are episode names and the columns are the different slot timings. Let's say episode A has data for only 3 slots, episode B has data for 4 slots, and so on. How do I apply k-means to this data set?

Hussein Adam - 24.04.2023 05:05

You just gained a new subscriber. Thanks, Dave!

Pedro Serpa - 10.03.2023 15:42

Great Video.

Suggestion 1:
If you calculate the sum of the minimum distances in iteration 1, you can then use Excel's Data > Solver to minimize that sum by changing your initial points. That way, Excel does all the work for you in one go.
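
For readers who want to see what the Solver would be minimizing, here is a minimal sketch in Python rather than Excel. The names `total_min_distance` and `hill_climb` are hypothetical, and the random hill climb is only a crude stand-in for Solver, not the method Solver itself uses. It assumes plain Euclidean distances, as the comment describes; a worksheet that sums squared distances would need the squared version instead.

```python
import math
import random

def total_min_distance(points, centroids):
    """Sum over all points of the distance to the nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points)

def hill_climb(points, centroids, steps=2000, scale=0.5, seed=0):
    """Stand-in for Solver: nudge one centroid at random, keep the move if the sum drops."""
    rng = random.Random(seed)
    best = [tuple(c) for c in centroids]
    best_cost = total_min_distance(points, best)
    for _ in range(steps):
        j = rng.randrange(len(best))          # pick one centroid to move
        trial = list(best)
        trial[j] = tuple(v + rng.uniform(-scale, scale) for v in best[j])
        cost = total_min_distance(points, trial)
        if cost < best_cost:                  # accept only improvements
            best, best_cost = trial, cost
    return best, best_cost
```

Calling `hill_climb(points, starting_centroids)` returns the adjusted centroids and the reduced sum. Because only improving moves are accepted, the result is never worse than the starting points, but it may be a local minimum.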

Festus David Oundo - 05.01.2023 14:05

Thanks Dave, got it.

Topfundus - 21.12.2022 05:31

Great, thank you very much! I was able to write my own program in VB for flexible clustering of arbitrary pairs of values. First, I load the data (x, y) from the table into an array. I then run nested loops over it n times (= iterations) until every pair of values is assigned to a cluster (point) based on the shortest distance and further passes no longer change these assignments. At that point the loops exit. The cluster numbers are then written in one go into a new column of the source table. Done! It is all lightning fast and needs none of the many, rather laborious tables and formulas you show in the second part of your tutorial. On top of that, I can use my k-means program universally: it doesn't matter which (numeric) data columns I read in and cluster.
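
The loop described in this comment can be sketched roughly as follows. This is an illustration in Python, not the commenter's VB code, which is not shown. The function name `kmeans_pairs`, the `max_iter` cap, and the rule of leaving an empty cluster's centroid unchanged are all assumptions made for the example.

```python
import math

def kmeans_pairs(points, centroids, max_iter=100):
    """Assign each (x, y) pair to its nearest centroid until assignments stop changing."""
    centroids = list(centroids)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so exit the loop
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids
```

The returned `labels` list plays the role of the new cluster-number column: one entry per input row, in the original row order.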

Alpay Dincer - 07.09.2022 20:13

You're my Hero. God bless you 🤝

Alexander König - 27.08.2022 22:53

Hi David, great video! There are several newer formulas available today: XLOOKUP is an amazing and much easier alternative to VLOOKUP, and for your huge monster formula you can use the SUMXMY2 function. To find the average of each cluster (top rows), use the AVERAGEIF formula, which is much easier and lets you skip the Power Query step. Finally, you should run a minimizing Solver to find the minimum distance between the variables and the centroids.
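
For readers unfamiliar with the two worksheet functions mentioned here, this is roughly what they compute, written as small Python equivalents. The Python function names and argument order are chosen for illustration and are not Excel's own signatures. SUMXMY2 returns the sum of squared differences between two ranges, which is the squared Euclidean distance between a row and a centroid. AVERAGEIF averages the values whose matching key meets a criterion, here a cluster number.

```python
def sumxmy2(xs, ys):
    """Sum of squared differences, like Excel's SUMXMY2(array_x, array_y)."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys))

def averageif(keys, criterion, values):
    """Mean of the values whose key equals the criterion, like AVERAGEIF with an average range."""
    picked = [v for k, v in zip(keys, values) if k == criterion]
    # Excel returns #DIV/0! when nothing matches; here that case raises ZeroDivisionError.
    return sum(picked) / len(picked)
```

With a column of cluster numbers as `keys`, calling `averageif` once per cluster and per variable gives the updated centroid coordinates.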

nagendra vishwamitra - 27.07.2022 17:42

One of the best lectures on k-means. If you were in front of me, I would have kissed you. Greetings from India!

dataanalyst101 - 04.05.2022 10:50

In k-means clustering, is there an assumption about the number of observations and variables? Would having more variables than observations affect the clustering results and make them less accurate?

Brooks Tomblin, Jr. - 20.04.2022 03:43

This was an awesome video!

Anton Zuev - 19.04.2022 17:39

Hello! Thanks for the detailed explanation. Do you have a version of this that is based entirely on Power Query?
I want to try the same thing, but with ~2 million rows of customers and >50 columns (dimensions). I don't think VLOOKUP-ing them is an optimal way to do that :(
Or is my best alternative to switch to R / Python at these volumes?

swadmin studynwork - 26.03.2022 11:59

Thanks!!! I learned a lot with your video :)

Williams tan - 24.02.2022 09:23

David

Thanks for the clip, very useful and informative.

May I suggest doing one for mixed data (categorical and numeric)? I have been searching for one for ages, but in vain.
Thanks for your help.

Caitlin Thompson - 18.02.2022 21:02

This is such a great tutorial. I've been trying to do this all week (started in Python but came back to Excel) and this is exactly what I needed. Really thorough, lots to think about, and still easy to follow!
