Principal Component Analysis (PCA) using sklearn and Python

Krish Naik

6 years ago

214,112 views

Comments:

@nukestrom5719 - 05.11.2023 23:03

Well explained, with an easy-to-understand example. Thanks.

@md.faysal2318 - 21.10.2023 18:34

I have my own data with some columns from a questionnaire, so what will the column names be there in the code?
For instance, you put columns=cancer['feature_names']; what should I put there for my own data?
All the column names, one by one?
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
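For reference, the line being quoted from the video presumably builds the DataFrame like this (a minimal sketch, assuming sklearn's breast-cancer dataset):

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# The dataset object behaves like a dict with 'data' and 'feature_names' keys
cancer = load_breast_cancer()

# The raw numeric array becomes the values; the stored names become the columns
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
print(df.shape)  # (569, 30)
```

For your own questionnaire data you would pass your own list of column names instead of cancer['feature_names'] (or simply load a CSV with pd.read_csv, which picks the names up from the header row).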

@praveenpandey4804 - 20.10.2023 18:10

Really sir, thanks for the knowledge. It helped me solve your assignment in the machine learning segment from PW Skills.

@pavankumarjammala9262 - 08.10.2023 22:47

Actually, I applied PCA on digit-recognition data and took n_components=2, but the visualization shows multiple colors after executing. Can anyone say what the solution for that would be?

@historyofislam7509 - 10.08.2023 07:32

Nice Video

@kasoziabudusalaamu50 - 02.08.2023 16:11

Very insightful. The concept is well understood now.

@saliherenyuceturk2398 - 22.05.2023 13:46

amazing
simple and straightforward

@jadhavashatai8845 - 18.12.2022 22:19

So nice

@tanumaterials9685 - 28.10.2022 19:00

Hello, can you please make a video on using PCA along with clustering, and then explain the PCA values obtained in the clusters?

@devinshah234 - 17.10.2022 10:31

Can you please upload the dataset?

@katienefoasoro1132 - 26.09.2022 01:54

Line 2: is it a library that helps you import the dataset?

@debatradas1597 - 23.09.2022 21:09

thanks

@bommubhavana8794 - 26.08.2022 22:52

Hello, I have newly started working on a PCR project. I am stuck at a point and could really use some help, ASAP.
Thanks a lot in advance.

I am working on python. So we have created PCA instance using PCA(0.85) and transformed the input data.

We have run a regression on principal components explaining 85 percent variance(Say N components). Now we have a regression equation in terms of N PCs. We have taken this equation and tried to express it in terms of original variables.

Now, In order to QC the coefficients in terms of original variables, we tried to take the N components(85% variance) and derived the new data back from this, and applied regression on this data hoping that this should give the same coefficients and intercept as in the above derived regression equation.

The issue here is that the coefficients are not matching when we take N components but when we take all the components the coefficients and intercept are matching exactly.

Also, the R-squared value and the predictions given by these two equations are exactly the same, even though the coefficients are not matching.

I am so confused right now as to why this is happening. I might be missing some concept of PCA at some point. Any help is greatly appreciated. Thank you!

@sherin7444 - 02.08.2022 10:18

from sklearn.decomposition import PCA
pca = PCA()
pc = pca.fit_transform(df)
plt.figure()
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Number of components')
plt.ylabel('EVR')
plt.show()

@p15rajan - 25.07.2022 08:26

Excellent.. Appreciate it. .. liked your video

@nithishh2384 - 28.06.2022 09:58

Literally, I have searched and seen many videos, but this one has the best explanation.

@borasimsek2613 - 21.06.2022 19:36

Hi, in this scenario we had 2 outputs; what happens when the number of outcomes increases? In my case, I have 4 outputs.

@tehreemqasim2204 - 17.06.2022 16:29

Your video is very helpful. God bless you brother

@nerdymath6 - 08.03.2022 14:03

Can we get to know which dimensions have been reduced and which 2 are left?
How will we infer from the graph after applying PCA?

@devashishrathod3462 - 08.12.2021 07:41

How can we find the variance between the 2 components that the data is reduced to?

@sivaramramkrishna5627 - 30.11.2021 17:33

I feel so good seeing this... thanks bro... you helped me out a little bit... make more videos of this type.

@phiriavulundiah9249 - 26.10.2021 20:13

A very insightful video

@surajshah5630 - 15.09.2021 05:06

Great effort. Thank you!

@chaitanyatuckley4666 - 17.08.2021 08:08

Thanks a lot Krish

@AmirAli-id9rq - 12.08.2021 13:27

A lot of people in the comments asked about the intuition of PCA, so here it is.
We plot samples using the given features. For example, imagine plotting different students (samples) on a 3D graph whose features are English Literature marks, English Language marks, and Math marks (x axis English Literature, y axis English Language, z axis Math). Intuitively, someone who is good at English Literature should also be good at English Language, so if I ask you to consider only two dimensions (features) for a classification model, you will take Math and either of the English subjects, because we know by experience that the variation between the two English subjects would be small. Thus in PCA we actually project the samples (the students in our example) onto a number of PCA axes and choose the PCs that explain the maximum variation in the data.
If we add up the variation explained by all PCs, it will be 1, or 100%.
Thus, instead of using all three subject marks, I would rather use PC1 and PC2 as my features.

For PCA, follow these steps:
1. Once you have the 3D plot ready, calculate PC1, which is the best-fitting line passing through the origin (of the centered data).
2. Calculate the slope of PC1.
3. Calculate the eigenvector for the best-fitting line.
4. Find PC2, i.e. a line perpendicular to PC1 that also passes through the origin.
5. Now rotate the graph so that PC1 is the x axis and PC2 is the y axis, and project your samples.

It's kind of tough to imagine, so do read more. Hope this helps!
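The steps above can be sketched numerically with plain NumPy (a toy example with made-up marks; in practice sklearn's PCA does this eigen-decomposition for you):

```python
import numpy as np

# Toy data: rows are students, columns are EngLit, EngLang, Math marks
X = np.array([[80., 78., 60.],
              [65., 70., 90.],
              [90., 88., 55.],
              [50., 55., 85.]])

# Step 1: center the data so the PC lines pass through the origin
Xc = X - X.mean(axis=0)

# Steps 2-3: eigenvectors of the covariance matrix are the PC directions
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort by explained variance, largest first (eigh returns ascending order)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Steps 4-5: project samples onto PC1 and PC2 (the "rotated" coordinates)
scores = Xc @ eigvecs[:, :2]

# Eigenvalue ratios give the fraction of variance each PC explains; they sum to 1
print(eigvals / eigvals.sum())
```

Because the two English columns are highly correlated, PC1 and PC2 capture nearly all of the variation, which is exactly the intuition above.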

@AmirAli-id9rq - 12.08.2021 11:58

Great video. At the end of the video you said "loss of data"; I guess it's not 100 percent correct to phrase it that way. The essence, or rather the information, of the data is not lost; rather it's condensed into two dimensions.

@rachitsingh4913 - 21.07.2021 18:45

How do we know how much data we lost on decreasing dimensionality, and how many components are best?

@RohitGupta-ox6tn - 19.07.2021 05:07

It is known that PCA causes loss of interpretability of the features. What is the alternative to PCA if we don't want to lose interpretability? @Krish Naik. Say we have 40K features and we want to reduce the dimension of the dataset without losing interpretability.

@parthsarthijoshi6301 - 14.07.2021 10:17

How do we choose the number of components in PCA?

@rahulgarg6363 - 10.07.2021 08:23

Hi Krish, how do eigenvalues and eigenvectors play a role in capturing the principal components?

@someshkumar1578 - 12.06.2021 13:43

Brother, if my features are independent, why apply PCA at all?

@techsavy5669 - 11.06.2021 04:19

At time 10:28, when we do plt.scatter(x_pca[:,0], ..., shouldn't the second parameter here be the target output column? Why are we plotting it against x_pca[:,1]?
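For context, plotting x_pca[:,0] against x_pca[:,1] is the usual convention: both axes are principal components, and the target is shown as the point color rather than as an axis. A minimal sketch, assuming the breast-cancer data used in the video:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

cancer = load_breast_cancer()

# Standardize, then reduce to two principal components
X = StandardScaler().fit_transform(cancer['data'])
x_pca = PCA(n_components=2).fit_transform(X)

# PC1 on x, PC2 on y; the target (malignant/benign) appears as the color
plt.scatter(x_pca[:, 0], x_pca[:, 1], c=cancer['target'], cmap='coolwarm')
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
plt.show()
```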

@yogitajain8003 - 28.05.2021 15:16

ValueError: Found array with 0 sample(s) (shape=(0, 372)) while a minimum of 1 is required by StandardScaler.
But there are no missing values.

@dhy9361 - 18.05.2021 04:48

thank you for solving my question!

@himansu1182 - 09.05.2021 21:16

I think there is a small mistake: MinMaxScaler was mentioned, but only StandardScaler is used here.

@LAChinthaka - 05.05.2021 20:05

Very clear and taught to the point. Thanks a lot.

@joehansie6014 - 27.04.2021 17:43

Great work... 4 thumbs up for you. Greetings from a master's student.

@bhaskersaiteja9531 - 13.04.2021 21:56

How did you come to know that 'data' and 'feature_names' need to be considered for creating a DataFrame from the file? Could you please explain?

@MartinHroch - 23.03.2021 14:32

Exactly half of the video was an intro to data loading and explanation... Where is the PCA?

@dheerajkumar9857 - 20.03.2021 12:40

Very neat explanation.

@Manoj-Kumar-R - 28.02.2021 13:05

Insightful video.. Can we have a PCA vs LDA comparison video?
Much appreciated work!

@gahmusdlatfi4205 - 22.02.2021 12:21

Hi Naik, do we apply the PCA only on the training dataset, or on the whole dataset (training + test)? Some literature advises applying PCA on training only, but in that case how do we predict the test set with the transformed data? Waiting for your reply; thank you in advance.

@ramyasrigorle2609 - 18.02.2021 17:05

Sir, how do we know which features (column names) are selected with PCA?

@gahmusdlatfi4205 - 09.02.2021 20:32

Thanks

@AnitaDevkar - 09.02.2021 04:56

Apply basic PCA on the iris dataset.
• Describe the dataset. Should the dataset be standardized?
• Describe the structure of correlations among variables.
• Compute a PCA with the maximum number of components.
• Compute the cumulative explained variance ratio. Determine the number of components k from your computed values.
• Print the k principal component directions and the correlations of the k principal components with the original variables. Interpret the contribution of the original variables to the PCs.
• Plot the samples projected onto the first k PCs.
• Color samples by their species.
@nayanparnami8554 - 03.02.2021 22:12

Sir, how do we figure out the number of principal components to which we want to reduce the original dimension?

@raghavendrapoloju6550 - 20.01.2021 13:09

Firstly, thanks for explaining the PCA technique very clearly. Suppose we do not know the features of high-dimensional data. Is there any way to find the features and target within the data? Is that possible by any chance? I am working with hyperspectral raw data.

@b_113_debashissaha9 - 17.01.2021 00:14

Excellent work for beginners.

@kamilc9286 - 09.01.2021 15:30

Shouldn't we validate how many PCs are needed?
