Handling NA in R | is.na, na.omit & na.rm Functions for Missing Values

Statistics Globe

5 years ago

63,291 views


Comments:

@taruvingatakudzwa151 - 02.02.2024 10:14

How do I merge two datasets A and B when B is a small dataset whose values have to replace certain cells in A?


@shambo9807 - 01.02.2024 04:12

Very clear and succinct. All the info I needed clearly explained. 👍🏾


@lahirukudaligamage13 - 19.01.2024 05:19

YESSSSS THANK YOUUUUU


@caynaan901 - 06.11.2023 23:39

Thank you for the good lesson; explained very clearly.


@ezhankhan1035 - 06.11.2023 21:34

Directly answered what I was looking for - Thank you!
I have used 'drop_na()' as opposed to 'na.omit()' for the most part, but it's always good to know alternative ways of doing things.


@francesco8150 - 01.11.2023 18:52

Hi, I'm trying to compute cov with two groups of values, but one has NAs and R doesn't let me remove them when I compute the cov. If I rewrite the two groups without the NAs, they differ in length, so cov can't be computed. What can I do? ;(
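
One possible approach to the question above, sketched in base R with made-up vectors `x` and `y` (hypothetical data): `cov()` has no `na.rm` argument, but its `use` argument can restrict the calculation to pairs where both values are present, so the two groups keep the same length.

```r
# Hypothetical example vectors; y contains missing values
x <- c(1, 2, 3, 4, 5)
y <- c(2, NA, 6, 8, NA)

# 'use = "complete.obs"' drops every pair in which x or y is NA
cov_xy <- cov(x, y, use = "complete.obs")

# Equivalent manual version: remove the same positions from both vectors
keep <- complete.cases(x, y)
cov_manual <- cov(x[keep], y[keep])
```

Removing NAs from each vector separately is what causes the length mismatch; dropping the same positions from both avoids it.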


@organ1181 - 28.09.2023 18:11

How to deal with missing data for a categorical variable, please?


@hoax9784 - 02.04.2023 15:53

And what do I do if it only shows other characters but not "NA", sir?
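
A minimal sketch for the question above, assuming the missing values are coded as "-" or as empty strings (both codes and the data frame `df` are hypothetical): such placeholders can be converted to real NA values first, after which `is.na()` and `na.omit()` behave as usual. When importing a file, the `na.strings` argument of `read.csv()` can do the same conversion at load time.

```r
# Hypothetical data where missing values are coded as "-" and ""
df <- data.frame(id = 1:4,
                 colour = c("red", "-", "", "blue"),
                 stringsAsFactors = FALSE)

# Recode the placeholder strings as proper NA values
df$colour[df$colour %in% c("-", "")] <- NA

# Count the missing values after recoding
n_missing <- sum(is.na(df$colour))
```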


@whitfieldlewis837 - 26.03.2023 16:56

good stuff


@atthoriqpp - 03.03.2023 14:29

Hello, I have a question!

Should you always remove missing values in a dataset (especially for public data)? Or do we need to consider the proportion of missing data, the missing value type (MCAR, MAR, NMAR), and the skewness of the data?

I'm really struggling with this particular issue (not the technique, but the judgement on whether to remove missing values or not). Please shed some light, and thanks!


@tirthanandi6122 - 25.01.2023 19:52

na.omit removes the whole row. What if I don't want to remove the whole row? Is there any way I can plot geom_line without omitting NAs? The plot needs to ignore the points where there is an NA.


@mdiqbal7168 - 02.12.2022 16:20

Tabulated value and calculated value in t-test normal distribution by plot in R programming


@mdiqbal7168 - 02.12.2022 16:18

R programming for t-test two tail tabulated value in plot


@roshnyabraham7941 - 12.11.2022 01:26

Thank you so much! You have been such a good help.


@atifdai313 - 09.10.2022 17:03

Excellent work


@larissacury7714 - 28.08.2022 01:26

What if I had two entries for each SUBJECT and I wanted to filter out both of their entries if one of their entries in another column is NA? PS: great video as always!
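
A possible base R sketch for the question above, using a made-up data frame with hypothetical column names `subject` and `score`: first collect the subjects that have at least one NA, then drop every row belonging to those subjects.

```r
# Hypothetical data: two entries per subject, one score is missing
df <- data.frame(subject = c("a", "a", "b", "b", "c", "c"),
                 score   = c(1, NA, 3, 4, 5, 6))

# Subjects with at least one missing score
bad_subjects <- unique(df$subject[is.na(df$score)])

# Remove all rows of those subjects, not only the row containing the NA
df_clean <- df[!df$subject %in% bad_subjects, ]
```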


@16kush - 25.08.2022 16:32

How do I put "Undefined" in place of NA?
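
One way this might be done for a character vector (the vector `x` is made up for illustration): index the missing positions with `is.na()` and assign the replacement label. Note that doing this on a numeric vector would coerce the whole vector to character.

```r
# Hypothetical character vector with one missing value
x <- c("a", NA, "c")

# Replace every NA with the label "Undefined"
x[is.na(x)] <- "Undefined"
```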


@claytontherrien7583 - 31.03.2022 22:33

Thank you very much!


@lsjenny2198 - 12.03.2022 06:24

I am trying to use ggscatter but I have many NAs in y column and no correlation coefficient appears. Is there any way of ignoring these NAs or changing them to "0"? please help me, thank you.


@jayw6886 - 17.02.2022 20:31

Hello, great videos, thanks! Question: if I wanted to get the NA values in a separate subset instead of omitting or removing them, what can I do?
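
A minimal sketch for the question above, assuming a small made-up data frame `df`: `complete.cases()` flags the rows without any NA, so negating it selects exactly the rows that `na.omit()` would have dropped.

```r
# Hypothetical data frame with missing values in both columns
df <- data.frame(x = c(1, NA, 3, 4),
                 y = c("a", "b", NA, "d"))

# TRUE for rows that contain at least one NA
has_na <- !complete.cases(df)

df_na       <- df[has_na, ]   # rows with missing values, kept separately
df_complete <- df[!has_na, ]  # rows without missing values
```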


@careenevans - 12.02.2022 16:20

I have been following your tutorials for a couple of days now. I want to say thank you, they are truly direct and straight to the point. I wish that you would offer consultation to students even if you decide to charge a price on it. Because sometimes one might get stuck and not know what to do.


@Rhena - 21.11.2021 18:40

Could you record this again in German? :D


@mariasaraiva9675 - 16.11.2021 03:38

The problem is that depending on the package na.rm does not work. It seems that each package has its own way to consider NAs. This is stressful when you are used to SAS.
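
To illustrate the point above with base R and the stats package only (the vectors are made up): different functions do expose NA handling through different arguments, so `na.rm` is not universal.

```r
# Hypothetical vectors with missing values
x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, NA)

m   <- mean(x, na.rm = TRUE)             # summary functions: na.rm
r   <- cor(x, y, use = "complete.obs")   # cor/cov: use
tab <- table(x, useNA = "ifany")         # table: useNA
fit <- lm(y ~ x, na.action = na.omit)    # model functions: na.action
```

Checking the help page of each function (for example `?cor`) shows which of these arguments it accepts.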


@fostkangben - 15.10.2021 02:28

Thanks for this video


@meenakshigautam4249 - 19.09.2021 13:57

Please help me with this... my result says: argument "y" is missing, with no default

library(MASS)
library(maxLik)
library(matrixcalc)
mu1=2 ;mu2=2;sig1=1;sig2=1;sai=mu1-mu2;mu=c(mu1,mu2)
sigma=matrix(c(sig1,0,0,sig2),2,2,byrow=TRUE)
n=10;nr=mvrnorm(n,mu,sigma)
t1=sum(nr[1]);t2=sum(nr[2]);t3=sum((nr[1])^2);t4=sum((nr[2])^2)
t5=sum(nr[1]*nr[2]) ;c0=-t1-t2-n*log(2*3.14)
negl= function(x,y,z,w){
term1= (t3-2*(y+x)*t1+n*(x+y)^2)/(2*z)
term2=(t4-2*y*t2+n*y^2)/(2*w)
negl1= -c0 + (n/2)*log(z) + (n/2)*log(w) + term1 + term2
return(negl1)
}
v1=c(0,2,3,1)
maxBFGS(negl, grad=NULL, hess=NULL, start=v1, fixed=NULL,control=NULL,constraints=NULL,finalHessian=TRUE,parscale=rep(1, length=length(start)))


@tmitra001 - 14.09.2021 06:38

I like all your videos


@durduozkarc6345 - 10.09.2021 18:46

# LOAD LIBRARIES
library(quantmod)
library(xts)

# FUNCTIONS

# ROLLING BETA
pcbeta = function(dF){
r = prcomp( ~ dF$x[-1] + dF$y[-1])
return(r$rotation[2, 1] / r$rotation[1,1])
}

rolling_beta = function(z, width){
rollapply(z, width = width, FUN = pcbeta,
by.column = FALSE, align = 'right')
}

# GET TICKER DATA
SPY = getSymbols('SPY', adjust=T, auto.assign=FALSE)
AAPL = getSymbols('AAPL', adjust=T, auto.assign=FALSE)

# IN-SAMPLE DATE RANGE
in_start_date = '2011-01-01'
in_end_date = '2011-12-31'
in_range = paste(in_start_date, '::', in_end_date, sep='')

# RETRIEVE IN-SAMPLE DATA
x_in = SPY[in_range, 6]
y_in = AAPL[in_range, 6]

dF_in = cbind(x_in, y_in)
names(dF_in) = c('x','y')

# OUT-OF-SAMPLE DATE RANGE
out_start_date= '2012-01-01'
out_end_date = '2012-12-31'
out_range = paste(out_start_date, '::', out_end_date, sep='')

# RETRIEVE OUT-OF-SAMPLE DATA
x_out = SPY[out_range, 6]
y_out = AAPL[out_range, 6]

dF_out = cbind(x_out, y_out)
names(dF_out) = c('x', 'y')

# CALCULATE RETURNS (IN AND OUT OF SAMPLE)
returns_in = diff(dF_in) / dF_in
returns_out = diff(dF_out) / dF_out

# DEFINE ROLLING WINDOW LENGTH
window_length = 10

# FIND BETAS
betas_in = rolling_beta(returns_in, window_length)
betas_out = rolling_beta(returns_out, window_length)

# FIND SPREADS
spreadR_in = returns_in$y - betas_in * returns_in$x
spreadR_out = returns_out$y - betas_out * returns_out$x

names(spreadR_in) = c('spread')
names(spreadR_out) = c('spread')

# FIND THRESHOLD
threshold = sd(spreadR_in, na.rm=TRUE)
plot(data$spread, main = "AAPL vs. SPY In-Sample", cex.main = 0.8, cex.lab = 0.8, cex.axis = 0.8)
abline(h = threshold, lty = 2)
abline(h = -threshold, lty = 2)

Why does the abline function not work?


@shanti3310 - 27.07.2021 00:41

Hello,

How do I handle NaN in R?
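
A short base R sketch for the question above (the vector is made up): `is.nan()` identifies only NaN, while `is.na()` is TRUE for both NaN and NA, so `na.rm = TRUE` removes both.

```r
# Hypothetical vector containing both NaN and NA
x <- c(1, NaN, NA, 4)

only_nan <- is.nan(x)   # TRUE only at the NaN position
any_miss <- is.na(x)    # TRUE at the NaN and the NA position

# na.rm = TRUE drops NaN as well as NA
m <- mean(x, na.rm = TRUE)

# Optionally convert NaN to NA so that both are treated identically
x[is.nan(x)] <- NA
```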


@azad2546421 - 01.06.2021 11:32

Sir, on your Statistics Globe website, where do we start? As a beginner in R, I'd like to know where to start. Thanks


@aloysduistermaat7046 - 28.05.2021 12:36

How does this work the other way round? For example, I want all values in my dataframe to become NA if they are below 0.4. Thank you!
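
One way this might look in base R, assuming an all-numeric data frame (the data below are made up): comparing the data frame with the threshold yields a logical matrix, which can be used to assign NA to the matching cells.

```r
# Hypothetical all-numeric data frame
df <- data.frame(a = c(0.1, 0.5, 0.9),
                 b = c(0.7, 0.2, 0.4))

# Set every value strictly below 0.4 to NA
df[df < 0.4] <- NA
```

Values exactly equal to 0.4 are kept here because the comparison is strict.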


@eapen4irm - 18.05.2021 21:23

Your videos are amazing and easy to understand! Thank you!!!


@arunbioinfo1100 - 17.05.2021 15:41

excellent joachim, perfectly explained


@anthonyfernandezgonzalez8262 - 11.05.2021 20:44

Love it, thank you one more time, dude! Love the way you prepare your lessons, 'cause they are really short, focus on a specific context, and finally you give us multiple solutions for a scenario, so that's the way it must be.


@DavidKaranjamdavis - 16.03.2021 12:03

Informative and well explained


@Michelle-mv1gg - 04.03.2021 00:20

How do you handle or replace NA values in a dataset where dates and other numeric information are missing?


@Paula-uj8ps - 10.02.2021 19:50

@Statistics Globe Thank you very much for the great video. It really helped :) Unfortunately I still have a problem, and I really hope you can answer my question. Where do I put na.rm = TRUE in more complex code?
I always get an error message, and I suspect (based on searching the internet) that it has something to do with the NAs: Error in KhatriRao(sm, t(mm)) : (p <- ncol(X)) == ncol(Y) is not TRUE.
But when I use na.rm = TRUE, R claims it is an unused argument. I suspect I put it in the wrong place.

This is my code:
fit2 <- lme4::lmer(stroop$rt ~ 1 + stroop$trialnum + (1 + stroop$trialnum|stroop$pno), data = stroop)

Best regards, Paula

(The dataset has over 26,000 rows and over 2,600 NAs)


@Jonpaulim - 24.11.2020 21:44

Hi can I ask a question please


@Jay19876 - 20.11.2020 21:17

Can you just remove NAs from a specific column within a data set? For example, if I have a column such as "wind chill" which has a lot of blanks when it's not cold outside, I don't want to erase all of that data from the data set if I am looking at another column/vector of interest. Thanks!
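
A possible sketch for the question above, with a made-up data frame and hypothetical column names `temp` and `wind_chill`: filtering on `is.na()` of a single column creates a reduced copy for that analysis, while the original data frame stays untouched for work on other columns.

```r
# Hypothetical weather data; wind_chill is blank (NA) on warm days
df <- data.frame(temp       = c(10, -5, 20, -8),
                 wind_chill = c(NA, -12, NA, -15))

# Keep only rows where this one column is present; df itself is unchanged
df_wc <- df[!is.na(df$wind_chill), ]

# For a single summary, na.rm avoids subsetting altogether
mean_wc <- mean(df$wind_chill, na.rm = TRUE)
```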


@lh4818 - 10.11.2020 03:43

How can you make a new data frame that excludes all the NA values?


@jenevavergara4125 - 05.10.2020 14:28

What if I only want to remove rows in which all values are NA?
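
A minimal base R sketch for the question above (the data frame is made up): counting the NAs per row with `rowSums(is.na(...))` and comparing the count with the number of columns identifies the rows that are entirely missing.

```r
# Hypothetical data: the second row is NA in every column
df <- data.frame(x = c(1, NA, NA),
                 y = c("a", NA, "c"))

# TRUE only for rows where every value is NA
all_na <- rowSums(is.na(df)) == ncol(df)

# Rows with some, but not all, values missing are kept
df_kept <- df[!all_na, ]
```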


@sun27g - 23.09.2020 19:10

When you ran na.omit(airquality) before mean(airquality$ozone), the rows with NAs were already deleted, giving you a complete numeric dataset, so why is mean(airquality$ozone) returning NA again?
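
A likely explanation, sketched with the built-in `airquality` data (the name `aq_complete` is made up): `na.omit()` returns a cleaned copy and does not modify `airquality` itself, so the result has to be stored and then used. Note also that R is case sensitive and the column is called `Ozone`.

```r
# The original data set still contains NAs after calling na.omit() on it
na.omit(airquality)          # result is printed but not stored
m_before <- mean(airquality$Ozone)    # NA

# Store the cleaned copy and compute the mean on it
aq_complete <- na.omit(airquality)
m_after <- mean(aq_complete$Ozone)

# Alternative that keeps all rows and only ignores NAs in this column
m_narm <- mean(airquality$Ozone, na.rm = TRUE)
```

The last two means can differ, because `na.omit()` also drops rows where another column is NA.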


@eyadha1 - 16.09.2020 11:39

Thanks. Very helpful


@mugangakivumbi - 16.09.2020 10:44

Thank you, the tutorial was very helpful


@frankjr3787 - 09.09.2020 08:06

Thank you very much for this video (just subscribed). How do you remove "NA" from a data set that has no numeric values? Say I just had two columns (Name and Hair Color) and some of the hair colors were NA. How would I omit those?


@lavinaarora3697 - 23.08.2020 17:10

After omitting the NAs, the number of rows still shows the numbers of the original data set, though I can see that the number of rows in the data after omitting them is 111. Which code can I use to get this 111, as nrow() gives me the original number?
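
A sketch of what may be going on, using the built-in `airquality` data (the object name `aq_complete` is made up): `nrow()` has to be applied to the stored result of `na.omit()`, and the row labels of that result still carry the original row numbers unless they are reset.

```r
aq_complete <- na.omit(airquality)

n_original <- nrow(airquality)   # rows in the untouched data set
n_complete <- nrow(aq_complete)  # rows left after removing NAs

# Row names keep the original numbering; reset them to 1, 2, 3, ...
rownames(aq_complete) <- NULL
```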


@borknagarpopinga4089 - 19.08.2020 20:37

How can I delete a certain row only if the amount of NA's surpasses a certain threshold? E.g. when I have like 100 slope coefficients, but only one value is missing, it sounds a bit harsh to delete the whole row. How can I tell R to only delete the row, if there's let's say more than 10 NA's?
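
One possible approach to the question above in base R (the data frame and the threshold `max_na` are made up for illustration): count the NAs in each row and keep only the rows at or below the threshold.

```r
# Hypothetical data: rows 1 and 3 have one NA, row 2 has three
df <- data.frame(a = c(1, NA, NA),
                 b = c(NA, NA, 3),
                 c = c(1, NA, 3))

# Maximum number of NAs tolerated per row (use 10 for the case described)
max_na <- 1

# Number of missing values in each row
na_per_row <- rowSums(is.na(df))

# Keep rows that do not exceed the threshold
df_kept <- df[na_per_row <= max_na, ]
```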
