Missing value

Frequent Contributor
Posts: 84

I am handling a cross-sectional data set.

I found that a number of my variables of interest have missing values, including continuous, binary, and ordinal variables.

I am wondering whether I can replace the missing values with the variable's mean or median.

I would also like to know whether there is a cap on how much can be replaced, e.g., missing values must be less than 10% or so.

Any help would be much appreciated.

 

Phan S.

 


Accepted Solutions
Solution
‎02-03-2018 03:50 PM
Super User
Posts: 23,778

Re: Missing value

Here's an inefficient macro that will cap the outliers. 

https://gist.github.com/statgeek/31316a678433a1db8136

 

PROC STDIZE is a better option for replacing missing values with the median or mean. You can also look into PROC MI (multiple imputation) to impute missing data.
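
For anyone who wants to try the PROC MI route, here is a minimal sketch. The data set HAVE, the variables a, b and c, the seed and the number of imputations are all made-up values for illustration, not something from the original question. It also assumes the variables are continuous; binary or ordinal variables would likely need a CLASS statement and a different imputation method.

```sas
/* Hypothetical toy data: a, b and c are numeric with some missing values */
data have;
   input a b c;
   datalines;
1 10 100
2 . 110
. 12 95
4 14 .
5 11 105
6 . 120
7 15 98
. 13 101
;
run;

/* Create 5 completed copies of the data.                        */
/* The variable _Imputation_ in MI_OUT identifies each copy.     */
/* NIMPUTE= and SEED= values here are arbitrary (assumptions).   */
proc mi data=have nimpute=5 seed=20180203 out=mi_out;
   var a b c;
run;
```

The usual next step is to run the analysis BY _Imputation_ and combine the results with PROC MIANALYZE, rather than analysing a single completed data set.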

 



All Replies
Trusted Advisor
Posts: 1,270

Re: Missing value

Hi,

 

Missing values can be replaced with various statistics using PROC STDIZE. Below is an example that replaces missing values with the median.

Where to set a cap depends on your analysis. You can flag the variables whose percentage of missing values falls within your chosen limit and impute only those.

/* REPONLY replaces missing values only; the data are not standardized */
proc stdize data=have reponly method=median out=imputed;
   var a b c; /* assuming a, b and c are numeric variables */
run;
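
To decide which variables fall under whatever cap you choose, it may help to look at the share of missing values per variable first. This is only a sketch: the toy data set and the column names pct_miss_a, pct_miss_b and pct_miss_c are hypothetical, and the 10% figure from the question is a rule of thumb, not something SAS enforces.

```sas
/* Hypothetical toy data standing in for HAVE */
data have;
   input a b c;
   datalines;
1 10 100
2 . 110
. 12 95
4 14 .
5 11 105
;
run;

/* Number of rows and the share of missing values for each variable */
proc sql;
   select count(*)            as n_obs,
          nmiss(a) / count(*) as pct_miss_a format=percent8.1,
          nmiss(b) / count(*) as pct_miss_b format=percent8.1,
          nmiss(c) / count(*) as pct_miss_c format=percent8.1
   from have;
quit;
```

PROC MEANS with the N and NMISS statistics gives the same counts if you prefer not to use PROC SQL. Variables above your cutoff can then simply be left out of the VAR statement in PROC STDIZE.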

Frequent Contributor
Posts: 84

Re: Missing value

Hello,

 

I am sorry, I meant to give the solution (credit) to you, but I accidentally checked Reeza's post.

 

Thank you for your code.

 

Phan S. 

 

Super User
Posts: 23,778

Re: Missing value

@PhanS You can change that to @stat_sas; just select their post instead.

Trusted Advisor
Posts: 1,270

Re: Missing value

Hi,

 

I am glad you have the solution. I am also learning from @Reeza's posts.

 
