I am working with a cross-sectional data set.
I found that a number of my variables of interest have missing values, including continuous, binary, and ordinal variables.
I am wondering if I can replace the missing values with the variable's mean or median.
I also want to know whether there is a cap on how much missing data can be replaced this way, e.g., missing values must be less than 10% or so.
Any help would be much appreciated.
Phan S.
Here's an inefficient macro that will cap the outliers.
https://gist.github.com/statgeek/31316a678433a1db8136
PROC STDIZE is a better option for replacing with median/mean. You can also look into PROC MI, multiple imputation to impute missing data.
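As a rough illustration of the PROC MI route, here is a minimal sketch. The data set name have and the variable names x_cont (continuous), x_bin (binary), and x_ord (ordinal) are assumptions for illustration, not taken from the original post. The number of imputations and the seed are arbitrary choices.

```sas
/* Sketch only: have, x_cont, x_bin, and x_ord are hypothetical names */
proc mi data=have nimpute=5 seed=12345 out=mi_out;
  class x_bin x_ord;                    /* treat these as categorical */
  fcs logistic(x_bin x_ord) reg(x_cont); /* logistic for categorical, regression for continuous */
  var x_cont x_bin x_ord;
run;
```

The output data set mi_out stacks the completed data sets and identifies each one with the _Imputation_ variable. You would normally run your analysis BY _Imputation_ and combine the results with PROC MIANALYZE.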
Hi,
Missing values can be replaced with various statistics using PROC STDIZE. Below is an example that replaces missing values with the median.
Whether to set a cap depends on your analysis. You can flag variables containing a certain percentage of missing values for imputation.
proc stdize data=have reponly method=median out=imputed;
var a b c; /* Assuming a, b and c are 3 numeric variables */
run;
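To see how much is missing in each variable before deciding on a cap, something like the following could be used. It is only a sketch and assumes the same data set have and numeric variables a, b and c as in the example above.

```sas
/* N = count of non-missing values, NMISS = count of missing values */
proc means data=have n nmiss;
  var a b c;
run;

/* For binary or ordinal variables, show missing as its own category */
proc freq data=have;
  tables b c / missing;
run;
```

The share of missing values for a variable is NMISS / (N + NMISS). With the MISSING option, PROC FREQ includes the missing category in the percentages it reports.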
Hello,
I am sorry, I meant to give the solution (credit) to you, but I accidentally marked Reeza's reply instead.
Thank you for your code.
Phan S.