Missing value

02-03-2018 02:38 PM

I am working with a cross-sectional data set.

I found that a number of my variables of interest have missing values, including continuous, binary, and ordinal variables.

I am wondering whether I can replace the missing values with the variable's mean or median.

I would also like to know whether there is a limit on how much missing data can be replaced this way, e.g., missing values must make up less than 10% of the observations.

Any help would be much appreciated.

Phan S.

Accepted Solutions

Solution

02-03-2018 03:50 PM


Posted in reply to PhanS

02-03-2018 03:32 PM

Here's an inefficient macro that will cap the outliers.

https://gist.github.com/statgeek/31316a678433a1db8136

PROC STDIZE is a better option for replacing missing values with the median or mean. You can also look into PROC MI (multiple imputation) to impute missing data.
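As a rough illustration of the PROC MI route, here is a minimal sketch. The data set name HAVE, the variables a, b and c, the number of imputations and the seed are all assumptions made for the example, not details from the original question. It assumes a and c are continuous and b is a two-level variable, and it requires SAS/STAT.

```sas
/* Sketch only: HAVE, a, b, c, NIMPUTE= and SEED= are illustrative choices. */
proc mi data=have nimpute=5 seed=20180203 out=mi_out;
   class b;            /* treat b as a classification variable          */
   fcs logistic(b);    /* impute b by logistic regression; variables    */
                       /* without a listed method use the FCS defaults  */
   var a b c;
run;

/* MI_OUT stacks the completed data sets; _Imputation_ numbers each one */
proc freq data=mi_out;
   tables _Imputation_;
run;
```

Each completed data set would then be analyzed separately (for example with a BY _Imputation_ statement) and the results combined with PROC MIANALYZE, rather than treating MI_OUT as a single data set.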


All Replies


Posted in reply to PhanS

02-03-2018 02:51 PM

Hi,

Missing values can be replaced with various statistics using PROC STDIZE. Below is an example that replaces missing values with the median.

Where to set a cap depends on your analysis. You can flag variables that contain more than a certain percentage of missing values for imputation.

proc stdize data=have reponly method=median out=imputed;
   var a b c; /* Assuming a, b and c are 3 numeric variables */
run;
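To decide which variables to flag, it can help to compute the percentage of missing values per variable first. The following is a minimal sketch under the same assumptions as the example above (a data set named HAVE with numeric variables a, b and c); the output names MISS_COUNTS, MISS_PCT and the *_nmiss / *_pct variables are made up for illustration.

```sas
/* Count missing values for each numeric variable (one output row) */
proc means data=have noprint;
   var a b c;
   output out=miss_counts nmiss=a_nmiss b_nmiss c_nmiss;
run;

/* Convert the counts to percentages; _FREQ_ is the number of rows in HAVE */
data miss_pct;
   set miss_counts;
   array cnt{*} a_nmiss b_nmiss c_nmiss;
   array pct{3} a_pct b_pct c_pct;
   do i = 1 to dim(cnt);
      pct{i} = 100 * cnt{i} / _freq_;
   end;
   drop i;
run;

proc print data=miss_pct noobs;
   var a_pct b_pct c_pct;
run;
```

The percentages can then be compared with whatever threshold suits the analysis; the 10% figure mentioned in the question is only one possible choice, not a fixed rule.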


Posted in reply to stat_sas

02-03-2018 03:59 PM

Hello,

I am sorry, I meant to give the solution (credit) to you, but accidentally gave it to Reeza.

Thank you for your code.

Phan S.


Posted in reply to PhanS

02-03-2018 04:31 PM


Posted in reply to PhanS

02-03-2018 10:25 PM
