I am using SAS Enterprise Miner to clean a dataset. In the raw data, some observations have too many missing values, so I want to delete those observations and replace the remaining missing values with the mean or the most frequent value. My questions are:
1. Are there any criteria for deleting observations? (That is, how many missing values should an observation have before it is deleted: 3, 4, 5?)
2. How do I do this in SAS Enterprise Miner, or is other software required?
Thanks a lot!
@relish wrote:
1. Are there any criteria for deleting observations? (That is, how many missing values should an observation have before it is deleted: 3, 4, 5?)
That is your decision. Or are you asking HOW to do that?
In a data step you could do something like:
data want;
    set have;
    if cmiss(var1, var2, var3, <list all the variables you want to test>) le <your specified number>;
run;
CMISS counts the number of variables in the list that have missing values. So if you want to restrict the maximum number of missing values to 5, use: cmiss(<variables>) le 5.
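If typing out every variable is impractical, a variable list can stand in for the explicit names. The following is only a sketch: the dataset names `have` and `want` and the threshold of 5 are placeholders, and it assumes you want to test every variable in the input dataset.

```sas
/* Sketch: keep observations with at most 5 missing values.
   `have`, `want`, and the threshold 5 are placeholder assumptions. */
data want;
    set have;
    /* CMISS accepts both character and numeric arguments.
       `of _all_` expands to the variables known at this point,
       which here are only those read from `have`. */
    if cmiss(of _all_) le 5;   /* subsetting IF: other rows are dropped */
run;
```

The count is used directly in the subsetting IF rather than stored in a new variable first, because a newly created variable would itself become part of `_all_` and be counted as missing.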
You might also want to consider multiple groups of variables. Suppose you have 3 date variables that must all be present; then use: if cmiss(date1, date2, date3) = 0;
Then test separately for other variables (groups of variables) with a separate threshold for each.
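A rough sketch of per-group thresholds might look like the following. The variables `score1` through `score5` and the limit of 2 are hypothetical, chosen only to illustrate a second group with a looser rule than the dates.

```sas
/* Sketch: different missing-value thresholds for different groups.
   score1-score5 and the limit of 2 are hypothetical. */
data want;
    set have;
    /* Required group: all three dates must be present. */
    if cmiss(date1, date2, date3) = 0;
    /* Second group: tolerate at most 2 missing values among the scores. */
    if cmiss(of score1-score5) le 2;
run;
```

Each subsetting IF drops the observation as soon as its test fails, so an observation is kept only if it passes every group's test.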
AFTER you have the reduced set, you would calculate the needed means and merge them back into the data.
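One possible way to do that calculate-and-merge step is sketched below. It assumes the reduced dataset is named `want` and uses two hypothetical numeric variables, `income` and `age`; it covers mean replacement only, not the most-frequent-value case for categorical variables.

```sas
/* Sketch: mean imputation on the reduced dataset.
   `want`, `income`, and `age` are hypothetical names. */

/* Step 1: compute the means into a one-row dataset. */
proc means data=want noprint;
    var income age;
    output out=means(drop=_type_ _freq_) mean=mean_income mean_age;
run;

/* Step 2: attach the means to every row and fill in the gaps. */
data imputed;
    /* Read the single row of means once; variables read by SET
       are retained, so the values stay available on every row. */
    if _n_ = 1 then set means;
    set want;
    /* COALESCE returns the first non-missing numeric argument. */
    income = coalesce(income, mean_income);
    age    = coalesce(age, mean_age);
    drop mean_income mean_age;
run;
```

A shorter alternative, if it is available in your installation, is PROC STDIZE with the REPONLY and METHOD=MEAN options, which replaces only the missing values and leaves the other values unstandardized.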
The basic SAS data step and procedures should suffice.