relish
Calcite | Level 5

I am using SAS Enterprise Miner to clean a dataset. Some observations in the raw data have too many missing values, so I want to delete those observations and replace the remaining missing values with the mean or the most frequent value. My questions are:

1. Is there a criterion for deleting observations? That is, how many missing values (3, 4, 5?) should an observation have before it is deleted?

2. How do I do this in SAS Enterprise Miner, or is other software required?

 

Thanks a lot!

ballardw
Super User

@relish wrote:

 

1. Is there a criterion for deleting observations? That is, how many missing values (3, 4, 5?) should an observation have before it is deleted?

 


That is your decision. Or are you asking HOW to do that?

In a data step you could do something like:

 

Data want;
   set have;
   /* subsetting IF: keep only observations with at most the chosen number of missing values */
   if cmiss(var1, var2, var3, <list all the variables you want to test>) le <your specified number>;
run;

 

CMISS counts the number of variables in the list that have missing values. So if you want to restrict the maximum number of missing values to 5, use: cmiss(<variables>) le 5.

You might also want to consider multiple groups of variables. Suppose you have 3 date variables that must all be present; then use: if cmiss(date1, date2, date3) = 0;

Then test separately for other variables (groups of variables) with a separate threshold for each.
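
A minimal sketch of testing two groups with different thresholds. The variable names date1-date3 and x1-x10 and the threshold of 5 are assumptions for illustration only; substitute your own variables and limits.

```sas
data want;
   set have;
   /* group 1 (hypothetical names): all three dates must be present */
   if cmiss(of date1-date3) = 0;
   /* group 2 (hypothetical names): allow at most 5 missing values among x1-x10 */
   if cmiss(of x1-x10) le 5;
run;
```

Each subsetting IF drops the observations that fail its test, so only observations that pass both tests reach WANT.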

 

AFTER you have the reduced set, you would calculate the needed means and merge them back into the data.
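
One possible way to do that step, as a sketch only. It assumes two hypothetical numeric variables x1 and x2 in the reduced data set WANT; the data set names var_means and imputed and the mean_x1/mean_x2 variables are also made up for the example. It covers mean replacement for numeric variables, not the most-frequent-value case.

```sas
/* compute the means of the variables to impute, one output row */
proc means data=want noprint;
   var x1 x2;
   output out=var_means(drop=_type_ _freq_) mean=mean_x1 mean_x2;
run;

data imputed;
   /* read the single row of means once; its values are retained for every observation */
   if _n_ = 1 then set var_means;
   set want;
   /* replace only the missing values with the corresponding mean */
   if missing(x1) then x1 = mean_x1;
   if missing(x2) then x2 = mean_x2;
   drop mean_x1 mean_x2;
run;
```

The means are computed on the reduced set, so the observations you deleted do not influence the replacement values.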

The basic SAS data step and procedures should suffice.

relish
Calcite | Level 5
Thanks so much for your reply. So you mean the number is my own decision and no specific number is required? I am just wondering whether there is a specific percentage of missing values in one observation above which the observation should be deleted, or is that still our own decision? Since everyone may choose differently, will this choice affect how the data performs?
