edasdfasdfasdfa
Quartz | Level 8

I read the following in an article on here:

"For categorical variables, the most common methodology is “count”, wherein you fill the missing values with the most common level of the categorical variable."

How is this performed? I can't find any information on it.

ballardw
Super User

One very crude method: PROC FREQ plus a DATA step. Find the most frequent value using PROC FREQ, then do something like:

data want;
   set have;
   /* replace 'mostcommonvalue' with the level PROC FREQ reported as most frequent */
   if missing(var) then var='mostcommonvalue';
run;

The approach is similar for replacing with a mean value: use PROC MEANS or PROC SUMMARY to get the mean, then replace the missing values with it.
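The PROC FREQ plus DATA step approach can also be done without typing the value by hand. This is only a rough sketch, assuming the data set is called have and the character variable is called var, as in the step above; var_counts and mode_value are made-up names for illustration. If two levels tie for most frequent, it simply takes whichever PROC FREQ lists first.

```sas
/* Count the non-missing levels of var; ORDER=FREQ lists the most frequent level first. */
proc freq data=have(where=(not missing(var))) order=freq noprint;
   tables var / out=var_counts;
run;

data want;
   /* Read the top level once; variables read with SET are retained across rows. */
   if _n_ = 1 then set var_counts(obs=1 keep=var rename=(var=mode_value));
   set have;
   /* Fill missing values with the most frequent level. */
   if missing(var) then var = mode_value;
   drop mode_value;
run;
```

Reading the value with a one-time SET avoids a macro variable, so levels containing quotes or ampersands do not need special handling.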

edasdfasdfasdfa
Quartz | Level 8

For numeric variables you can use PROC STDIZE, but I have never seen documentation for character variables.

For example:

proc stdize data=train out=traini method=median reponly; /* REPONLY: only replace missing values, do not standardize */
   var var1;
run;

Reeza
Super User

You first need to understand how and why the values are missing before you can say what an appropriate method is. Using the largest group isn't a great method. An alternative is to model the data and predict the category, using logistic regression or discriminant analysis. PROC MI supports both, and the documentation has an example of each (Examples 79.4 and 79.5).

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_mi_examples04.htm&docsetVersion=1...
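As a rough idea of what the logistic regression route looks like in PROC MI, here is a minimal sketch. It is not taken from the documentation examples: the data set have, the categorical variable var, and the predictors x1 and x2 are all assumed names, and the seed and number of imputations are arbitrary.

```sas
/* Impute the categorical variable var from x1 and x2 with a logistic model. */
/* have, var, x1 and x2 are hypothetical names used only for illustration.   */
proc mi data=have nimpute=5 seed=12345 out=mi_out;
   class var;                      /* var is treated as a classification variable */
   fcs logistic(var = x1 x2);      /* fully conditional specification, logistic method */
   var x1 x2 var;
run;
```

The output data set mi_out stacks the imputed copies of the data, identified by the variable _Imputation_. The discriminant function method is requested in the same way with discrim in place of logistic; check the documentation for its requirements.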

 

