
Count Methodology for imputation?

edasdfasdfasdfa — Tue, 30 Apr 2019 22:14:26 GMT

I read the following (below) in some article on here:

For categorical variables, the most common methodology is “count” wherein you fill the missing values with the most common level of the categorical variable.

How is this performed? I can't find any information on it.

Re: Count Methodology for imputation?

ballardw — Tue, 30 Apr 2019 22:26:28 GMT

One very crude method: PROC FREQ plus a DATA step. Find the most frequent value using PROC FREQ, then use something like:

data want;
    set have;
    if missing(var) then var = 'mostcommonvalue';
run;
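A minimal sketch of the full PROC FREQ plus DATA step approach, so the most common value does not have to be typed in by hand. It assumes a data set named have and a categorical variable named var, as in the snippet above; counts, mode_value, and want are names made up for illustration. If two levels tie for most frequent, the one PROC FREQ lists first is used.

```sas
/* Count the non-missing levels of var, most frequent first. */
proc freq data=have(where=(not missing(var))) order=freq noprint;
    tables var / out=counts;
run;

/* Read the top row of counts once, then fill missing values with it. */
data want;
    if _n_ = 1 then set counts(obs=1 keep=var rename=(var=mode_value));
    set have;
    if missing(var) then var = mode_value;
    drop mode_value;
run;
```

Reading the mode from the counts data set rather than a macro variable avoids quoting problems when the value contains quotes or ampersands, and it works for both character and numeric variables.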

Similarly, to replace with a mean value, use PROC MEANS or PROC SUMMARY to get the mean, then replace the missing values.
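The mean version could look roughly like this. It is only a sketch: x is a hypothetical numeric variable in have, and stats, x_mean, and want are illustrative names.

```sas
/* Compute the mean of x over the non-missing observations. */
proc means data=have noprint;
    var x;
    output out=stats mean=x_mean;
run;

/* Load the mean once, then replace missing x with it. */
data want;
    if _n_ = 1 then set stats(keep=x_mean);
    set have;
    if missing(x) then x = x_mean;
    drop x_mean;
run;
```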

Re: Count Methodology for imputation?

edasdfasdfasdfa — Tue, 30 Apr 2019 22:29:37 GMT

For numeric variables you can use PROC STDIZE, but I have never seen documentation covering character variables.

/* REPONLY replaces missing values only; without it the data are also standardized */
proc stdize data=train method=median reponly out=traini;
    var var1;
run;

Re: Count Methodology for imputation?

Reeza — Tue, 30 Apr 2019 22:39:22 GMT

You need to understand how and why the values are missing before you can say which method is appropriate. Using the largest group isn't a great method. An alternative is to model the data and predict the category, using logistic regression or discriminant analysis. Both are covered in PROC MI, and both have examples in the documentation (Examples 79.4 and 79.5):

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_mi_examples04.htm&docsetVersion=15.1&locale=en
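As a rough sketch of the model-based route (not taken from the documentation examples), something like the following imputes a categorical variable with logistic regression using the FCS statement, which handles arbitrary missing patterns. The data set have, the categorical variable category, the numeric predictors x1 and x2, the output name mi_out, and the seed are all assumptions for illustration.

```sas
/* Impute category from x1 and x2 with a logistic regression model. */
proc mi data=have seed=12345 nimpute=5 out=mi_out;
    class category;
    fcs logistic(category);
    var x1 x2 category;
run;
```

Note that this is multiple imputation: mi_out contains five completed copies of the data stacked together, identified by the _Imputation_ variable, rather than a single filled-in data set.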
