- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'd like to write a macro to replace any missing variables (numeric or character) with the most frequent value of that variable in a new column.
I don't have much so far, but here is what I've come up with. I basically don't know how to determine the most frequent value.
%MACRO impute_freq (dat, var_name);
proc freq data = &dat;
by descending &var_name;
run;
%if &var_name = . %then
newcol = ???
%else newcol = &var_name;
run;
%MEND impute_freq;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
https://stats.idre.ucla.edu/sas/seminars/multiple-imputation-in-sas/mi_new_1/
You can also look into PROC STDIZE which has several methods to replace missing values.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, with PROC STDIZE and METHOD=IN(ds), no macros are needed.
But I can't think of a good reason to use the most frequent value; I would recommend using either the mean or the median.
UNTESTED CODE
proc summary data=have;
var variable1 variable2 ... ;
output out=mode(drop=_:) mode=;
run;
data modes;
set mode;
array x variable1 variable2 ... ;
_type_='LOCATION';
output;
do i=1 to dim(x);
x(i)=1;
end;
_type_='SCALE';
output;
drop i;
run;
proc stdize data=have out=want reponly method=in(modes);
var variable1 variable2 ... ;
run;
But really, don't use the MODE, use either MEDIAN or MEAN. A word to the wise ...
And in addition, if you want to replace missings with the median or mean, its even easier, as this is built directly into PROC STDIZE.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One way of doing it using a macro, despite being doable without
%macro Assign_Mode_to_Null(table,var);
proc sql noprint;
select &var into :mode
from (select &var, count(&var) as N
from &table
group by &var)
having N=max(N) and monotonic()=1;
quit;
data want (drop=dsid type rc);
dsid=open("&table",'i');
type=vartype(dsid,varnum(dsid,"&var"));
rc=close(dsid);
set &table;
if missing(&var) then do;
if type='N' then &var=&mode;
if type='C' then do; &var="&mode"; drop &mode; end;
end;
run;
%mend Assign_Mode_to_Null;
%Assign_Mode_to_Null(have,var_name);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
set sashelp.class;
if _n_ in (1 4 7) then call missing(age, sex);
run;
%MACRO impute_freq (dat, var_name);
proc freq data=&dat(where=(&var_name is not missing)) order=freq noprint;
table &var_name/out=mode;
run;
data &dat;
set &dat;
if _n_=1 then set mode(keep=&var_name rename=(&var_name=_&var_name));
if missing(&var_name) then &var_name=_&var_name;
drop _&var_name;
run;
%MEND impute_freq;
%impute_freq(have,age)
%impute_freq(have,sex)