I'd like to write a macro to replace any missing variables (numeric or character) with the most frequent value of that variable in a new column.
I don't have much so far, but here is what I've come up with. I basically don't know how to determine the most frequent value.
%MACRO impute_freq (dat, var_name);
proc freq data = &dat;
by descending &var_name;
run;
%if &var_name = . %then
newcol = ???
%else newcol = &var_name;
run;
%MEND impute_freq;
Yes, with PROC STDIZE and METHOD=IN(ds), no macros are needed.
But I can't think of a good reason to use the most frequent value; I would recommend using either the mean or the median.
UNTESTED CODE
proc summary data=have;
var variable1 variable2 ... ;
output out=mode(drop=_:) mode=;
run;
data modes;
set mode;
array x variable1 variable2 ... ;
_type_='LOCATION';
output;
do i=1 to dim(x);
x(i)=1;
end;
_type_='SCALE';
output;
drop i;
run;
proc stdize data=have out=want reponly method=in(modes);
var variable1 variable2 ... ;
run;
But really, don't use the MODE, use either MEDIAN or MEAN. A word to the wise ...
And in addition, if you want to replace missings with the median or mean, its even easier, as this is built directly into PROC STDIZE.
One way of doing it using a macro, despite being doable without
%macro Assign_Mode_to_Null(table,var);
proc sql noprint;
select &var into :mode
from (select &var, count(&var) as N
from &table
group by &var)
having N=max(N) and monotonic()=1;
quit;
data want (drop=dsid type rc);
dsid=open("&table",'i');
type=vartype(dsid,varnum(dsid,"&var"));
rc=close(dsid);
set &table;
if missing(&var) then do;
if type='N' then &var=&mode;
if type='C' then do; &var="&mode"; drop &mode; end;
end;
run;
%mend Assign_Mode_to_Null;
%Assign_Mode_to_Null(have,var_name);
data have;
set sashelp.class;
if _n_ in (1 4 7) then call missing(age, sex);
run;
%MACRO impute_freq (dat, var_name);
proc freq data=&dat(where=(&var_name is not missing)) order=freq noprint;
table &var_name/out=mode;
run;
data &dat;
set &dat;
if _n_=1 then set mode(keep=&var_name rename=(&var_name=_&var_name));
if missing(&var_name) then &var_name=_&var_name;
drop _&var_name;
run;
%MEND impute_freq;
%impute_freq(have,age)
%impute_freq(have,sex)
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.