I'd like to write a macro to replace any missing variables (numeric or character) with the most frequent value of that variable in a new column.
I don't have much so far, but here is what I've come up with. I basically don't know how to determine the most frequent value.
%MACRO impute_freq (dat, var_name);
proc freq data = &dat;
by descending &var_name;
run;
%if &var_name = . %then
newcol = ???
%else newcol = &var_name;
run;
%MEND impute_freq;
Yes, with PROC STDIZE and METHOD=IN(ds), no macros are needed.
But I can't think of a good reason to use the most frequent value; I would recommend using either the mean or the median.
UNTESTED CODE
proc summary data=have;
var variable1 variable2 ... ;
output out=mode(drop=_:) mode=;
run;
data modes;
set mode;
array x variable1 variable2 ... ;
_type_='LOCATION';
output;
do i=1 to dim(x);
x(i)=1;
end;
_type_='SCALE';
output;
drop i;
run;
proc stdize data=have out=want reponly method=in(modes);
var variable1 variable2 ... ;
run;
But really, don't use the MODE, use either MEDIAN or MEAN. A word to the wise ...
And in addition, if you want to replace missings with the median or mean, its even easier, as this is built directly into PROC STDIZE.
One way of doing it using a macro, despite being doable without
%macro Assign_Mode_to_Null(table,var);
proc sql noprint;
select &var into :mode
from (select &var, count(&var) as N
from &table
group by &var)
having N=max(N) and monotonic()=1;
quit;
data want (drop=dsid type rc);
dsid=open("&table",'i');
type=vartype(dsid,varnum(dsid,"&var"));
rc=close(dsid);
set &table;
if missing(&var) then do;
if type='N' then &var=&mode;
if type='C' then do; &var="&mode"; drop &mode; end;
end;
run;
%mend Assign_Mode_to_Null;
%Assign_Mode_to_Null(have,var_name);
data have;
set sashelp.class;
if _n_ in (1 4 7) then call missing(age, sex);
run;
%MACRO impute_freq (dat, var_name);
proc freq data=&dat(where=(&var_name is not missing)) order=freq noprint;
table &var_name/out=mode;
run;
data &dat;
set &dat;
if _n_=1 then set mode(keep=&var_name rename=(&var_name=_&var_name));
if missing(&var_name) then &var_name=_&var_name;
drop _&var_name;
run;
%MEND impute_freq;
%impute_freq(have,age)
%impute_freq(have,sex)
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: