I'm wondering which procedures or methods are available for imputing measures of central tendency (mean, median, mode) in SAS Base 9.4. I took a look at SAS Procedures by Name, Google, and the boards.
From what I can tell, PROC IMSTAT would be perfect, but that's not an available procedure in Base 9.4. I thought PROC STAT might exist (skipping the In Memory part) but alas, it does not. That brings me to PROC MI, which I think will work (based on here, here, and here), but honestly it might be too powerful for simply imputing (say) the mean of a variable with missing values.
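For plain mean or median replacement, one lighter-weight option that may be worth a look is PROC STDIZE with the REPONLY option, which replaces missing values without standardizing anything. Like PROC MI, it ships with SAS/STAT rather than Base. A minimal sketch, assuming an input data set named abc and a hypothetical output name abc_imputed:

```sas
/* Replace missing values of every numeric variable with that variable's mean. */
/* REPONLY = replace missing values only, leave non-missing values untouched.  */
/* abc_imputed is a hypothetical output name.                                  */
proc stdize data=abc out=abc_imputed reponly method=mean;
  var _numeric_;
run;
```

Swapping in METHOD=MEDIAN gives median imputation. As far as I know there is no mode option among the METHOD= choices, so mode imputation would need a different route.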
Is it possible to impute a mean value for a variable's missing values in a data step? From what I can tell, the answer is no. Example:
data abc_test;
set abc;
if missing(VAR_NAME_HERE) then VAR_NAME_HERE_MU = mean(VAR_NAME_HERE);
run;
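The reason the example above does not behave as hoped is that the data step MEAN function averages its arguments within the current observation, not down a column, so with a single argument it simply returns that same value. A single data step can still do the job if it reads the data twice. The following is only a sketch of that double-loop pattern, assuming a hypothetical numeric variable x in abc; the underscore-prefixed names are temporaries I made up:

```sas
data abc_test;
  /* Pass 1: accumulate the sum and count of the non-missing values of x */
  do until (eof1);
    set abc end=eof1;
    if not missing(x) then do;
      _sum = sum(_sum, x);
      _cnt = sum(_cnt, 1);
    end;
  end;
  if _cnt > 0 then _mean = _sum / _cnt;

  /* Pass 2: re-read the data and fill the gaps in a new variable */
  do until (eof2);
    set abc end=eof2;
    x_mu = coalesce(x, _mean);
    output;
  end;

  drop _sum _cnt _mean;
  stop;
run;
```

The original x is left untouched and x_mu holds the imputed version. This handles one variable at a time, so for many variables a procedure-based approach is probably less typing.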
Or, is it possible to extract the output of PROC MEANS and reference those values in a data step to impute a mean value for missing values? I'm having trouble getting PROC MEANS to output specific stats for _ALL_ variables (I would rather not type each variable name). Using "var _all_" seems to collapse across all variables rather than producing stats at the variable level. I then tried a macro loop across each variable with an append (sound familiar from my other post?), but that failed horribly. And even if it had worked, that just "extracts" the values I'm looking for; I'm still not sure how to reference them during the data step to make imputing easier.
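For what it's worth, here is a rough sketch of the "PROC MEANS output referenced in a data step" idea. It assumes two hypothetical numeric variables x and y in abc, and the data set names _mu_ and abc_imputed are made up. The AUTONAME option names each statistic variable_Stat (so x_Mean, y_Mean), and reading the one-row stats data set only on the first iteration keeps those values available on every row:

```sas
/* One row of means, with columns named x_Mean and y_Mean by AUTONAME */
proc means data=abc noprint;
  var x y;
  output out=_mu_ (drop=_type_ _freq_) mean= / autoname;
run;

data abc_imputed;
  /* Variables read with SET are retained, so the means persist on every row */
  if _n_ = 1 then set _mu_;
  set abc;
  /* COALESCE returns the first non-missing argument */
  x_mu = coalesce(x, x_Mean);
  y_mu = coalesce(y, y_Mean);
  drop x_Mean y_Mean;
run;
```

With many variables, the two COALESCE lines could be replaced by a pair of arrays looped over in parallel, but the idea is the same.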
So! I don't mind spending the time reading if there's a good post or even SUGI paper that handles this, I just haven't been able to find one (other than those that delve into PROC MI). Any thoughts are greatly appreciated.
Michael
Edit / Update: Armed with SAS documentation on PROC MI (here), I rolled up my sleeves and dove in. It's actually not that bad and pretty awesome!
How might I go about preserving the original variables, missing values and all, at the observation level, and use the results from PROC MI to create new variables (e.g. same name but with _MI at the end)? Is the best approach to create a separate data set for PROC MI, rename the variables as appropriate, then join the two on a primary key?
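A sketch of the "separate data set, rename, then join" idea might look like the following. The key variable id and the analysis variables x and y are hypothetical, as are the data set names abc_mi and abc_both. It also assumes id is unique and uses NIMPUTE=1 so there is exactly one imputed row per original row, which gives up the "multiple" part of multiple imputation:

```sas
/* Single imputation; the OUT= data set carries the input variables plus _Imputation_ */
proc mi data=abc nimpute=1 seed=12345 out=abc_mi noprint;
  var x y;
run;

proc sort data=abc;    by id; run;
proc sort data=abc_mi; by id; run;

/* Originals stay as x and y; imputed versions come in as x_MI and y_MI */
data abc_both;
  merge abc
        abc_mi (keep=id x y rename=(x=x_MI y=y_MI));
  by id;
run;
```

With more than one imputation, the OUT= data set stacks the copies and the merge would also need to account for _Imputation_.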
Edit / Update: One more - I'm still having trouble getting the output of PROC MEANS the way I'd like. This thread over at StackOverflow was helpful, and I get the reshape, but it's not quite working for me. The results I get seem to reflect only the first row, and not all observations for a variable.
I'm interested in using the output from PROC MEANS to reference various imputations and trims (e.g. P99). Essentially I'd like to store the results of the PROC MEANS below in an output data set (exactly as it's printed to RESULTS) and I simply can't get there...
proc means data = abc_test NOLABELS NMISS N MEAN MEDIAN MODE STD SKEW P1 P5 P10 P25 P50 P75 P90 P95 P99 MIN MAX QRANGE; run; quit;
The code I'm running sets OUTPUT OUT =, as well as those stats above equal to each other (e.g. nmiss = nmiss), but it just never looks like the results that are spit out from the above.
proc means data = abc_test NOLABELS;
var x;
output out=_stats_ NMISS=nmiss N=n MEAN=mean MEDIAN=median MODE=mode STD=std SKEW=skew P1=p1 P5=p5 P10=p10 P25=p25 P75=p75 P90=p90 P95=p95 P99=p99 MIN=min MAX=max QRANGE=qrange;
run; quit;
data abc_test1;
if _n_=1 then set _stats_;
set abc_test;
run;
Unfortunately that yields the same 1-line result (appears to collapse across entire data set).
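As far as I can tell, the one-line result is expected: the OUTPUT statement writes one observation per BY/CLASS group, with the statistics as columns, so without a CLASS or BY statement there is only ever one row. To get a data set shaped like the printed table (one row per variable, one column per statistic), one option is to capture the ODS table itself. A minimal sketch, where _stats_ is just a reused name; STACKODSOUTPUT asks PROC MEANS to stack the ODS table so each analysis variable gets its own row:

```sas
/* Route the printed Summary table into a data set */
ods output summary=_stats_;

proc means data=abc_test stackodsoutput nolabels
           nmiss n mean median mode std skew
           p1 p5 p10 p25 p50 p75 p90 p95 p99 min max qrange;
run;

/* _stats_ should now hold one row per analysis variable */
proc print data=_stats_;
run;
```

Because no VAR statement is given, PROC MEANS analyzes every numeric variable, so no names need to be typed.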
But, I was finally able to get what I was on the hunt for working... It's a bit long, and probably neither the cleanest nor most efficient code, but it works (feedback on improving it is welcomed).
**********************************************************************;
* SAS Macros;
**********************************************************************;

* Locals;
%let data_og = MB;
%let contents = &data_og._contents;
%let varname = name;

* Macro for summary stats from PROC MEANS;
* Use in conjunction with PROC TRANSPOSE;
%macro means(varname);
  proc means data = &data_og. noprint;
    output out = &varname. (drop = _freq_ _type_)
      nmiss(&varname.) = &varname._nmiss
      n(&varname.) = &varname._n
      mean(&varname.) = &varname._mean
      median(&varname.) = &varname._median
      mode(&varname.) = &varname._mode
      std(&varname.) = &varname._std
      skew(&varname.) = &varname._skew
      P1(&varname.) = &varname._P1
      P5(&varname.) = &varname._P5
      P10(&varname.) = &varname._P10
      P25(&varname.) = &varname._P25
      P50(&varname.) = &varname._P50
      P75(&varname.) = &varname._P75
      P90(&varname.) = &varname._P90
      P95(&varname.) = &varname._P95
      P99(&varname.) = &varname._P99
      min(&varname.) = &varname._min
      max(&varname.) = &varname._max
      qrange(&varname.) = &varname._qrange
    ;
  run; quit;
%mend;

* Macro to transpose summary stats from PROC MEANS;
%macro transpose(varname);
  proc transpose data = &varname. out = &varname._t;
    var _numeric_;
    by _character_;
  run; quit;
%mend;

* Macro to store summary stats from PROC MEANS as macro variables;
%macro symput(varname);
  data _null_;
    set &varname._t;
    call symput(_name_, col1);
  run; quit;
%mend;

**********************************************************************;
* PROC CONTENTS;
**********************************************************************;

* List out the column names and data types for the data set;
proc contents data = &data_og. out = &contents.;
run; quit;

* Drop unnecessary variables gained from PROC CONTENTS;
data &contents.;
  set &contents.(keep = name type length varnum format formatl informat informl just npos nobs);
run; quit;

* View contents of data set, more info than PROC CONTENTS output;
proc print data = &contents.;
run; quit;

**********************************************************************;
* PROC MEANS;
**********************************************************************;

* For each variable in the data set, extract summary stats from proc means and store as varname, then transpose as varname_t;
data _null_;
  do i = 1 to num;
    set &contents. nobs = num;
    call execute('%means('||name||')');
    call execute('%transpose('||name||')');
    call execute('%symput('||name||')');
  end;
run; quit;

* View all macro variables and verify data with PROC MEANS;
%put _user_;
proc means data = &data_og. NOLABELS NMISS N MEAN MEDIAN MODE STD SKEW P1 P5 P10 P25 P50 P75 P90 P95 P99 MIN MAX QRANGE;
run; quit;
Now I have all those values stored as macro variables, which will make it A LOT easier when truncating, trimming, or imputing data... What I really love about macros is how easy it is to set up generic code (like this) that can be applied regardless of the data set. The more I can automate and avoid manually typing values in, the happier I am...
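As a sketch of how those macro variables might then be used, suppose MB has a numeric variable called x (a hypothetical name). The code above would create macro variables named x_mean, x_P1, x_P99 and so on, following its varname_stat naming, and a data step could then impute and cap in one go. MB_clean and x_imp are made-up names:

```sas
data MB_clean;
  set MB;
  /* Impute the mean where x is missing, otherwise cap x at P1 and P99 */
  if missing(x) then x_imp = &x_mean.;
  else x_imp = min(max(x, &x_P1.), &x_P99.);
run;
```

One caveat: CALL SYMPUT converts the numeric value to text using the BEST12. format, so the stored values may be slightly rounded. CALL SYMPUTX may be a tidier choice since it also strips the leading and trailing blanks.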