Imputing Measures of Central Tendency in Base 9.4

I'm wondering what procedures or methods are available for imputing measures of central tendency (mean, median, mode) in Base SAS 9.4. I looked through SAS Procedures by Name, Google, and the boards.


From what I can tell, PROC IMSTAT would be perfect, but it's not available in Base 9.4. I thought PROC STAT might exist (skipping the in-memory part), but alas, it does not. That brings me to PROC MI, which I think will work (based on here, here, and here), but honestly it might be too powerful for simply imputing, say, the mean for a variable's missing values.


Is it possible to impute a mean for a variable's missing values in a data step? From what I can tell, the answer is no. Example:


data abc_test;
	set abc;
	* Does not work: MEAN() averages its arguments within the current row, not down the column;
	if missing(VAR_NAME_HERE) then VAR_NAME_HERE_MU = mean(VAR_NAME_HERE);
run;
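
One Base SAS route that avoids PROC MI is PROC SQL, which remerges a summary statistic back onto every row. This is only a minimal sketch, assuming a numeric variable named x in data set abc (both names are placeholders, as is the output name abc_imputed):

```sas
* Hypothetical names: abc is the input data set, x is a numeric variable;
proc sql;
	create table abc_imputed as
	select a.*,
	       /* mean(x) is computed over the whole column and remerged onto each row */
	       coalesce(a.x, mean(a.x)) as x_mu
	from abc as a;
quit;
```

The original x is left untouched and x_mu holds the observed value where present, otherwise the column mean. The log will show a note about remerging summary statistics, which is expected here.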

Or, is it possible to extract the output of PROC MEANS and reference those values in a data step to impute a mean for a variable's missing values? I'm having trouble getting PROC MEANS to output specific stats for _ALL_ variables (I would rather not type each variable name). Using "var _all_" seems to collapse across all variables rather than producing stats at the variable level. I then tried a macro loop across each variable with an append (sound familiar from my other post?), but that failed horribly. And even if it had worked, that only "extracts" the values I'm looking for; I'm still not sure how to reference them in the data step to make imputing easier.


So! I don't mind spending the time reading if there's a good post or even SUGI paper that handles this, I just haven't been able to find one (other than those that delve into PROC MI). Any thoughts are greatly appreciated.




Edit / Update: Armed with SAS documentation on PROC MI (here), I rolled up my sleeves and dove in. It's actually not that bad and pretty awesome!


How might I go about preserving the original variables with missing data at the observation level, and use the results from PROC MI to create new variables (e.g. same name but with _MI at the end)? Is the best approach to create a separate data set for PROC MI, rename the variables as appropriate, then join the two on a primary key?


Edit / Update: One more - I'm still having trouble getting the output of PROC MEANS the way I'd like. This thread over at StackOverflow was helpful, and I get the reshape, but it's not quite working for me. The results I get seem to reflect only the first row, not all observations for a variable.


I'm interested in using the output from PROC MEANS to reference various imputations and trims (e.g. P99). Essentially I'd like to store the results of the PROC MEANS below in an output data set (exactly how it's printed to RESULTS) and I simply can't get there...


proc means data = abc_test NOLABELS
	P1 P5 P10 P25 P50 P75 P90 P95 P99 MIN MAX QRANGE;
run; quit;

The code I'm running sets OUTPUT OUT =, as well as those stats above equal to each other (e.g. nmiss = nmiss), but it just never looks like the results that are spit out from the above.
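
One way to capture the printed table as-is is to route the ODS table to a data set instead of using OUTPUT OUT=. The sketch below assumes SAS 9.3 or later (for the STACKODSOUTPUT option); abc_stats is a hypothetical output name:

```sas
* STACKODSOUTPUT lays the ODS table out with one row per analysis variable;
ods output Summary = abc_stats;
proc means data = abc_test nolabels stackodsoutput
	nmiss n mean median p1 p5 p10 p25 p50 p75 p90 p95 p99 min max qrange;
	var _numeric_;	* numeric variables only, since PROC MEANS cannot analyze character ones;
run;
```

The resulting data set should have a Variable column plus one column per requested statistic, which mirrors the layout shown in RESULTS.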

Re: Imputing Measures of Central Tendency in Base 9.4

proc means data = abc_test NOLABELS;
    var x;
    output out=_stats_ NMISS=nmiss N=n MEAN=mean MEDIAN=median MODE=mode STD=std SKEW=skew P1=p1 P5=p5 P10=p10 P25=p25 P75=p75 P90=p90 P95=p95 P99=p99 MIN=min MAX=max QRANGE=qrange;
run; quit;

data abc_test1;
    if _n_=1 then set _stats_;  * read the one-row statistics once (values are retained);
    set abc_test;
run;
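
The one-row _stats_ data set from the PROC MEANS step above gets attached to every observation, so the statistics can be used directly in assignments. A sketch of how the imputation itself might look, assuming x is the analysis variable (the new variable names x_mean_imp and x_median_imp are made up for illustration):

```sas
* Sketch only: relies on the _stats_ data set created by the PROC MEANS step above;
data abc_test1;
	if _n_ = 1 then set _stats_ (keep = mean median);	* read once, values are retained on every row;
	set abc_test;
	x_mean_imp   = coalesce(x, mean);	* observed value, else the overall mean;
	x_median_imp = coalesce(x, median);	* observed value, else the overall median;
	drop mean median;
run;
```

The output still has one row per observation in abc_test; the single row of statistics is only carried along as extra columns until it is dropped.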

Re: Imputing Measures of Central Tendency in Base 9.4

Unfortunately that yields the same 1-line result (appears to collapse across entire data set).


But I was finally able to get what I was hunting for... It's a bit long, and probably neither the cleanest nor the most efficient code, but it works (feedback on improving it is welcome).


*	SAS Macros;

*	Locals;
%let data_og = MB;
%let contents = &data_og._contents;
%let varname = name;

*	Macro for summary stats from PROC MEANS;
*	Use in conjunction with PROC TRANSPOSE;
%macro means(varname);
	proc means data = &data_og. noprint;
	output out = &varname. (drop = _freq_ _type_)
		nmiss(&varname.)	= &varname._nmiss
		n(&varname.)		= &varname._n
		mean(&varname.)	 	= &varname._mean
		median(&varname.)	= &varname._median
		mode(&varname.) 	= &varname._mode
		std(&varname.)	 	= &varname._std
		skew(&varname.)	 	= &varname._skew
		P1(&varname.)	 	= &varname._P1
		P5(&varname.)		= &varname._P5
		P10(&varname.)	 	= &varname._P10
		P25(&varname.)	 	= &varname._P25
		P50(&varname.)	 	= &varname._P50
		P75(&varname.)	 	= &varname._P75
		P90(&varname.)	 	= &varname._P90
		P95(&varname.)	 	= &varname._P95
		P99 (&varname.)		= &varname._P99
		min(&varname.)	 	= &varname._min
		max(&varname.)	 	= &varname._max
		qrange(&varname.)	= &varname._qrange;
run; quit;
%mend means;

*	Macro to transpose summary stats from PROC MEANS;
%macro transpose(varname);
	proc transpose data = &varname. out = &varname._t;
		var _numeric_;
		by _character_;
	run; quit;
%mend transpose;

*	Macro to store summary stats from PROC MEANS as macro variables;
%macro symput(varname);
	data _null_;
		set &varname._t;
			* symputx converts the numeric value and strips blanks without a log note;
			call symputx(_name_, col1);
	run; quit;
%mend symput;


*	List out the column names and data types for the data set;
proc contents data = &data_og. out = &contents.;
run; quit;

*	Drop unnecessary variables gained from PROC CONTENTS;
data &contents.;
	set &contents.(keep = name type length varnum format formatl
		informat informl just npos nobs);
run; quit;

*	View contents of data set, more info than PROC CONTENTS output;
proc print data = &contents.;
run; quit;


*	For each variable in the data set, extract summary stats from
	proc means and store as varname, then transpose as varname_t;
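data _null_;
	* one pass over the contents table, numeric variables only (type = 1),
	  since PROC MEANS cannot analyze character variables;
	set &contents. (where = (type = 1));
	call execute(cats('%means(', name, ')'));
	call execute(cats('%transpose(', name, ')'));
	call execute(cats('%symput(', name, ')'));
run; quit;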

*	View all macro variables and verify data with PROC MEANS;
%put _user_;

proc means data = &data_og. NOLABELS
	P1 P5 P10 P25 P50 P75 P90 P95 P99 MIN MAX QRANGE;
run; quit;

Now I have all those values stored as macro variables, which will make it A LOT easier when truncating, trimming, or imputing data... What I really love about macros is how easy it is to set up generic code (like this) that can be applied regardless of the data set. The more I can automate and avoid manually typing values in, the happier I am...
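
As an illustration of how those macro variables might be used afterwards, here is a rough sketch. It assumes &data_og. contains a numeric variable named x (a placeholder), so that the macros above have created x_P1, x_P99 and x_median; the output data set name and the x_cap / x_imp variables are also made up:

```sas
* Hypothetical: x stands in for any numeric variable processed by the macros above;
data &data_og._clean;
	set &data_og.;
	* cap x at its 1st and 99th percentiles, leaving missing values alone;
	if not missing(x) then x_cap = min(max(x, &x_P1.), &x_P99.);
	* replace missing x with the stored median;
	x_imp = coalesce(x, &x_median.);
run;
```

Because the values pass through text, they are only as precise as the formatted macro variable, which is usually fine for trimming thresholds.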
