BookmarkSubscribeRSS Feed
Anastasija98
Calcite | Level 5

Hello,

I try to identify Extreme Values of Analysis Variables but for many columns. Could you help me and explain how I can do it for many columns?

It's how I do it for one columns;

 

proc means data=w2;

var act_age ;
output out=stat 
idgroup (max(act_age) out[2] (act_age)=maxrev) idgroup (min(act_age) 
out[2] (act_age)=minrev) q1=q3=mean= /autoname;
run; 

 

3 REPLIES 3
sbxkoenk
SAS Super FREQ

Example:

data have;
 LENGTH Name $ 5;
 set sashelp.shoes;
 Name=put(_N_,z5.);
run;

proc means data=have nway;
 CLASS Region Product;
 var Sales Inventory Returns ;
 output out=stat
 idgroup (max(Sales Inventory Returns) /*obs*/ out[3] (Sales Inventory Returns Name Region Product)=) 
 q1= q3= mean= / autolabel autoname /*levels*/;
run;

proc transpose data=stat out=stat_max_trp(rename=(col1=col1_MAX)); BY Region Product; run;

proc means data=have nway;
 CLASS Region Product;
 var Sales Inventory Returns ;
 output out=stat
 idgroup (min(Sales Inventory Returns) /*obs*/ out[3] (Sales Inventory Returns Name Region Product)=)
 q1= q3= mean= / autolabel autoname /*levels*/;
run;

proc transpose data=stat out=stat_min_trp(rename=(col1=col1_MIN)); BY Region Product; run;

data want;
 merge stat_max_trp stat_min_trp;
 *BY Region Product _NAME_ _LABEL_;
run;
/* end of program */

Koen

ballardw
Super User

You don't appear to be asking for any ID for the max or min values. Is that in fact the case and you just want the n max /min values?

Just repeat the variable names in the positions you have your variable:

proc means data=sashelp.class noprint;
   var height weight;

   output out=summary
   idgroup (max(height weight) out[2] (height weight)=maxh maxw) 
   idgroup (min(height weight) out[2] (height weight)=minh minw)
   ;
run;

If there are truly "many" variables getting the names of the output variables may be entertaining

 

How many max or min values do you want? Proc Univariate by default will give 5 max and min values for every variable on the VAR statement.

FreelanceReinh
Jade | Level 19

Hello @Anastasija98 and welcome to the SAS Support Communities!

 

My understanding is that you want to analyze several numeric variables separately, so that specifying their names in a single variable list in the MAX(...) and MIN(...) parts of the IDGROUP specification is not appropriate.

 

You could type two more IDGROUP specifications for each additional analysis variable. If there are too many analysis variables for this manual approach, you can use code generation, e.g., create the code of the OUTPUT statement in a DATA step and then %INCLUDE it in the PROC MEANS step.

 

Here is an example creating the desired statistics for all numeric variables in dataset SASHELP.CARS, arbitrarily excluding EngineSize and Cylinders:

filename outpstmt temp;

data _null_;
file outpstmt;
set sashelp.vcolumn end=last;
where libname='SASHELP' & memname='CARS' & type='num' & name ~in: ('E','C');
if _n_=1 then put 'output out=stat';
put 'idgroup (max(' name +(-1) ') out[2] (' name +(-1) ')=' name +(-1) '_Max)';
put 'idgroup (min(' name +(-1) ') out[2] (' name +(-1) ')=' name +(-1) '_Min)';
if last then put 'q1= q3= mean= /autoname;';
run;

proc means data=sashelp.cars(drop=e: c:);
var _numeric_;
%include outpstmt / source2;
run;

 

The above DATA step writes the complete OUTPUT statement including sixteen similar IDGROUP specifications (two for each of the eight variables MSRP, Invoice, ..., Length) to a temporary text file, which the %INCLUDE statement then refers to in the PROC MEANS step (see the part of the log created by the SOURCE2 option). The eight variable names are retrieved from the view SASHELP.VCOLUMN. The WHERE condition in your application may start with

libname='WORK' & memname='W2'

(note the upper-case strings) and possibly omit the criterion (using variable name) restricting the set of analysis variables or use a different criterion to select the appropriate set of variable names. Similarly, the DROP= dataset option in the PROC MEANS step would be omitted, modified or replaced by a KEEP= dataset option (or a more specific VAR statement).

 

In the two PUT statements I specified that the names of the variables containing the extreme values be constructed as, e.g., MSRP_Max_1, MSRP_Max_2MSRP_Min_1, MSRP_Min_2 (for analysis variable MSRP), to be consistent with the variable names for the quartiles and the means created by the AUTONAME option.

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 854 views
  • 5 likes
  • 4 in conversation