BookmarkSubscribeRSS Feed
kchoi78
Calcite | Level 5

When I aggregate a dataset via proc sql, I have to state the column name via "as" statement else the column will return as something like "TEMP005." There are so many cases where I just want the new aggregated column to have the same name as the var that I am aggregating. For example, say I have a table with four variables: name, amount, hours, and rate, and I am aggregating columns amount, hours, and rate.

proc sql;

create table test as 

select distinct name, sum(amount) as amount, sum(hours) as hours, sum(rate) as rate

from have

group by name

;

quit;

Is there any way to not have to write the "as" statement every single time and have the column names just default to whatever I am aggregating? 

 

3 REPLIES 3
SASKiwi
PROC Star

I don't think there is any PROC SQL option for this but using macro could help:

%macro SQL_Sum (group = , var1 = , var2 = ,var3 = );

proc sql;
  create table test as 
  select &group, sum(&var1) as &var1, sum(&var2) as &var2, sum(&var3) as &var3
  from have
  group by &group
;
quit;

%mend SQL_Sum;

%SQL_Sum (group = name, var1 = amount, var2 = hours ,var3 = rate);

BTW using DISTINCT is redundant when it is in a GROUP BY.

ballardw
Super User

Sometimes you may want to consider a different procedure.

 

 

Proc summary data=have nway;
   class name;
  var amount hours rate;
  output out= test (drop=_:) sum = ;
run;

Proc Means/ Summary also has an Autoname feature that will append the requested statistic(s) names to the variable such as:

Proc summary data=have nway;
   class name;
  var amount hours rate;
  output out= test (drop=_:) sum = max= min= std= / autoname;
run;

Which will create amount_sum, amount_max, amount_min, amount_std (hope you get the picture).

The NWAY on the Proc statement suppresses multiple levels of combinations of the class variables. Otherwise you get one row that has the statistic overall (and assorted combinations if more than one class variable is used). The drop= drops two variables that are automatically supplied: _freq_ how many observations used and _type_ which indicates which specific combination of class variables are represented for the observation in the output data set.

 

You can also any of the forms of variable lists in Proc Means/Summary so you could get the summaries for all of the numeric variables by using:  var _numeric_; instead of listing all the variable names.

 

If you just want to see the results and don't need a data set Proc Report or Tabulate will make summaries as well.

 

 

 

Kurt_Bremser
Super User

That DISTINCT is, as noted, redundant, and can be a real performance killer when working with large datasets.

BIG hint for the future: only use code that is needed (which implies that you have to know the function of each keyword used when working with code. Knowledge is Power.)

If the PROC SUMMARY with CLASS cracks your memory limitations (may happen in multi-user environments, where memory must be limited to protect users from each other's code, and with a high cardinality of name), sort first and use BY:

proc sort data=have;
by name;
run;

proc summary data=have;
by name;
var amount hours rate;
output out=test (drop=_:) sum()=;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 804 views
  • 5 likes
  • 4 in conversation