PROC SQL: default column names?

kchoi78 · Posted 08-25-2021 08:01 PM

When I aggregate a dataset via proc sql, I have to state the column name via "as" statement else the column will return as something like "TEMP005." There are so many cases where I just want the new aggregated column to have the same name as the var that I am aggregating. For example, say I have a table with four variables: name, amount, hours, and rate, and I am aggregating columns amount, hours, and rate.

proc sql;
  create table test as
  select distinct name, sum(amount) as amount, sum(hours) as hours, sum(rate) as rate
  from have
  group by name
;
quit;

Is there any way to not have to write the "as" statement every single time and have the column names just default to whatever I am aggregating?

SASKiwi · Posted 08-25-2021 08:20 PM

I don't think there is any PROC SQL option for this but using macro could help:

%macro SQL_Sum (group = , var1 = , var2 = ,var3 = );

proc sql;
  create table test as 
  select &group, sum(&var1) as &var1, sum(&var2) as &var2, sum(&var3) as &var3
  from have
  group by &group
;
quit;

%mend SQL_Sum;

%SQL_Sum (group = name, var1 = amount, var2 = hours ,var3 = rate);

BTW using DISTINCT is redundant when it is in a GROUP BY.
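
The macro above is fixed at three variables. A sketch of how it might be generalized to any space-separated list follows; the macro name sql_sum_list and its parameters (data, out, group, vars) are made up for illustration, not an existing SAS feature:

```sas
/* Hypothetical macro: sums every variable in VARS and keeps the original names */
%macro sql_sum_list(data=, out=, group=, vars=);
  %local i var;
  proc sql;
    create table &out as
    select &group
    /* emit one ", sum(x) as x" term per word in VARS */
    %do i = 1 %to %sysfunc(countw(&vars, %str( )));
      %let var = %scan(&vars, &i, %str( ));
      , sum(&var) as &var
    %end;
    from &data
    group by &group;
  quit;
%mend sql_sum_list;

%sql_sum_list(data=have, out=test, group=name, vars=amount hours rate);
```

For this call the generated query is the same as the hand-written one in the question, minus the DISTINCT.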

ballardw · Posted 08-25-2021 08:25 PM

Sometimes you may want to consider a different procedure.

Proc summary data=have nway;
  class name;
  var amount hours rate;
  output out= test (drop=_:) sum = ;
run;

Proc Means/ Summary also has an Autoname feature that will append the requested statistic(s) names to the variable such as:

Proc summary data=have nway;
  class name;
  var amount hours rate;
  output out= test (drop=_:) sum = max= min= std= / autoname;
run;

That will create amount_sum, amount_max, amount_min, amount_std, and the same set for hours and rate.

The NWAY on the Proc statement restricts the output to the highest-level combination of the class variables. Otherwise you also get one row with the overall statistic (and assorted lower-level combinations if more than one class variable is used). The drop= drops two variables that are automatically supplied: _freq_, the number of observations used, and _type_, which indicates which combination of class variables an observation in the output data set represents.
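
To see what NWAY and the drop= are doing, here is a small sketch that runs the same step without either. The output data set name test_all is just a placeholder:

```sas
/* Same summary, but without NWAY and without dropping _type_ / _freq_ */
proc summary data=have;
  class name;
  var amount hours rate;
  output out=test_all sum=;
run;

/* With a single class variable:                         */
/*   _type_ = 0  is the one overall row (all names)      */
/*   _type_ = 1  is one row per value of name            */
proc print data=test_all;
run;
```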

You can also use any of the forms of variable lists in Proc Means/Summary, so you could get the summaries for all of the numeric variables by using: var _numeric_; instead of listing all the variable names.
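
As a sketch, assuming name is a character variable so that it is not picked up by the list:

```sas
proc summary data=have nway;
  class name;
  var _numeric_;                    /* every numeric variable in HAVE */
  output out=test (drop=_:) sum=;   /* sums keep the original variable names */
run;
```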

If you just want to see the results and don't need a data set Proc Report or Tabulate will make summaries as well.
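
A minimal Proc Tabulate sketch for the same data (report output only, no data set is created), assuming the variables from the question:

```sas
proc tabulate data=have;
  class name;
  var amount hours rate;
  /* one row per name, one column per variable, each cell showing the sum */
  table name, (amount hours rate)*sum;
run;
```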

Kurt_Bremser · Posted 08-26-2021 04:37 AM

That DISTINCT is, as noted, redundant, and can be a real performance killer when working with large datasets.

BIG hint for the future: only use code that is needed (which implies that you have to know the function of each keyword used when working with code. Knowledge is Power.)

If the PROC SUMMARY with CLASS cracks your memory limitations (may happen in multi-user environments, where memory must be limited to protect users from each other's code, and with a high cardinality of name), sort first and use BY:

proc sort data=have;
by name;
run;

proc summary data=have;
by name;
var amount hours rate;
output out=test (drop=_:) sum()=;
run;
