PROC SQL: default column names? (SAS Programming)

kchoi78 — Thu, 26 Aug 2021 00:01:48 GMT

When I aggregate a dataset with PROC SQL, I have to name each new column with an AS clause; otherwise the column comes back with a generated name like "TEMP005". In many cases I just want the aggregated column to have the same name as the variable I am aggregating. For example, say I have a table with four variables (name, amount, hours, and rate) and I am aggregating amount, hours, and rate:

proc sql;
  create table test as
  select distinct name, sum(amount) as amount, sum(hours) as hours, sum(rate) as rate
  from have
  group by name
  ;
quit;

Is there any way to avoid writing the AS clause every time and have the column names default to the variables being aggregated?

Re: PROC SQL: default column names?

SASKiwi — Thu, 26 Aug 2021 00:20:08 GMT

I don't think there is any PROC SQL option for this, but a macro could help:

%macro SQL_Sum (group = , var1 = , var2 = , var3 = );

proc sql;
  create table test as 
  select &group, sum(&var1) as &var1, sum(&var2) as &var2, sum(&var3) as &var3
  from have
  group by &group
;
quit;

%mend SQL_Sum;

%SQL_Sum (group = name, var1 = amount, var2 = hours, var3 = rate);
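The macro above is fixed at exactly three analysis variables. A possible generalization (a sketch, not part of the original reply) accepts a space-separated list instead and loops over it with %SCAN and COUNTW to build the sum(x) as x expressions. The macro name SQL_SUM_LIST, its parameters, and the three-row HAVE data set are made up for illustration.

```sas
/* Hypothetical sample data so the example runs on its own */
data have;
  input name $ amount hours rate;
  datalines;
A 10 2 5
A 20 3 6
B 5 1 4
;
run;

/* Hypothetical macro: sums every variable listed in VARS= and keeps its name */
%macro sql_sum_list(data=, out=, group=, vars=);
  %local i v sel;
  %do i = 1 %to %sysfunc(countw(&vars, %str( )));
    %let v = %scan(&vars, &i, %str( ));
    /* put a comma between the generated expressions */
    %if &i > 1 %then %let sel = &sel,;
    %let sel = &sel sum(&v) as &v;
  %end;

  proc sql;
    create table &out as
    select &group, &sel
    from &data
    group by &group;
  quit;
%mend sql_sum_list;

%sql_sum_list(data=have, out=test, group=name, vars=amount hours rate);
```

Adding another variable then only means adding a word to VARS=.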

BTW, DISTINCT is redundant here: GROUP BY name already returns one row per name.
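If every numeric column in HAVE should be summed, another route (again a sketch, not something posted in the thread) is to let PROC SQL generate the select list from DICTIONARY.COLUMNS into a macro variable. SUMLIST is a hypothetical name, the WHERE clause assumes HAVE is in the WORK library, and the sample data is made up. The query also leaves out DISTINCT.

```sas
/* Hypothetical sample data so the example runs on its own */
data have;
  input name $ amount hours rate;
  datalines;
A 10 2 5
A 20 3 6
B 5 1 4
;
run;

proc sql noprint;
  /* Build "sum(amount) as amount, sum(hours) as hours, ..." from the numeric columns */
  select catx(' ', cats('sum(', name, ')'), 'as', name)
    into :sumlist separated by ', '
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE' and type = 'num'
    order by varnum;

  /* No DISTINCT: GROUP BY already returns one row per name */
  create table test as
    select name, &sumlist
    from have
    group by name;
quit;
```

CATX is used rather than CATS for the outer join because CATS would strip the blanks around "as".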

Re: PROC SQL: default column names?

ballardw — Thu, 26 Aug 2021 00:32:35 GMT

Sometimes you may want to consider a different procedure.

Proc summary data=have nway;
  class name;
  var amount hours rate;
  output out=test (drop=_:) sum=;
run;

Proc Means/Summary also has an AUTONAME option that appends the requested statistic names to the variable names, for example:

Proc summary data=have nway;
  class name;
  var amount hours rate;
  output out=test (drop=_:) sum= max= min= std= / autoname;
run;

This creates amount_sum, amount_max, amount_min, amount_std, and the same set for hours and rate (hope you get the picture).

The NWAY option on the Proc statement restricts the output to the full combination of all CLASS variables. Without it you also get a row with the overall statistics (and assorted combinations if more than one class variable is used). The drop=_: removes the two variables that are supplied automatically: _FREQ_, the number of observations used, and _TYPE_, which indicates which combination of class variables each observation in the output data set represents.
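To see what NWAY and DROP= suppress, a sketch like the following (made-up data; TEST_ALL is a hypothetical output name) runs the same summary without either. The output has an extra _TYPE_=0 row holding the overall sums with a blank NAME, followed by one _TYPE_=1 row per name, and it keeps the _TYPE_ and _FREQ_ columns.

```sas
/* Hypothetical sample data so the example runs on its own */
data have;
  input name $ amount hours rate;
  datalines;
A 10 2 5
A 20 3 6
B 5 1 4
;
run;

/* Same summary without NWAY and without DROP= */
proc summary data=have;
  class name;
  var amount hours rate;
  output out=test_all sum=;
run;

/* Shows the overall row and the automatic variables */
proc print data=test_all;
run;
```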

You can also use any of the forms of variable lists in Proc Means/Summary, so you could get the summaries for all of the numeric variables with var _numeric_; instead of listing every variable name.
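A minimal version with _NUMERIC_ might look like this (made-up data). It assumes the CLASS variable NAME is character, so _NUMERIC_ picks up only amount, hours, and rate.

```sas
/* Hypothetical sample data so the example runs on its own */
data have;
  input name $ amount hours rate;
  datalines;
A 10 2 5
A 20 3 6
B 5 1 4
;
run;

/* _NUMERIC_ selects every numeric variable; sum= with no names keeps the original names */
proc summary data=have nway;
  class name;
  var _numeric_;
  output out=test (drop=_:) sum=;
run;
```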

If you just want to see the results and don't need a data set, Proc Report or Proc Tabulate will produce summaries as well.
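As one illustration of the Tabulate route (a sketch with made-up data), the step below prints one row per name with the summed columns. The OUT= option is not needed for the printed table; it only shows that Tabulate can also write the numbers to a data set, where the statistic columns are named like amount_Sum. TAB_SUMS is a hypothetical name.

```sas
/* Hypothetical sample data so the example runs on its own */
data have;
  input name $ amount hours rate;
  datalines;
A 10 2 5
A 20 3 6
B 5 1 4
;
run;

/* Rows: one per name; columns: the sum of each analysis variable */
proc tabulate data=have out=tab_sums;
  class name;
  var amount hours rate;
  table name, (amount hours rate)*sum;
run;
```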

Re: PROC SQL: default column names?

Kurt_Bremser — Thu, 26 Aug 2021 08:37:41 GMT

That DISTINCT is, as noted, redundant, and can be a real performance killer when working with large datasets.

BIG hint for the future: only use code that is needed (which implies that you have to know what each keyword in your code does. Knowledge is Power.)

If PROC SUMMARY with CLASS exceeds your memory limits (this may happen in multi-user environments, where memory must be limited to protect users from each other's code, combined with a high cardinality of name), sort first and use BY:

proc sort data=have;
by name;
run;

proc summary data=have;
by name;
var amount hours rate;
output out=test (drop=_:) sum=;
run;