Solved: How to make a summary for many columns and name the summary variables elegantly?

duanzongran · Posted 08-26-2024 10:30 PM

Dear all,

I want to summarize sales data across areas. For example, sum_20240105 should hold the sum of s_20240105, and the 'sum_' prefix is required. There are many columns, so how can I code this without writing one statement per column?

data sales;
input area $ s_20240105 s_20240109 s_20240112 s_20240122 s_20240129 s_20240209 s_20240212 s_20240222;
datalines;
NO1 1 2 4 5 6 7 8 9
NO2 2 5 9 5 8 6 8 10
NO3 2 4 6 8 7 4 9 12
;
run;

data want;
set sales end=last;
sum_20240105+s_20240105;
sum_20240109+s_20240109;
sum_20240112+s_20240112;
sum_20240122+s_20240122;
sum_20240129+s_20240129;
sum_20240209+s_20240209;
sum_20240212+s_20240212;
sum_20240222+s_20240222;
if last then output;
keep  sum_:;
run;

The WANT data set should have a single row holding the sum of each column.

mkeintz · Posted 08-26-2024 11:45 PM

Use PROC SUMMARY, whose OUTPUT statement can rename the statistics it produces. For instance, with only two variables you could write:

proc summary data=sales;
  var s_20240105 s_20240109;
  output out=want   sum(s_20240105 s_20240109) = SUM_20240105 SUM_20240109;
run;

But you have a lot of variables to rename. Use the dictionary.columns table in PROC SQL to build the macro variables &VARLIST and &SUMLIST, which supply the two sides of the rename:

data sales;
input area $ s_20240105 s_20240109 s_20240112 s_20240122 s_20240129 s_20240209 s_20240212 s_20240222;
datalines;
NO1 1 2 4 5 6 7 8 9
NO2 2 5 9 5 8 6 8 10
NO3 2 4 6 8 7 4 9 12
run;

proc sql noprint;
  select distinct 
         name                       ,cats('SUM_',scan(name,2,'_'))
  into   :varlist separated by ' '  ,:sumlist separated by ' '
  from dictionary.columns
  where libname='WORK' and memname='SALES' and upcase(scan(name,1,'_'))='S';
quit;
%put &=varlist;
%put &=sumlist;

proc summary data=sales;
  var s_:;
  output out=want (drop=_type_ _freq_) sum(&varlist)=&sumlist;
run;
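
A variation on the same dictionary.columns idea, offered only as a sketch: build a single list of old=new pairs and hand it to the RENAME= data set option on the output, so PROC SUMMARY can use the plain sum= form. The macro variable name RENAMES is a hypothetical choice for this example, and the sample data is repeated so the block runs on its own.

```sas
/* Sample data copied from the question */
data sales;
  input area $ s_20240105 s_20240109 s_20240112 s_20240122
               s_20240129 s_20240209 s_20240212 s_20240222;
datalines;
NO1 1 2 4 5 6 7 8 9
NO2 2 5 9 5 8 6 8 10
NO3 2 4 6 8 7 4 9 12
;
run;

/* Build pairs such as s_20240105=sum_20240105 in one macro variable. */
/* RENAMES is a hypothetical name chosen for this sketch.             */
proc sql noprint;
  select catx('=', name, cats('sum_', scan(name, 2, '_')))
    into :renames separated by ' '
  from dictionary.columns
  where libname='WORK' and memname='SALES'
    and upcase(scan(name, 1, '_'))='S';
quit;
%put &=renames;

/* sum= keeps the input names; RENAME= applies the prefix on output */
proc summary data=sales;
  var s_:;
  output out=want (drop=_type_ _freq_ rename=(&renames)) sum=;
run;
```

Because each pair carries both the old and the new name, the result does not depend on two separate lists staying in the same order.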

Tom · Posted 08-26-2024 11:13 PM

Just ask PROC SUMMARY to do that.

proc summary data=sales nway ;
   var s_: ;
   output out=want sum=;
run;

That will produce a dataset like the one you asked for, except that the sums keep the original variable names.

Does it really matter if the names start with SUM instead of S? Why?

It will probably be MUCH easier if you move that numeric suffix (that looks like a DATE string) out of the variable NAME and into its own variable.

proc transpose data=sales out=sales_t(rename=(col1=sales)) name=date_char ;
  by area ;
  var s_: ;
run;

data sales_t;
  set sales_t;
  date = input(substr(date_char,3),yymmdd8.);
  format date yymmdd10.;
run;

proc summary data=sales_t nway;
  class date;
  var sales ;
  output out=want_t sum=sum_sales;
run;

Results

                                          sum_
Obs          date    _TYPE_    _FREQ_    sales

 1     2024-01-05       1         3         5
 2     2024-01-09       1         3        11
 3     2024-01-12       1         3        19
 4     2024-01-22       1         3        18
 5     2024-01-29       1         3        21
 6     2024-02-09       1         3        17
 7     2024-02-12       1         3        25
 8     2024-02-22       1         3        31
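
If the wide layout with the sum_ prefix is still needed at the end, the long result can be transposed back. The following is a minimal sketch, assuming WANT_T looks like the listing above; it is re-created here from those printed values so the block runs on its own. The output name WANT_WIDE and the choice of the YYMMDDN8. format are assumptions made for this example.

```sas
/* Re-create the summarized long table from the listing above */
data want_t;
  input date :yymmdd10. sum_sales;
  format date yymmdd10.;
datalines;
2024-01-05 5
2024-01-09 11
2024-01-12 19
2024-01-22 18
2024-01-29 21
2024-02-09 17
2024-02-12 25
2024-02-22 31
;
run;

/* PROC TRANSPOSE names the new columns from the formatted ID value, */
/* so YYMMDDN8. plus PREFIX=sum_ yields names like sum_20240105.     */
proc transpose data=want_t out=want_wide (drop=_name_) prefix=sum_;
  id date;
  format date yymmddn8.;
  var sum_sales;
run;

proc print data=want_wide;
run;
```

Keeping the date as a real variable until this last step means any filtering or grouping by date can be done on values rather than on variable names.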

duanzongran · Posted 08-27-2024 02:06 AM

Thank you, @Tom.
