I have a list of around 50 SAS table names within a SAS data set that I would like to query and summarize without typing out 50 different "from" or "set" statements. Is there a way to do this programmatically given the example below? The table names do not necessarily follow a consistent pattern unfortunately. I am using SAS EG 7.15. Thank you for any ideas you may have!
*Example data sets I would like to summarize separately;
data TABLE_201901_V3;
input store product $ units;
datalines;
1 A 10
2 A 20
3 A 20
;
data TABLE_201812_V1;
input store product $ units;
datalines;
1 A 20
2 A 30
3 A 22
;
data TABLE_201811_V2;
input store product $ units;
datalines;
1 A 30
2 A 40
3 A 23
;
data TABLE_201709_V1;
input store product $ units;
datalines;
1 A 40
2 A 50
3 A 25
;
data TABLE_201603_V1;
input store product $ units;
datalines;
1 A 50
2 A 60
3 A 26
;
*List of the above table names;
data sas_table_names;
informat tbl_name $char15.;
format tbl_name $char15.;
input tbl_name $;
datalines;
TABLE_201901_V3
TABLE_201812_V1
TABLE_201811_V2
TABLE_201709_V1
TABLE_201603_V1
;
*Example of summary information I would like for each table:
Where i = a value to differentiate five new data sets
and &table_reference = each of the five datalines input data sets;
proc sql;
create table summary_&i as
select distinct
product,
average(units)
from &table_reference
group by product;
quit;
*Example data sets I would like to summarize separately;
data TABLE_201901_V3;
input store product $ units;
datalines;
1 A 10
2 A 20
3 A 20
;
data TABLE_201812_V1;
input store product $ units;
datalines;
1 A 20
2 A 30
3 A 22
;
data TABLE_201811_V2;
input store product $ units;
datalines;
1 A 30
2 A 40
3 A 23
;
data TABLE_201709_V1;
input store product $ units;
datalines;
1 A 40
2 A 50
3 A 25
;
data TABLE_201603_V1;
input store product $ units;
datalines;
1 A 50
2 A 60
3 A 26
;
*List of the above table names;
data sas_table_names;
informat tbl_name $char15.;
format tbl_name $char15.;
input tbl_name $;
datalines;
TABLE_201901_V3
TABLE_201812_V1
TABLE_201811_V2
TABLE_201709_V1
TABLE_201603_V1
;
data _null_;
set sas_table_names end=z;
if _n_=1 then call execute('data temp;set ');
call execute(' '||tbl_name||' ');
if z then call execute('indsname=name;n=name;run;');
run;
data want;
do _n_=1 by 1 until(last.product);
set temp;
by n product notsorted;
sum=sum(units,sum);
end;
avg=sum/_n_;
drop sum n;
run;
Very nice @novinosrin
I would make the last step
proc summary data=temp nway;
class n product;
var units;
output out=stats mean=mean_units;
run;
While there really isn't much difference if you only want the means, as soon as you want other statistics (like minimum, maximum, standard deviations or medians or other percentiles — or heaven forbid you want all of those statistics), its less programming in PROC SUMMARY than doing this in a data step.
Also, the data step solution fails to produce the proper answer if there are missing values in variable UNITS. Opposite is true for PROC SUMMARY.
I see no reason to create 5 different summary tables. Just add table_reference as a group variable to a concatenated data set and run the summary against it. If you then want the summary for a particular input table, just subset the overall summary by table_reference:
proc sql noprint ;
select tbl_name into :dsn separated by " " from sas_table_names ;
quit ;
data v / view = v ;
set &dsn indsname = _t ;
table_reference = _t ;
run ;
proc sql ;
create table summary as
select table_reference, product, avg (units) as average
from v
group 1, 2
;
quit ;
Kind regards
Paul D.
Also don't see an actual need for creating extra data sets. I might also consider a different reporting approach:
proc tabulate data=work.temp; class name product; var units; table name, product, units*mean='' ; run;
Which allows a number of possible changes with the options from Proc tabulate that you can't do in sql, such as multilabel formats for the product class.
Guru @hashman Dedicating a special piece of my learning to you(from you) and to my late father
data want ;
set sas_table_names;
length store 8 product $8 units 8;
dcl hash H (dataset:tbl_name,ordered: "A",multidata:'y') ;
h.definekey ("product") ;
h.definedata (all:"y") ;
h.definedone () ;
dcl hiter hi('h');
_rc=hi.first();
do _n_=1 by 1 while(_rc=0);
_p=lag(product);
_sum+units;
if units>. then _n+1;
avg=_sum/_n;
if _p ne product and _p>' ' or _n_=h.num_items then do;
output;
call missing(of _:);
end;
_rc=hi.next();
end;
call missing(of _:,store,product,units);
h.clear();
drop _: units store;
run;
Thanks, @novinosrin.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.