BookmarkSubscribeRSS Feed
gtlightfoot
Fluorite | Level 6

I have a list of around 50 SAS table names within a SAS data set that I would like to query and summarize without typing out 50 different "from" or "set" statements. Is there a way to do this programmatically given the example below? The table names do not necessarily follow a consistent pattern unfortunately. I am using SAS EG 7.15. Thank you for any ideas you may have!

 

*Example data sets I would like to summarize separately;
data TABLE_201901_V3;
input store product $ units;
datalines;
1 A 10
2 A 20
3 A 20
;

data TABLE_201812_V1;
input store product $ units;
datalines;
1 A 20
2 A 30
3 A 22
;

data TABLE_201811_V2;
input store product $ units;
datalines;
1 A 30
2 A 40
3 A 23
;

data TABLE_201709_V1;
input store product $ units;
datalines;
1 A 40
2 A 50
3 A 25
;

data TABLE_201603_V1;
input store product $ units;
datalines;
1 A 50
2 A 60
3 A 26
;

*List of the above table names;
data sas_table_names;
informat tbl_name $char15.;
format tbl_name $char15.;
input tbl_name $;
datalines;
TABLE_201901_V3
TABLE_201812_V1
TABLE_201811_V2
TABLE_201709_V1
TABLE_201603_V1
;

*Example of summary information I would like for each table:
	Where i = a value to differentiate five new data sets
		and &table_reference = each of the five datalines input data sets;
proc sql;
create table summary_&i as
select distinct
	product,
	average(units)
from &table_reference
group by product;
quit;
6 REPLIES 6
novinosrin
Tourmaline | Level 20
*Example data sets I would like to summarize separately;
data TABLE_201901_V3;
input store product $ units;
datalines;
1 A 10
2 A 20
3 A 20
;

data TABLE_201812_V1;
input store product $ units;
datalines;
1 A 20
2 A 30
3 A 22
;

data TABLE_201811_V2;
input store product $ units;
datalines;
1 A 30
2 A 40
3 A 23
;

data TABLE_201709_V1;
input store product $ units;
datalines;
1 A 40
2 A 50
3 A 25
;

data TABLE_201603_V1;
input store product $ units;
datalines;
1 A 50
2 A 60
3 A 26
;

*List of the above table names;
data sas_table_names;
informat tbl_name $char15.;
format tbl_name $char15.;
input tbl_name $;
datalines;
TABLE_201901_V3
TABLE_201812_V1
TABLE_201811_V2
TABLE_201709_V1
TABLE_201603_V1
;


data _null_;
 set sas_table_names end=z;
 if _n_=1 then call execute('data temp;set ');
 call execute(' '||tbl_name||' ');
 if z then call execute('indsname=name;n=name;run;');
run;

data want;
do _n_=1 by 1 until(last.product);
 set temp;
 by n product notsorted;
 sum=sum(units,sum);
end;
avg=sum/_n_;
drop sum n;
run;
PaigeMiller
Diamond | Level 26

Very nice @novinosrin 

 

I would make the last step

 

proc summary data=temp nway;
    class n product;
    var units;
    output out=stats mean=mean_units;
run;

While there really isn't much difference if you only want the means, as soon as you want other statistics (like minimum, maximum, standard deviations or medians or other percentiles — or heaven forbid you want all of those statistics), its less programming in PROC SUMMARY than doing this in a data step.

 

Also, the data step solution fails to produce the proper answer if there are missing values in variable UNITS. Opposite is true for PROC SUMMARY.

--
Paige Miller
hashman
Ammonite | Level 13

@gtlightfoot:

I see no reason to create 5 different summary tables. Just add table_reference as a group variable to a concatenated data set and run the summary against it. If you then want the summary for a particular input table, just subset the overall summary by table_reference:

proc sql noprint ;                                                 
  select tbl_name into :dsn separated by " " from sas_table_names ;
quit ;                                                             
                                                                   
data v / view = v ;                                                
  set &dsn indsname = _t ;                                         
  table_reference = _t ;                                           
run ;                                                              
                                                                   
proc sql ;                                                         
  create table summary as                                          
  select table_reference, product, avg (units) as average          
  from   v                                                         
  group  1, 2                                                      
  ;                                                                
quit ;                                                             

Kind regards

Paul D. 

ballardw
Super User

Also don't see an actual need for creating extra data sets. I might also consider a different reporting approach:

 

proc tabulate data=work.temp;
   class name product;
   var units;
   table name,
         product,
         units*mean=''
         ;
run;

Which allows a number of possible changes with the options from Proc tabulate that you can't do in sql, such as multilabel formats for the product class.

novinosrin
Tourmaline | Level 20

Guru @hashman  Dedicating a special piece of my learning to you(from you) and  to my late father

 

data want ;
 set sas_table_names;
 length store 8 product $8 units 8;
   dcl hash H (dataset:tbl_name,ordered: "A",multidata:'y') ;
   h.definekey  ("product") ;
   h.definedata (all:"y") ;
   h.definedone () ;
   dcl hiter hi('h');
 _rc=hi.first();
  do _n_=1 by 1 while(_rc=0);
   _p=lag(product);
   _sum+units;
   if units>. then _n+1;
   avg=_sum/_n;
   if _p ne product and _p>' ' or _n_=h.num_items then do;
   output;
   call missing(of _:);
   end;
  _rc=hi.next();
  end;
 call missing(of _:,store,product,units);
 h.clear();
 drop _: units store;
 run;

 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1626 views
  • 4 likes
  • 5 in conversation