Hi folks,
I have more than 50 tables with an identical structure (they all have the same columns and data types). I would like to run the following code, which returns one line of data, for each of my 50+ tables. Then I would also like to combine all of those lines into one dataset.
PROC SQL;
CREATE TABLE DASHBOARD2 AS
( SELECT AVG(VALUE) AS MOYENNE, MAX(VALUE) AS MAXIMUM, MIN(VALUE) AS MINIMUM, SUM(VALUE) AS SOMME, COUNT(*) AS NBOBS
FROM TEST01
);
QUIT;
Is there a way to do this for all 50+ tables and get one dataset with 50+ lines holding that information (avg, max, min, sum, nbobs)?
@jpprovost wrote:
Hi folks,
I have more than 50 tables with an identical structure (they all have the same columns and data types). I would like to run the following code, which returns one line of data, for each of my 50+ tables. Then I would also like to combine all of those lines into one dataset.
PROC SQL; CREATE TABLE DASHBOARD2 AS ( SELECT AVG(VALUE) AS MOYENNE, MAX(VALUE) AS MAXIMUM, MIN(VALUE) AS MINIMUM, SUM(VALUE) AS SOMME, COUNT(*) AS NBOBS FROM TEST01 ); QUIT;
Is there a way to do this for all 50+ tables and get one dataset with 50+ lines holding that information (avg, max, min, sum, nbobs)?
@novinosrin is correct...
Your proc sql is going to return one line, with avg, max, min, sum, and count for all the rows in test01.
Get proc summary to create the same results for your single test01 dataset. Read the doc on proc summary if you need to.
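For example, a minimal sketch of that first step, using the VALUE column and the statistic names from the PROC SQL above (note that N= counts non-missing VALUE, unlike COUNT(*)):
proc summary data=test01 nway noprint;
   var value;
   output out=dashboard2 (drop=_type_ _freq_)
      mean=moyenne max=maximum min=minimum sum=somme n=nbobs;
run;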
Now that it's working for one dataset, just append all your datasets into one, using the indsname option to capture the contributing dataset name. You could even use a data step view if your datasets are really large. Read the doc on the indsname option if you need to.
Then, just use the dataset name as a class variable in proc summary. This is analogous to group by in SQL.
If you insist on proc sql, then append the datasets as above, and use the dataset name in a group by in proc sql:
data test01 (where=(x between 0 and 10))
     test02 (where=(x between 10 and 20))
     test03 (where=(x between 20 and 30))
     ;
   do x=0 to 30;
      output;
   end;
run;

data append / view=append;
   * or set test: (prefix wildcard) if there are no other datasets whose names start with test ;
   set test01-test03 indsname=n;
   name=n;
run;

proc sql;
   create table foo as
   select
      name
      ,avg(x) as avg
      ,min(x) as min
      ,max(x) as max
   from
      append
   group by
      name
   ;
quit;

proc summary data=append nway noprint;
   class name;
   var x;
   output out=foo2 (drop=_type_ _freq_)
      mean=avg
      min=min
      max=max
   ;
run;
Macros are great - I use them every day. But it's just as important to know when NOT to use a macro.
Hi @jpprovost, I think
1. Appending the 50 tables into 1
2. Data step / proc summary (BY groups)
should be a better approach than proc sql (which would need 48 UNION ALLs, or a macro loop one table at a time).
@novinosrin wrote:
Hi @jpprovost, I think
1. Appending the 50 tables into 1
2. Data step / proc summary (BY groups)
should be a better approach than proc sql (which would need 48 UNION ALLs, or a macro loop one table at a time).
Hi @novinosrin,
I wrote the following macro:
%macro dash;
%do i = 1 %to 3;
   PROC SQL;
      CREATE TABLE DASHBOARDk&i AS
      ( SELECT AVG(VALUE) AS MOYENNE, MAX(VALUE) AS MAXIMUM, MIN(VALUE) AS MINIMUM,
               SUM(VALUE) AS SOMME, COUNT(*) AS NBOBS
        FROM TEST0&i
      );
   QUIT;
   run;
%end;
%mend dash;
It extracts the desired information into 50+ different tables. Now I'm struggling to get them all together.
How can I use your proposed solution? I'm new to SAS and I'm trying to "Google" as many things as I can.
Thanks!
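One way to then stack the one-row DASHBOARDk&i tables that the macro creates is a single DATA step with a numbered range list and the INDSNAME= option. This is only a minimal sketch; it assumes the macro above was run for i = 1 to 3, so extend the range to cover the 50+ tables:
data dashboard_all;
   length source $ 41;                        /* a libname.memname can be up to 41 characters */
   set dashboardk1-dashboardk3 indsname=dsn;  /* extend the range, e.g. dashboardk1-dashboardk50 */
   source = dsn;                              /* e.g. WORK.DASHBOARDK1 */
run;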
data append;
   set test1-test50 indsname=n; /* assumes the dataset names form a continuous numbered sequence */
   nameof_dsn=n;
run;

proc summary data=append nway noprint;
   class nameof_dsn;
   var value;
   output out=want mean=moyenne max=maximum min=minimum sum=somme n=nbobs;
run;
Edited: to include nameof_dsn as Class variable
I think this can be done quite simply by using an INSERT statement in SQL:
%macro dash;
   PROC SQL;
      CREATE TABLE DASHBOARD AS
      SELECT AVG(VALUE) AS MOYENNE, MAX(VALUE) AS MAXIMUM, MIN(VALUE) AS MINIMUM,
             SUM(VALUE) AS SOMME, COUNT(*) AS NBOBS
      FROM TEST01;
   QUIT;
   %do i = 2 %to 3;
      PROC SQL;
         INSERT INTO DASHBOARD
         SELECT AVG(VALUE) AS MOYENNE, MAX(VALUE) AS MAXIMUM, MIN(VALUE) AS MINIMUM,
                SUM(VALUE) AS SOMME, COUNT(*) AS NBOBS
         FROM TEST0&i;
      QUIT;
   %end;
%mend dash;
%dash
I assume that each table has one or more observations.
When you ask for max(value) - or any other statistic - do you mean per table, or across all tables as a unit?
Do you want the statistics for one specific variable or for all variables?
It seems to me that PROC MEANS will be more efficient than PROC SQL.
@jpprovost wrote:
In fact, my 50+ tables represent the last 50+ months. Of course, I have more observations each month a file is generated. What I want is to define limits from the past 50+ months, using statistics from those months, to verify that my new data (for the coming months) stay within those limits.
So the answer to your question is: a specific statistic on one variable (VALUE).
If it's not clear, do not hesitate.
In that case the code should look like:
proc datasets lib=work nolist; delete final; quit;

%macro stat;
%do i=1 %to 50;
   proc means data=test&i noprint nway;
      class <month>;   /* replace <month> with your month identifier variable */
      var value;
      output out=temp mean=moyenne max=maximum
                      min=minimum sum=somme n=nobs;
   run;
   proc append base=final data=temp; run;
   proc datasets lib=work nolist; delete temp; quit;
%end;
%mend;
%stat;
@Shmuel wrote:
@jpprovost wrote:
In fact, my 50+ tables represent the last 50+ months. Of course, I have more observations each month a file is generated. What I want is to define limits from the past 50+ months, using statistics from those months, to verify that my new data (for the coming months) stay within those limits.
So the answer to your question is: a specific statistic on one variable (VALUE).
If it's not clear, do not hesitate. In that case the code should look like:
proc datasets lib=work nolist; delete final; quit; %macro stat; %do i=1 %to 50; proc means data=test&i noprint nway; class <month>; var value; output out=temp mean=moyenne max=maximum min=minimum sum=somme n=nobs; run; proc append base=final data=temp; run; proc datasets lib=work nolist; delete temp; quit; %end; %mend; %stat;
Thank you @Shmuel for your help and time.
Really appreciated.
@ScottBass wrote:
@jpprovost wrote:
Hi folks,
I have more than 50 tables with an identical structure (they all have the same columns and data types). I would like to run the following code, which returns one line of data, for each of my 50+ tables. Then I would also like to combine all of those lines into one dataset.
PROC SQL; CREATE TABLE DASHBOARD2 AS ( SELECT AVG(VALUE) AS MOYENNE, MAX(VALUE) AS MAXIMUM, MIN(VALUE) AS MINIMUM, SUM(VALUE) AS SOMME, COUNT(*) AS NBOBS FROM TEST01 ); QUIT;
Is there a way to do this for all 50+ tables and get one dataset with 50+ lines holding that information (avg, max, min, sum, nbobs)?
@novinosrin is correct...
Your proc sql is going to return one line, with avg, max, min, sum, and count for all the rows in test01.
Get proc summary to create the same results for your single test01 dataset. Read the doc on proc summary if you need to.
Now that it's working for one dataset, just append all your datasets into one, using the indsname option to capture the contributing dataset name. You could even use a data step view if your datasets are really large. Read the doc on the indsname option if you need to.
Then, just use the dataset name as a class variable in proc summary. This is analogous to group by in SQL.
If you insist on proc sql, then append the datasets as above, and use the dataset name in a group by in proc sql:
data test01 (where=(x between 0 and 10)) test02 (where=(x between 10 and 20)) test03 (where=(x between 20 and 30)) ; do x=0 to 30; output; end; run; data append / view=append; * or test: if you don't have any extra files named test... ; set test01-test03 indsname=n; name=n; run; proc sql; create table foo as select name ,avg(x) as avg ,min(x) as min ,max(x) as max from append group by name ; quit; proc summary data=append nway noprint; class name; var x; output out=foo2 (drop=_type_ _freq_) mean=avg min=min max=max ; run;
Macros are great - I use them every day. But it's just as important to know when NOT to use a macro.
Thank you, @ScottBass.
Of course macros are great; I try to use them as much as I can.
However, do you have any tips/hints on when macros should NOT be used?
Regards,
Of course macros are great; I try to use them as much as I can.
No! I disagree. I would change this to "I try to use them only when required". So, what is the definition of "required"? Obviously you'll get different answers; my quick one, off the top of my head, is 1) when you want to parameterize a block of working code (if you find yourself cutting and pasting a block of code with the same edits, that's a good candidate), or 2) when you want to encapsulate a block of code that you will call often (this could also be an external %include block of code).
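For case 1), a minimal sketch of parameterizing the summary step shown earlier in this thread; the macro name, parameter, and output table naming here are just illustrative:
%macro summarize(ds);
   proc summary data=&ds nway noprint;
      var value;
      output out=stats_&ds (drop=_type_ _freq_)
         mean=moyenne min=minimum max=maximum sum=somme n=nbobs;
   run;
%mend summarize;

%summarize(test01)
%summarize(test02)
Once the non-macro step works on one table, wrapping it like this is a mechanical change.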
However, do you have any tips/hints on when macros should NOT be used?
Macro is analogous to the pre-processor in C. It does a lot more, but let's run with this analogy. All it does is control the code that gets sent to the compiler (for now I'm ignoring data step interfaces like symget and call symputx).
I like to think of macro as "If I could type this code at 10,000 words a second, and never make a mistake, then I wouldn't need macro". Silly perhaps, but macro is just controlling the code that gets sent to the compiler (or a proc which isn't compiled, but I digress).
Also, generating repeated steps with macro is MUCH slower than doing the same work once in a data step or a proc. Another example is using macro to process groups one at a time when that processing is supported directly via BY-group processing in SAS.
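As a sketch of that last point, the following replaces a macro loop over groups with one sorted pass and BY-group logic; the dataset COMBINED and the variables GROUP and VALUE are hypothetical names:
proc sort data=combined;
   by group;
run;

data group_totals;
   set combined;
   by group;
   if first.group then somme = 0;   /* reset the running total at the start of each group */
   somme + value;                   /* sum statement: SOMME is retained across rows */
   if last.group then output;       /* one summary row per group */
   keep group somme;
run;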
If you find you're writing really clunky, hard-to-follow macro code for your problem, step back and ask whether you really need macro.
A quick example and I've got to run:
Say you have 1000 external files you need to concatenate into a single dataset. They are all in one directory and have the same format.
You could get the list of files, then loop over that list in a macro, generating 1000 data steps, appending each input to a target table.
Or, you could use an aggregated (wildcard) filename, plus the FILENAME= and/or FILEVAR= options on the INFILE statement.
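A rough sketch of that second approach; the directory, file layout, and input variables are made up for illustration:
filename allfiles "/data/monthly/*.csv";   /* aggregate fileref: the wildcard matches many files */

data combined;
   length fname source $ 256;
   infile allfiles dlm=',' dsd truncover filename=fname;
   input id value;
   source = fname;                         /* FILENAME= holds the file currently being read; copy it to keep it */
run;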
I'd say the key to knowing when not to use macro is simply increasing your overall knowledge of alternative, better approaches via SAS, esp. the data step.
I can usually tell a "newbie" (I'm not saying you're one, by the way) when they think macro is the panacea for everything.
HTH...