Every month we run the following to create data files:
%macro getdata;
%let year1 = 2000;
%let year2 = 2011;
%let year3 = 2012;
%let year4 = 2017;
%do i = &year1 %to &year2;
data egtask.dailysup_&i.;set ibs_arc.dsup&i.;run;
%end;
%do i = &year3 %to &year4;
data egtask.dailysup_&i.;set ibs.dsup&i.;run;
%end;
%mend getdata;
%getdata;
PROC SQL;
CREATE TABLE SASCOMB.DAILY_SUP_APPEND AS
SELECT * FROM EGTASK.DAILYSUP_2000
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2001
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2002
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2003
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2004
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2005
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2006
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2007
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2008
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2009
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2010
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2011
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2012
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2013
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2014
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2015
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2016
OUTER UNION CORR
SELECT * FROM EGTASK.DAILYSUP_2017
;
run;
As you can see, the sql command is not very elegant. Is it possible to put it into a loop so one would just be able to use the year variables at the start of the macro to make the table? This way we would not need to add a new select * line at the end of every year; it would just be fully automated.
Thanks!
Based on @Astounding's and @Reeza's notes:
Assuming archieve data desn't change, then create base - run it once - as:
data egtask.dsup2000_2011;
set ibs_arc.dsup2000 - ibs_arc.dsuo2011;
run;
After successfull run you can deside if to cancel it from archieve or keep for backup;
having the base created, you can continue monthly with next code, either:
data SASCOMB.DAILY_SUP_APPEND;
set egtask.dsup2000_2011 /* the base file */
ibs.dsup2012 - ibs.dsup2017
;
run;
or using my fixed code addapted to above:
%macro join(ystart,yend);
%DO;
PROC SQL;
CREATE TABLE SASCOMB.DAILY_SUP_APPEND AS
select * from egtask.dsup2000_2011
%do year=&ystart %to ¥d;
outer union corr
SELECT * FROM EGTASK.DAILYSUP_&year
%end;
; QUIT;
%END;
%mend join;
%join(2012,2017);
Assuming your code works fine, you can replace the SQL step with next code:
%macro join(ystart,yend);
%DO;
PROC SQL;
CREATE TABLE SASCOMB.DAILY_SUP_APPEND AS
%do year=&ystart %to ¥d;
SELECT * FROM EGTASK.DAILYSUP_&year
OUTER UNION CORR
%end;
; QUIT;
%END;
%mend join;
%join(2000,2017);
Wouldn't PROC APPEND be more efficient, even if you did have to do it monthly?
Potentially, there is a lot wrong with this process. Making the code more elegant by setting up a loop? No way. That would just hide the bad features that are already built in so that they would never get changed. Let's take a look under the hood here ...
Does the underlying data for years 2000 to 2016 ever change? If not, why not just save the combined result through 2016 permanently. Then you can add in the data for 2017 monthly, to create an up-to-date picture.
Why copy data sets to the EGTASK library? That's a waste of resources. SQL can operate on the original data from the IBS and IBS_ARC libraries.
Trying to add macro loops to cover up the blemishes? Not the right solution.
Before we get to any macro language, picture this code as a goal:
PROC SQL;
CREATE TABLE SASCOMB.DAILY_SUP_APPEND AS
SELECT * FROM ibs_arc.DSUP_2000
OUTER UNION CORR
SELECT * FROM ibs_arc.DSUP2001
OUTER UNION CORR
SELECT * FROM ibs_arc.DSUP2002
OUTER UNION CORR
SELECT * FROM ibs_arc.DSUP2003
OUTER UNION CORR
SELECT * FROM ibs_arc.DSUP2004
OUTER UNION CORR
SELECT * FROM ibs_arc.DSUP2005
OUTER UNION CORR
SELECT * FROM ibs_arc.DSUP2006
OUTER UNION CORR
SELECT * FROM ibs_arc.DSUP2007
OUTER UNION CORR
SELECT * FROM ibs_arc.DSUP2008
OUTER UNION CORR
SELECT * FROM ibs_arc.DSUP2009
OUTER UNION CORR
SELECT * FROM ibs_arc.DSUP2010
OUTER UNION CORR
SELECT * FROM ibs_arc.DSUP2011
OUTER UNION CORR
SELECT * FROM ibs.DSUP2012
OUTER UNION CORR
SELECT * FROM ibs.DSUP2013
OUTER UNION CORR
SELECT * FROM ibs.DSUP2014
OUTER UNION CORR
SELECT * FROM ibs.DSUP2015
OUTER UNION CORR
SELECT * FROM ibs.DSUP2016
OUTER UNION CORR
SELECT * FROM ibs.DSUP2017
;
run;
Copying the data sets from their original locations almost certainly represents an extra unnecessary step (unless the EGTASK folder also needs to contain a separate copy of the data).
Could the SQL code be replaced with something better? I'm not sure. I don't know enough about how OUTER UNION CORR works to combine files. If that's the right way to go, Schmuel's idea is close. It does generate the words OUTER UNION CORR an extra time, but that can be addressed by macro language.
BCNAV,
Now you're looking in the right direction ... what's the process, what does it need to be, what could it become?
Question: When you say the data for past years could change, does that include the earlier archived years (2000 to 2011)? If those years are fixed, they could be combined and saved that way to speed up the monthly process.
Question for the board: What does OUTER UNION CORR do? Is it identical to concatenating the data sources in a SET statement, or does it imply some other result? Don't make me message @PGStats
SQL - OUTER UNION CORR -
Next documentation taken from:
https://v8doc.sas.com/sashtml/proc/zueryexp.htm
CORRESPONDING (CORR) Keyword The CORRESPONDING keyword is used only when a set operator is specified.
CORR causes PROC SQL to match the columns in table-expressions by name and not by ordinal position.
Columns that do not match by name are excluded from the result table, except for the OUTER UNION operator. See OUTER UNION . For example, when performing a set operation on two table-expressions,
PROC SQL matches the first specified column-name (listed in the SELECT clause)
from one table-expression with the first specified column-name from the other.
If CORR is omitted, PROC SQL matches the columns by ordinal position.
Here's a one way that's damn easy to code but may not be the most efficient 🙂
This assumes you're only appending data and would replace your entire program.
%let year1 = 2000;
%let year2 = 2011;
%let year3 = 2012;
%let year4 = 2017;
data want;
set
ibs_arc.dsup&year1. - ibs_arc.dsup&year2.
ibs_arc.dsup&year3. - ibs_arc.dsup&year4. ;
run;
The reason they are copied to EGTASK, is that IT has some of the data as "current" (last few years), while older stuff is in an archive folder. This way the data to append is all in one spot. I do like Shmuel's idea and I can see it work, but it could also be nicer not to copy the data locally to EGTASK, as Astounding notes. Not sure it is worth the effort, but could his idea be modified to get data from the two sources? Maybe not worth the effort.
thx
Your archive data isn't changing is it? You can make that your 'base' table and append to it once, you shouldn't combine them all each time.
Based on @Astounding's and @Reeza's notes:
Assuming archieve data desn't change, then create base - run it once - as:
data egtask.dsup2000_2011;
set ibs_arc.dsup2000 - ibs_arc.dsuo2011;
run;
After successfull run you can deside if to cancel it from archieve or keep for backup;
having the base created, you can continue monthly with next code, either:
data SASCOMB.DAILY_SUP_APPEND;
set egtask.dsup2000_2011 /* the base file */
ibs.dsup2012 - ibs.dsup2017
;
run;
or using my fixed code addapted to above:
%macro join(ystart,yend);
%DO;
PROC SQL;
CREATE TABLE SASCOMB.DAILY_SUP_APPEND AS
select * from egtask.dsup2000_2011
%do year=&ystart %to ¥d;
outer union corr
SELECT * FROM EGTASK.DAILYSUP_&year
%end;
; QUIT;
%END;
%mend join;
%join(2012,2017);
On 2nd thought - why copy archive data - maybe creating the base as a view will be more efficient:
data egtask.dsup2000_2011 / view=egtask.dsup2000_2011;
set ibs_arc.dsup2000 - ibs_arc.dsuo2011;
run;
In an effort to keep things simple for others, I decided to use the simpler code by Shmuel (despite my liking of macros and loops). Everyone can understand it, is intuitive, and can easily be changed. Thanks to all!!
/* Use a view as a "virtual dataset" so that data is not compiled unnecessarily */
data egtask.dsup2000_2011 / view=egtask.dsup2000_2011;
set ibs_arc.dsup2000 - ibs_arc.dsup2011;
run;
data SASCOMB.DAILY_SUP_APPEND_2;
set egtask.dsup2000_2011 /* The base file view from above*/
ibs.dsup2012 - ibs.dsup2017
;
run;
proc sql; drop view egtask.dsup2000_2011; run;
d
@BCNAV Please mark one of his answers as the solution.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.