I have a background system that is serving up monthly table to our analytics function. Each table is named "name_202001", "name_202002" etc. Where 2020 is the Year and 01 is the end date of the month number of the tax year. So 01 is 30th of April, 02 31st of June etc.
I am using Enterprise Guide and would like a program that appends these tables based on a start and end date Parameter.
There's a kicker, there always is. These tables contain a huge huge number of references, and I'm aiming to be able to combine around 60 tables into one. I have another table that contains a small list of references that is just the subset that I want
So I need the program to filter as it goes, rather than combining all 60 tables and then filtering them afterwards.
Does that make sense? Anyone know how it's done?
Thanks!
I guess that by "references" you mean some kind of key variable, for which you have a lookup dataset?
If that is the case, first build the list of dataset names and store them into a macro variable:
data _null_;
length datasets $32767; * max length;
date = &start_date.;
do while (date le &end_date.);
datasets = catx(" ",datasets,"name_" !! put(date,yymmn6.));
date = intnx('month',date,1,'s');
end;
call symputx('datasets',datasets);
run;
Then, combine and subset in one step:
data want;
set &datasets.;
if _n_ = 1
then do;
declare hash l (dataset:"lookup");
l.definekey("key");
l.definedone();
end;
if l.check() = 0;
run;
(assuming that your "reference" variable is called "key")
Assuming that the 2020 is the year, do intend to combine across years?
If no, then you can use a data step such as:
data want;
set name_202001 - name_202005;
run;
to append sequentially numbered names. However if you need to cross years such as name_201911 to name_202003 that approach doesn't work because the list parser SAS uses requires sequential numbering.
I think it all depends on how this referenced table is structured. Can you turn that data into macro variables that can be used to subset the data in a data step?
%let cnt=8; /*number of datasets you need to combine. Assuming 1 row per table in reference table, should equal row count of reference table*/
%macro dataset_append();
%do i=1 %to &cnt.;
/*Create variables to be used to go table by table and subset according to criteria*/
data _null_;
set reference_table (firstobs=&i. obs=&i);
call symputx('table_name',variable_with_table_name);
call symputx('criteria1', variable_with_criteria1);
call symputx('criteria2', variable_with_criteria2);
run;
/*create temp "want" table that is single month table subsetted based on criteria from reference table*/
data want;
set &table_name.;
where var1=&criteria1. and var2=&criteria2.;
run;
/*append to your combined dataset*/
proc append base=my_combined_dataset data=want;
run;
%end;
%mend;
/*Run Loop*/
%dataset_append()
This will loop through one table at a time, subset it first based on criteria in the reference dataset, then append it to a combined dataset. Hope this helps get going in the right direction.
I guess that by "references" you mean some kind of key variable, for which you have a lookup dataset?
If that is the case, first build the list of dataset names and store them into a macro variable:
data _null_;
length datasets $32767; * max length;
date = &start_date.;
do while (date le &end_date.);
datasets = catx(" ",datasets,"name_" !! put(date,yymmn6.));
date = intnx('month',date,1,'s');
end;
call symputx('datasets',datasets);
run;
Then, combine and subset in one step:
data want;
set &datasets.;
if _n_ = 1
then do;
declare hash l (dataset:"lookup");
l.definekey("key");
l.definedone();
end;
if l.check() = 0;
run;
(assuming that your "reference" variable is called "key")
Thanks Kurt, I'm just trying to get your solution to work. I've not used date prompts in a while, it's throwing an error because when I set the Prompt up as a Day, it passes the value "31Jan2015" to the code. I'm presuming that's not in the right format though. How does that date need to be setup?
I changed &start_date. to "&start_date"d and the same for &end_date and that seems to have worked.
However it looks like the date offset I described isn't being covered - 202001 is April, 202002 is May etc:
I have tables up to April 2020, which is 202001, but if I put the end date as 01 April 2020, I get errors:
name_20202 does not exist, name_202003 does not exist. From the April 2020 date, nothing past 202001 should be requested.
I've offset my dates by a couple of months to get the right period while that problem gets sorted and it looks like I'm getting the right data, certainly the volume looks correct anyway.
So thanks so much for getting me this far, it's a huge help as I was using a legacy system to run some old SQL to create a .csv file and then importing it into SAS, this will let me automate the process and strip a load of time out.
Try this:
data _null_;
length datasets $32767; * max length;
date = intnx('month',"&start_date."d,-3,'s');;
do while (date le intnx('month',"&end_date."d,-3,'s'));
datasets = catx(" ",datasets,"name_" !! put(date,yymmn6.));
date = intnx('month',date,1,'s');
end;
call symputx('datasets',datasets);
run;
@paulrockliffe wrote:
I changed &start_date. to "&start_date"d and the same for &end_date and that seems to have worked.
However it looks like the date offset I described isn't being covered - 202001 is April, 202002 is May etc:
I have tables up to April 2020, which is 202001, but if I put the end date as 01 April 2020, I get errors:
name_20202 does not exist, name_202003 does not exist. From the April 2020 date, nothing past 202001 should be requested.
I've offset my dates by a couple of months to get the right period while that problem gets sorted and it looks like I'm getting the right data, certainly the volume looks correct anyway.
So thanks so much for getting me this far, it's a huge help as I was using a legacy system to run some old SQL to create a .csv file and then importing it into SAS, this will let me automate the process and strip a load of time out.
You need to very explicitly describe exactly how 202001 is April. In most "date" systems that is going to be January. If your "year" starts in a different month then you need to tell us that. Every SAS date manipulation tool will need to be told what the offset really really is.
Unless each and every table used in SAS follows the standard 01 = April, a more common naming convention is to use the suffix FY when applicable.
NAME_FY202001 rather than NAME_202001
If you want to divide and conquer, you need to first generate a working program for one month/one file. Then you can generalize it to all your months. Once you have it built, follow the steps in the tutorial below to convert your program to a macro. Then you can execute it for every month. Usually it's more efficient to process all at once, but if you don't have enough memory or some other reason, you can use a macro to run it separately and combine the results at the end.
UCLA introductory tutorial on macro variables and macros
https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/
Tutorial on converting a working program to a macro
This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
Examples of common macro usage
https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...
> I have another table that contains a small list of references that is just the subset that I want
How small is small? How many references do you have?
If the list is small enough that it can fit in a where clause, that's your best bet. If not then a hash table.
%macro loop; %do i=1 %to 60; T&i %end; %mend;
data %loop;
do I=1 to 1e6;
output;
end;
run;
data LOOKUP;
do i=1 to 1e6 by 1e3;
output;
end;
run;
%* where clause - macro variable - 3 seconds;
proc sql noprint; select I into :list separated by ',' from LOOKUP;
data _null_;
set T1-T60;
where I in ( &list );
run;
%* where clause - dynamic - 3.5 seconds;
%macro loop;
%local dsid rc;
%let dsid=%sysfunc(open(LOOKUP));
%let rc=%sysfunc(fetch(&dsid));
%do %while(&rc=0);
%sysfunc(getvarn(&dsid,%sysfunc(varnum(&dsid,I))))
%let rc=%sysfunc(fetch(&dsid));
%end;
%let rc=%sysfunc(close(&dsid));
%mend;
data _null_;
set T1-T60;
where I in (
%loop
);
run;
%* hash table - 9 seconds;
data _null_;
set T1-T60;
if _N_=1 then do;
dcl hash H(dataset:'LOOKUP');
H.definekey('I');
H.definedone();
end;
if H.check()=0;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.