BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
paulrockliffe
Obsidian | Level 7

I have a background system that is serving up monthly table to our analytics function.  Each table is named "name_202001", "name_202002" etc.  Where 2020 is the Year and 01 is the end date of the month number of the tax year.  So 01 is 30th of April, 02 31st of June etc.

I am using Enterprise Guide and would like a program that appends these tables based on a start and end date Parameter.

 

There's a kicker, there always is.  These tables contain a huge huge number of references, and I'm aiming to be able to combine around 60 tables into one.  I have another table that contains a small list of references that is just the subset that I want

 

So I need the program to filter as it goes, rather than combining all 60 tables and then filtering them afterwards.

 

Does that make sense?  Anyone know how it's done?

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

I guess that by "references" you mean some kind of key variable, for which you have a lookup dataset?

 

If that is the case, first build the list of dataset names and store them into a macro variable:

data _null_;
length datasets $32767; * max length;
date = &start_date.;
do while (date le &end_date.);
  datasets = catx(" ",datasets,"name_" !! put(date,yymmn6.));
  date = intnx('month',date,1,'s');
end;
call symputx('datasets',datasets);
run;

Then, combine and subset in one step:

data want;
set &datasets.;
if _n_ = 1
then do;
  declare hash l (dataset:"lookup");
  l.definekey("key");
  l.definedone();
end;
if l.check() = 0;
run;

(assuming that your "reference" variable is called "key")

View solution in original post

24 REPLIES 24
ballardw
Super User

Assuming that the 2020 is the year, do intend to combine across years?

If no, then you can use a data step such as:

 

data want;

   set name_202001 - name_202005;

run;

 

to append sequentially numbered names. However if you need to cross years such as name_201911 to name_202003 that approach doesn't work because the list parser SAS uses requires sequential numbering.

 

 

bobpep212
Quartz | Level 8

I think it all depends on how this referenced table is structured. Can you turn that data into macro variables that can be used to subset the data in a data step?

%let cnt=8; /*number of datasets you need to combine. Assuming 1 row per table in reference table, should equal row count of reference table*/
%macro dataset_append();
%do i=1 %to &cnt.;
/*Create variables to be used to go table by table and subset according to criteria*/
data _null_;
set reference_table (firstobs=&i. obs=&i);
call symputx('table_name',variable_with_table_name);
call symputx('criteria1', variable_with_criteria1);
call symputx('criteria2', variable_with_criteria2);
run;
/*create temp "want" table that is single month table subsetted based on criteria from reference table*/
data want;
set &table_name.;
where var1=&criteria1. and var2=&criteria2.;
run;
/*append to your combined dataset*/
proc append base=my_combined_dataset data=want;
run;
%end;
%mend;
/*Run Loop*/
%dataset_append()

This will loop through one table at a time, subset it first based on criteria in the reference dataset, then append it to a combined dataset. Hope this helps get going in the right direction.

Kurt_Bremser
Super User

I guess that by "references" you mean some kind of key variable, for which you have a lookup dataset?

 

If that is the case, first build the list of dataset names and store them into a macro variable:

data _null_;
length datasets $32767; * max length;
date = &start_date.;
do while (date le &end_date.);
  datasets = catx(" ",datasets,"name_" !! put(date,yymmn6.));
  date = intnx('month',date,1,'s');
end;
call symputx('datasets',datasets);
run;

Then, combine and subset in one step:

data want;
set &datasets.;
if _n_ = 1
then do;
  declare hash l (dataset:"lookup");
  l.definekey("key");
  l.definedone();
end;
if l.check() = 0;
run;

(assuming that your "reference" variable is called "key")

paulrockliffe
Obsidian | Level 7

Thanks Kurt, I'm just trying to get your solution to work.  I've not used date prompts in a while, it's throwing an error because when I set the Prompt up as a Day, it passes the value "31Jan2015" to the code.  I'm presuming that's not in the right format though.  How does that date need to be setup?

paulrockliffe
Obsidian | Level 7

I changed &start_date. to "&start_date"d and the same for &end_date and that seems to have worked.

 

However it looks like the date offset I described isn't being covered - 202001 is April, 202002 is May etc:

I have tables up to April 2020, which is 202001, but if I put the end date as 01 April 2020, I get errors:

 

name_20202 does not exist, name_202003 does not exist.  From the April 2020 date, nothing past 202001 should be requested.

 

I've offset my dates by a couple of months to get the right period while that problem gets sorted and it looks like I'm getting the right data, certainly the volume looks correct anyway.

 

So thanks so much for getting me this far, it's a huge help as I was using a legacy system to run some old SQL to create a .csv file and then importing it into SAS, this will let me automate the process and strip a load of time out.

Kurt_Bremser
Super User

Try this:

data _null_;
length datasets $32767; * max length;
date = intnx('month',"&start_date."d,-3,'s');;
do while (date le intnx('month',"&end_date."d,-3,'s'));
  datasets = catx(" ",datasets,"name_" !! put(date,yymmn6.));
  date = intnx('month',date,1,'s');
end;
call symputx('datasets',datasets);
run;
paulrockliffe
Obsidian | Level 7
Thanks, I'll have a look and see how it goes.....
ballardw
Super User

@paulrockliffe wrote:

I changed &start_date. to "&start_date"d and the same for &end_date and that seems to have worked.

 

However it looks like the date offset I described isn't being covered - 202001 is April, 202002 is May etc:

I have tables up to April 2020, which is 202001, but if I put the end date as 01 April 2020, I get errors:

 

name_20202 does not exist, name_202003 does not exist.  From the April 2020 date, nothing past 202001 should be requested.

 

I've offset my dates by a couple of months to get the right period while that problem gets sorted and it looks like I'm getting the right data, certainly the volume looks correct anyway.

 

So thanks so much for getting me this far, it's a huge help as I was using a legacy system to run some old SQL to create a .csv file and then importing it into SAS, this will let me automate the process and strip a load of time out.


You need to very explicitly describe exactly how 202001 is April. In most "date" systems that is going to be January. If your "year" starts in a different month then you need to tell us that. Every SAS date manipulation tool will need to be told what the offset really really is.

paulrockliffe
Obsidian | Level 7
I did put it in the first post. In the UK the Tax year is a fairly common concept, with month one being April and month 12 being March, though this data should have been set up using Calendar months as it's a total pain to work with, for no benefit at all.
ChrisNZ
Tourmaline | Level 20

Unless each and every table used in SAS follows the standard 01 = April, a more common naming convention is to use the suffix FY when applicable.

 

NAME_FY202001  rather than NAME_202001

 

paulrockliffe
Obsidian | Level 7
Yeah, this table would have been setup decades ago, I suspect it's like this because something was simpler to integrate with another system like this. It goes back to a time when very little consideration was given to people actually wanting to access the data.

Not something I have any control over and it's a legacy system, so it's not going to get changed to help me! I've started integrating the new system's data with this and it's at least been done properly!
Reeza
Super User

If you want to divide and conquer, you need to first generate a working program for one month/one file. Then you can generalize it to all your months. Once you have it built, follow the steps in the tutorial below to convert your program to a macro. Then you can execute it for every month. Usually it's more efficient to process all at once, but if you don't have enough memory or some other reason, you can use a macro to run it separately and combine the results at the end.

 

 

UCLA introductory tutorial on macro variables and macros

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Tutorial on converting a working program to a macro

This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

ChrisNZ
Tourmaline | Level 20

> I have another table that contains a small list of references that is just the subset that I want

How small is small? How many references do you have?

 

If the list is small enough that it can fit in a where clause, that's your best bet. If not then a hash table.

 %macro loop; %do i=1 %to 60; T&i %end; %mend; 
data %loop;
   do I=1 to 1e6;
    output; 
  end;
run;

data LOOKUP; 
   do i=1 to 1e6 by 1e3;
    output; 
  end;
run;

%* where clause - macro variable - 3 seconds;
proc sql noprint; select I into :list separated by ',' from LOOKUP;
data _null_; 
  set T1-T60; 
  where I in ( &list );
run; 

%* where clause - dynamic - 3.5 seconds;
%macro loop; 
   %local dsid rc;
   %let dsid=%sysfunc(open(LOOKUP));
   %let rc=%sysfunc(fetch(&dsid));
   %do %while(&rc=0);
     %sysfunc(getvarn(&dsid,%sysfunc(varnum(&dsid,I))))
     %let rc=%sysfunc(fetch(&dsid));
   %end;
   %let rc=%sysfunc(close(&dsid));
%mend;
data _null_; 
  set T1-T60; 
  where I in ( 
  %loop
  );
run; 

%* hash table - 9 seconds;
data _null_; 
  set T1-T60; 
  if _N_=1 then do;
    dcl hash H(dataset:'LOOKUP');
    H.definekey('I');
    H.definedone();
  end;
  if H.check()=0;
run; 

 

 

 

 

 

paulrockliffe
Obsidian | Level 7
Thanks, there's 17,000 values in my 'small' table and millions in the big table, so I think the hash table approach is the one.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 3149 views
  • 0 likes
  • 6 in conversation