Hi all,
I've got a macro which loops through several datasets, filters the data based on a random selection of customers (previously generated) and removes all the columns of data I'm not interested in. This macro wasn't created by me, but the SQL code within was.
%macro loop(start_month=201604, stop_month=201605);
%local month;
%do month=&start_month %to &stop_month;
%put Month: &month;
PROC SQL;
Create table SampleData_&month as
SELECT (Bits that I want to keep)
FROM (Dataset)_&month AS V1
RIGHT JOIN Rand_Cust AS R
ON R.AcctNum=V1.AcctNum;
QUIT;
%* SPECIAL CASE WHEN WE REACH END OF A YEAR;
%if %substr(&month, 5, 2) = 12 %then %let month = %eval(&month + 88);
%end;
%mend loop;
In the macro, Rand_Cust is the list of customer account numbers I generated in a previous step.
The datasets that I'm pulling my data from follow the naming convention TABLENAME_YYYYMM, where TABLENAME is identical for all the sets and YYYYMM increments as you'd expect.
Once all this data has been pulled, I'd like to append all the generated tables together, so that I can then sort by the account number (labelled AcctNum in the tables) and do some analysis. Whilst I could solve this issue with a number of APPEND steps, I'd like to know if there's a way I can either append during the loop, or if there's a command that can append all of the tables generated in a single step, as this would save me from having to use a long list of appends.
The tables I'm taking my data from are monthly tables starting from 201606 and going up to 201805.
Thanks for any help that can be offered.
Yes, this seems to be default setup in some industries to have same data split up by years and months. What it effectively means is more storage overal, much more complicated messy programming and more macro code.
What I will say, as always, is put all your data together, then process the data. It is far simpler, you don't need to do macro, or do loops, or append data etc.
So, step 1, put all month data together with a new variable of month:
data month_data; set tablename_: indsname=tmp;
/* Create an actual date so further processing is easier */
month=input(cats(tranwrd(tmp,"tablename_",""),"01"),yymmdd8.);
format month date9.; run;
This is just based on what you posted here. Now you can join rand_cust onto this data (using the sql you presented).
What we have then is one dataset, that can be filtered very simply and efficiently by using the month variable, and by using date functions like intck, rather than trying to work out if its the end of the year etc. So for example to get data from 201604 to 201605, you could simply do:
data want; set month_data;
where input(cats("201605","01"),yymmdd8.) <= month <= input(cats("201606","01"),yymmdd8.);
run;
No need for many datasets, no need to append, mo messy macro code which will fall over every other run, just simple Base SAS programming on data.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.