Hi all,
I am working with an input library that contains datasets with a shared naming pattern with a date, for example:
dataset_201802
dataset_201808
dataset_201902
dataset_201908
I need to run the same sql step through all the tables, creating variables using the date in the name of the dataset to name the variables as well. For example:
proc sql;
create table test_201808 as
select *,
pd_lt/pd_pit as m_201808
from dataset_201808;
quit;
I am looking to build a macro in order to New tables will be loaded into the library periodically, corresponding to new dates of information, with the same name pattern. Therefore, I know the first date to initialize my macro but I don't know how to establish the end.
Thank you so much for your time
Naming variables for a data set or year or similar for most purposes is a poor idea. For one thing code that works then needs to be adjusted every time a new period or source is added. It would likely be better to add a variable to a data set that indicates which is the source. Often By group processing based on that source variable will accomplish most things that would involve additional variables with the advantage of not having to keep adding variable names to deal with.
This can be accomplished with code such as:
data long ; set dataset_: indsname=dsn; source=dsn;
/* add any calculations done on observations here*/ run;
The colon is a list generator that means "use all data set names that start with dataset_". If you have a library these are in then include that in the name. If multiple libraries than one libname.setname: per library.
The variable Source could be parsed for other text if needed using any of the string functions.
Warning: if you have variables of the same name with different characteristics this may not work because of different variable types and lengths of variables could result in truncation. These might also indicate problems with your code to automate anything across multiple sets.
If the data step builds a VIEW then it will automatically bring in all the data sets needed when used though if the sets are large it may be better to build this set once and then append to a permanent set as needed.
If the pattern of names is followed faithfully (none of them are skipped) then a simple loop that stops when the next one is not found should work.
%macro run_all(start,end);
%local index next;
proc sql;
%do index=0 %to 100 %by 6;
%let next=%sysfunc(intnx(month,%sysfunc(inputn(&start,yymmn6.)),&index),yymmn6.);
%if %length(&end) and (&next > &end) %then %goto leave;
%if not %sysfunc(exist(dataset_&next)) %then %goto leave;
create table test_&next as
select *
, pd_lt/pd_pit as m_&next.
from dataset_&next
;
%end;
%leave:
quit;
%mend;
* run a fixed set of datasets ;
%run_all(201802,201902)
* run all from a beginning month ;
%run_all(201802)
Hopefully your real code is something different as the posted example does not make much sense.
Why are you creating mutliple dataset instead of one?
Why are you creating variables with a pseudo date as part of the NAME.
Make one dataset and keep the DATE as the value of a variable in the dataset instead of the NAME of a variable in the dataset.
Perhaps something like this:
proc sql noprint;
select catx('.',libname,memname)
into :memlist separated by ' '
from dictionary.members
where libname = 'WORK'
and memtype = 'DATA'
and memname like 'DATASET^_%' escape '^'
and scan(memname,-1,'_') >= "&start"
order by 1
;
quit;
data want ;
set &memlist indsname=indsname;
date = input(scan(indsname,-1,'_'),yymmn6.);
format date yymm7.;
pd_ratio = pd_lt/pd_pit ;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.