Hello everyone,
I work as a Data Manager and need to create a dynamic SAS code that can generate multiple macro variables containing string values from a metadata table.
The structure of my SAS (metadata) table is as follows:
Tablename varname start_year end_year
AB abc 2010 2015
AB def 2010 2015
AB ghi 2012 2014
CD jkl 2010 2015
CD mno 2010 2015
CD pqx 2010 2012
I would like to create a macro that contains all variables that fall within the different time intervals specified in start_year
and end_year
for each unique value in the Tablename
column. Something like:
Varlist_AB_2010 = abc def
Varlist_AB_2011 = abc def
Varlist_AB_2012 = abc def ghi
Varlist_AB_2013 = abc def ghi
Varlist_AB_2014 = abc def ghi
Varlist_AB_2015 = abc def
Varlist_CD_2010 = jkl mno pqx
Varlist_CD_2011 = jkl mno pqx
Varlist_CD_2012 = jkl mno pqx
Varlist_CD_2013 = jkl mno
Varlist_CD_2014 = jkl mno
Varlist_CD_2015 = jkl mno
Hope you can help !
Many people on this board can do this. But the harder question is Why? What will the bottom form of the data do that the top form does not let you do? Why are macro variables needed at all?
Here's a sample of how a program could begin:
data all_values;
set have;
do year = start_year to end_year;
output;
end;
run;
proc sort data=all_values;
by tablename year;
run;
What happens next is up to you, but why is this not enough as is? Where will you go with this if you process it further?
Many people on this board can do this. But the harder question is Why? What will the bottom form of the data do that the top form does not let you do? Why are macro variables needed at all?
Here's a sample of how a program could begin:
data all_values;
set have;
do year = start_year to end_year;
output;
end;
run;
proc sort data=all_values;
by tablename year;
run;
What happens next is up to you, but why is this not enough as is? Where will you go with this if you process it further?
Here's one way to continue from my earlier post above. Once you have created the data set ALL_VALUES, you could use it to generate all your PROC SQL calls:
data _null_;
set all_tables;
by table_name year;
if first.year then
call execute ('proc sql; create table ' || strip(table_name) || put(year, 4.)) || ' as select ' || varname);
else call execute(',' || varname);
if last.year then do;
call execute ('from raw.' || strip(table_name) || put(year, 4.) || '; quit;');
stop;
end;
run;
I didn't find a way to test it, so it might need a little debugging. On the plus side, it should get exactly what you asked for. And it produces just the first table, so if you want to get all the tables you have to shorten the program by removing the STOP statement.
Why PROC SQL?
Why not just combine the data using normal SAS code? That should not care about some variables not existing in some datasets.
data all;
set AB2010 - AB2015 ;
run;
@Frank_johannes wrote:
. Sorry, 'Variable' was a typo. I mean, of course, all the values from the column 'Varname.'
Let me give you a specific example. The table 'AB' exists in 6 differentiere versions in my data warehouse:
AB2010
AB2011
AB2012
AB2013
AB2014
AB2015
Not all variables from the 'Varname' column from my ‘meta tabel’ are present in all 6 files. 'Varname' exists only between the start and end periods.
I would like to create a PROC SQL with a dynamic SELECT statement that automatically loads tables AB2010-AB2015 (also CD afterwards) with the variables present in the table from the iteration.
Is it better explained now?"
Not clear at all.
I do not know what you mean by "loads tables with variables"?
Do you mean you want to read only from datasets that actually have one of the variables of interest?
In that case wouldn't it be better to just have the list of all of the variables for all of the datasets? Basically the output of PROC CONTENTS for the whole library? Then you could just query that to get the list of dataset to read.
proc sql noprint;
select catx('.',libname,memname) into :abc_dslist separated by ' '
from contents
where lowcase(name) = 'abc'
;
data abc ;
set &abc_dslist;
run;
I believe you don't need this metadata table because all this information is also available in dictionary.columns
I also believe that it would be beneficial for downstream processing if you create a collection table that has a stable structure even it it means that some columns might be missing for some years. Creating a stable table structure will avoid issues with downstream code falling over if there is for example a drop statement for a variable that doesn't exist in the sub-set of yearly tables of interest.
In my experience analysts/reporters normally prefer to get the most current version of column attributes (especially Labels and Formats) which is another reason to use all the source table to create the table structure.
It's normally also a good idea to add a variable to such a collection table that provides the information where the data got sourced from.
Below macro will hopefully give you some flexibility to create what you want.
1. Create sample data for testing
/* create sample data */
data
AB2009
AB2010
AB2011(drop=weight height)
AB2013
AB2014
AB2015
AB_All
ABX2023
AB1999AB2023
AB
AB11111
;
set sashelp.class;
run;
data ab2012;
set sashelp.class(rename=(sex=_sex));
attrib SEX length=$3 format=$3. informat=$char3. label="Person's gender";
if _sex='F' then SEX='Female';
else if _sex='M' then SEX='Male';
drop _sex;
run;
data ab2023;
set sashelp.class(drop=height age);
label new_var="That's a New Var";
new_var=1;
run;
2. Macro to create table or view over selected years
%macro createMultiYearTbl(
table_root_name =
,source_lib =
,start_year =
,stop_year =
,target_tbl =
,inds_var = source_table_name
,keep_vars =
,only_vars_from_selected_year = No
,createViewOrTable = Table
);
%if %nrbquote(&start_year)=%nrbquote() %then %let start_year=-5999;
%if %nrbquote(&stop_year)=%nrbquote() %then %let stop_year = 5999;
/* Populate &attrib_list with attrib statement from columns in any table that matches &table_root_name */
%local attrib_list mem_list;
proc sql;
create table work.__allYearsColAttr as
/* create list of column attributes from all tables that match the selection criteria */
select
libname
,memname
,upcase(name) as _upcaseName
,name
,type
,length
,format
,informat
,label
,varnum
,input(scan(memname,-1,,'kd'),year4.) as year
from dictionary.columns
where
libname =%upcase("&source_lib")
and substr(memname,1,max(1,length(memname)-4))=%upcase("&table_root_name")
and not missing(input(scan(memname,-1,,'kd'),year4.))
%if %upcase(&only_vars_from_selected_year) = Y or %upcase(&only_vars_from_selected_year) = YES %then
%do;
and input(scan(memname,-1,,'kd'),year4.) between &start_year and &stop_year
%end;
;
/* populate macro var &source_tbl_list with all selected tables in year range */
select distinct catx('.',libname,memname) into :mem_list separated by ' '
from work.__allYearsColAttr
where year between &start_year and &stop_year
;
/* populate macro var &attrib_list */
select
catx(' '
,name
,ifc(type='char',cats('length=$',length),cats('length=',length))
,ifc(not missing(format),cats('format=',format),' ')
,ifc(not missing(informat),cats('informat=',informat),' ')
,ifc(not missing(label),
ifc(findc(label,'"'),
cats("label='",label,"'"),
cats('label="',label,'"')
),
' '
)
)
into :attrib_list separated by ' '
from
(
/* per column keep attributes from most current occurrence but max length from all occurrences (to avoid string truncation issues) */
select
_upcaseName
,max(length) as length
,name
,type
,format
,informat
,label
,varnum
,year
from work.__allYearsColAttr
group by _upcaseName
having max(year)=year
)
order by year desc, varnum
;
quit;
/* create &target_tbl data */
/* option varlenchk=nowarn to avoid truncation warnings */
/* - the generated attrib statement ensures that no actual string truncation occurs */
%local sv_varlenchk;
%let sv_varlenchk=%sysfunc(getoption(varlenchk,keyword));
options varlenchk=nowarn;
/* housekeeping: clean-up */
proc datasets lib=%scan(work.&target_tbl,-2,.) mt=(data view) nolist nowarn;
delete %scan(&target_tbl,1) __allYearsColAttr;
run;
quit;
/* data step creating target table */
%if %upcase(&createViewOrTable)=VIEW %then
%do;
data &target_tbl /view=&target_tbl;
%end;
%else
%do;
data &target_tbl;
%end;
attrib &attrib_list;
/* avoid uninitialized note */
if 0 then call missing(of _all_);
%if %nrbquote(&inds_var) ne %nrbquote() %then
%do;
length _&inds_var &inds_var $41;
set &mem_list indsname=_&inds_var;
&inds_var=_&inds_var;
%end;
%else
%do;
set &mem_list;
%end;
%if %nrbquote(&keep_vars) ne %nrbquote() %then
%do;
keep &keep_vars;
%end;
run;
/* housekeeping: reset option varlenckh to initial value */
options &sv_varlenchk;
%mend;
3. Macro call and result
%createMultiYearTbl(
table_root_name =ab
,source_lib =WORK
,start_year =2010
,stop_year =2015
,target_tbl =want
/* ,keep_vars =name height */
/* ,only_vars_from_selected_year =N*/
);
proc contents data=want order=varnum;
run;quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.