BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Frank_johannes
Calcite | Level 5

Hello everyone,

 

I work as a Data Manager and need to create a dynamic SAS code that can generate multiple macro variables containing string values from a metadata table.

 

The structure of my SAS (metadata) table is as follows:

 

Tablename varname start_year end_year
AB         abc      2010      2015
AB         def      2010      2015
AB         ghi      2012      2014
CD         jkl      2010      2015
CD         mno      2010      2015
CD         pqx      2010      2012

 

I would like to create a macro that contains all variables that fall within the different time intervals specified in start_year and end_year for each unique value in the Tablename column. Something like:

 

Varlist_AB_2010 = abc def
Varlist_AB_2011 = abc def
Varlist_AB_2012 = abc def ghi
Varlist_AB_2013 = abc def ghi
Varlist_AB_2014 = abc def ghi
Varlist_AB_2015 = abc def
Varlist_CD_2010 = jkl mno pqx
Varlist_CD_2011 = jkl mno pqx
Varlist_CD_2012 = jkl mno pqx
Varlist_CD_2013 = jkl mno
Varlist_CD_2014 = jkl mno
Varlist_CD_2015 = jkl mno
Hope you can help ! 
1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Many people on this board can do this.  But the harder question is Why?  What will the bottom form of the data do that the top form does not let you do?  Why are macro variables needed at all?

 

Here's a sample of how a program could begin:

 

data all_values;
   set have;
   do year = start_year to end_year;
      output;
   end;
run;
proc sort data=all_values;
   by tablename year;
run;

What happens next is up to you, but why is this not enough as is?  Where will you go with this if you process it further?

View solution in original post

8 REPLIES 8
Astounding
PROC Star

Many people on this board can do this.  But the harder question is Why?  What will the bottom form of the data do that the top form does not let you do?  Why are macro variables needed at all?

 

Here's a sample of how a program could begin:

 

data all_values;
   set have;
   do year = start_year to end_year;
      output;
   end;
run;
proc sort data=all_values;
   by tablename year;
run;

What happens next is up to you, but why is this not enough as is?  Where will you go with this if you process it further?

Frank_johannes
Calcite | Level 5
I need the lists to be stored in a macro because I want to loop over a set of datafiles with prefix year for example
AB2000
AB2001
Etc.

My problem is that not all datasets contains the same variables

%macro ab(start, end);
%do year = &start %to &end
Proc sql
Create table AB&year. as
Select &varlist_AB_&year
From raw.AB&year
;
Quit;
%mend
Astounding
PROC Star

Here's one way to continue from my earlier post above.  Once you have created the data set ALL_VALUES, you could use it to generate all your PROC SQL calls:

data _null_;

   set all_tables;
   by table_name year;
   if first.year then
        call execute ('proc sql;  create table ' || strip(table_name) || put(year, 4.)) || ' as select '  || varname); 
   else call execute(',' || varname);
   
if last.year then do;
call execute ('from raw.' || strip(table_name) || put(year, 4.) || '; quit;'); stop; end;
run;

I didn't find a way to test it, so it might need a little debugging.  On the plus side, it should get exactly what you asked for.  And it produces just the first table, so if you want to get all the tables you have to shorten the program by removing the STOP statement.

Cynthia_sas
SAS Super FREQ
Hi, I'm not sure why you need a macro program or macro definition to do this. You can easily do what you want with array processing.
And I'm not sure I understand what you mean when you say that you want a "macro" that contains all variables. Do you mean you want one Macro variable that contains ALL the rows you listed as one macro variable value? Or do you mean you want one macro variable for each year so that a reference like Varlist_&tb._&yr would return the varname value that you want?
Cynthia
Frank_johannes
Calcite | Level 5
. Sorry, 'Variable' was a typo. I mean, of course, all the values from the column 'Varname.'

Let me give you a specific example. The table 'AB' exists in 6 differentiere versions in my data warehouse:

AB2010
AB2011
AB2012
AB2013
AB2014
AB2015

Not all variables from the 'Varname' column from my ‘meta tabel’ are present in all 6 files. 'Varname' exists only between the start and end periods.

I would like to create a PROC SQL with a dynamic SELECT statement that automatically loads tables AB2010-AB2015 (also CD afterwards) with the variables present in the table from the iteration.

Is it better explained now?"
Tom
Super User Tom
Super User

Why PROC SQL?

Why not just combine the data using normal SAS code?  That should not care about some variables not existing in some datasets.

data all;
  set AB2010 - AB2015 ;
run;
Tom
Super User Tom
Super User

@Frank_johannes wrote:
. Sorry, 'Variable' was a typo. I mean, of course, all the values from the column 'Varname.'

Let me give you a specific example. The table 'AB' exists in 6 differentiere versions in my data warehouse:

AB2010
AB2011
AB2012
AB2013
AB2014
AB2015

Not all variables from the 'Varname' column from my ‘meta tabel’ are present in all 6 files. 'Varname' exists only between the start and end periods.

I would like to create a PROC SQL with a dynamic SELECT statement that automatically loads tables AB2010-AB2015 (also CD afterwards) with the variables present in the table from the iteration.

Is it better explained now?"

Not clear at all.

I do not know what you mean by  "loads tables with variables"?

Do you mean you want to read only from datasets that actually have one of the variables of interest?

 

In that case wouldn't it be better to just have the list of all of the variables for all of the datasets?  Basically the output of PROC CONTENTS for the whole library?  Then you could just query that to get the list of dataset to read.

proc sql noprint;
select catx('.',libname,memname) into :abc_dslist separated by ' '
from contents
where lowcase(name) = 'abc'
;

data abc ;
  set &abc_dslist;
run;
Patrick
Opal | Level 21

I believe you don't need this metadata table because all this information is also available in dictionary.columns

 

I also believe that it would be beneficial for downstream processing if you create a collection table that has a stable structure even it it means that some columns might be missing for some years. Creating a stable table structure will avoid issues with downstream code falling over if there is for example a drop statement for a variable that doesn't exist in the sub-set of yearly tables of interest.

 

In my experience analysts/reporters normally prefer to get the most current version of column attributes (especially Labels and Formats) which is another reason to use all the source table to create the table structure.

 

It's normally also a good idea to add a variable to such a collection table that provides the information where the data got sourced from.

 

Below macro will hopefully give you some flexibility to create what you want.

 

1. Create sample data for testing

Spoiler
/* create sample data */
data 
  AB2009
  AB2010
  AB2011(drop=weight height)
  AB2013
  AB2014
  AB2015
  AB_All
  ABX2023
  AB1999AB2023
  AB
  AB11111
  ;
  set sashelp.class;
run;
data ab2012;
  set sashelp.class(rename=(sex=_sex));
  attrib SEX length=$3 format=$3. informat=$char3. label="Person's gender";
  if _sex='F' then SEX='Female';
  else if _sex='M' then SEX='Male';
  drop _sex;
run;
data ab2023;
  set sashelp.class(drop=height age);
  label new_var="That's a New Var";
  new_var=1;
run;

2. Macro to create table or view over selected years

%macro createMultiYearTbl(
  table_root_name = 
  ,source_lib     = 
  ,start_year     = 
  ,stop_year      = 
  ,target_tbl     = 
  ,inds_var       = source_table_name
  ,keep_vars      =
  ,only_vars_from_selected_year = No
  ,createViewOrTable            = Table
  );

  %if %nrbquote(&start_year)=%nrbquote() %then %let start_year=-5999;
  %if %nrbquote(&stop_year)=%nrbquote()  %then %let stop_year = 5999;

  /* Populate &attrib_list with attrib statement from columns in any table that matches &table_root_name */
  %local attrib_list mem_list;
  proc sql;
    create table work.__allYearsColAttr as
      /* create list of column attributes from all tables that match the selection criteria */
      select
         libname
        ,memname
        ,upcase(name) as _upcaseName
        ,name
        ,type
        ,length
        ,format
        ,informat
        ,label
        ,varnum
        ,input(scan(memname,-1,,'kd'),year4.) as year
      from dictionary.columns
      where 
        libname =%upcase("&source_lib") 
        and substr(memname,1,max(1,length(memname)-4))=%upcase("&table_root_name")
        and not missing(input(scan(memname,-1,,'kd'),year4.))
        %if %upcase(&only_vars_from_selected_year) = Y or %upcase(&only_vars_from_selected_year) = YES %then
          %do;
            and input(scan(memname,-1,,'kd'),year4.) between &start_year and &stop_year
          %end;
      ;
    
    /* populate macro var &source_tbl_list with all selected tables in year range */
    select distinct catx('.',libname,memname) into :mem_list separated by ' '
    from work.__allYearsColAttr
    where year between &start_year and &stop_year
    ;

    /* populate macro var &attrib_list */
    select 
      catx(' '
        ,name
        ,ifc(type='char',cats('length=$',length),cats('length=',length))
        ,ifc(not missing(format),cats('format=',format),' ')
        ,ifc(not missing(informat),cats('informat=',informat),' ')
        ,ifc(not missing(label),
              ifc(findc(label,'"'),
                    cats("label='",label,"'"),
                    cats('label="',label,'"')
                  ),
              ' '
            )
        )
        into :attrib_list separated by ' '
    from 
      (
        /* per column keep attributes from most current occurrence but max length from all occurrences (to avoid string truncation issues) */
        select 
          _upcaseName
          ,max(length) as length
          ,name
          ,type
          ,format
          ,informat
          ,label
          ,varnum
          ,year
        from work.__allYearsColAttr
        group by _upcaseName
        having max(year)=year
      )
      order by year desc, varnum
    ;

  quit;

  /* create &target_tbl data                                                          */
  /* option varlenchk=nowarn to avoid truncation warnings                             */
  /* - the generated attrib statement ensures that no actual string truncation occurs */
  %local sv_varlenchk;
  %let sv_varlenchk=%sysfunc(getoption(varlenchk,keyword));
  options varlenchk=nowarn;

  /* housekeeping: clean-up */
  proc datasets lib=%scan(work.&target_tbl,-2,.) mt=(data view) nolist nowarn;
    delete %scan(&target_tbl,1) __allYearsColAttr;
    run;
  quit;
  
  /* data step creating target table */
  %if %upcase(&createViewOrTable)=VIEW %then
    %do;
      data &target_tbl /view=&target_tbl;
    %end;
  %else
    %do;
      data &target_tbl;
    %end;

        attrib &attrib_list;
        /* avoid uninitialized note */
        if 0 then call missing(of _all_);

  %if %nrbquote(&inds_var) ne %nrbquote() %then
    %do;
        length _&inds_var &inds_var $41;
        set &mem_list indsname=_&inds_var;
        &inds_var=_&inds_var;
    %end;
  %else
    %do;
        set &mem_list;
    %end;

  %if %nrbquote(&keep_vars) ne %nrbquote() %then
    %do;
        keep &keep_vars;
    %end;

      run;

  /* housekeeping: reset option varlenckh to initial value */
  options &sv_varlenchk;

%mend;


3. Macro call and result

%createMultiYearTbl(
  table_root_name =ab
  ,source_lib     =WORK
  ,start_year     =2010
  ,stop_year      =2015
  ,target_tbl     =want
/*  ,keep_vars      =name height */
/*  ,only_vars_from_selected_year =N*/
  );

proc contents data=want order=varnum;
run;quit;

Patrick_0-1701588941514.png

 

 

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1403 views
  • 0 likes
  • 5 in conversation