Issues looping over a series of macro variables (using call symput and _N_)

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Issues looping over a series of macro variables (using call symput and _N_)

[ Edited ]

Hello all. I'm working with a large dataset on vaccines currently and am having trouble looping over an ordered set of macro variables. I'm using SAS 9.4.

 

Goal: Create dummy coded variables for 85 vaccines that are indicated across 8 variables (vax1-vax8). Perhaps someone may have have an easier way.

 

I used the following code to obtain the list of vaccine names:

 

 

%MACRO vaxNames(data=, fileOut=);		*MACRO that delivers a list of all vaccine names as variable NAME in a given dataset;
PROC FREQ DATA=&data NOPRINT;			*this code also drops ()_- and adds a _ when first character is numeric;
	TABLE vax1 / OUT=&fileOut;
	WHERE LENGTH(vax1)>1;
RUN;	
DATA &fileOut;
	SET &fileOut;
	vax=vax1;
	NAME=vax1;
	NAME=COMPRESS(NAME,"()_-");
	IF VERIFY(SUBSTR(NAME,1,1),'0123456789')=0 THEN NAME="_"||NAME;
	KEEP NAME vax;
RUN;
%MEND vaxNames;

The NAME variable is the vaccine codeword modified to be a variable name, while vax is the raw text that matches exactly to the text input for vax1-vax8.

 

To create the dummy codded variables, what I want to do is as follows:

 

1) Create 85 new variable that are named as the NAME variable in the previous dataset.

2) For each single vaccine, loop over vax1-vax8 to see if that variable = vax from the previous dataset.

3) Loop this 85 times to cover all possible vaccines

 

My method so far has been the following. I use this code to create a series of macro variables that match NAME (vaxvar1-vaxvar85), vax (vaxtxt1-vaxtxt85), and eof (# of vaccines):

 

DATA _NULL_;							*creates macro for vaccine reference (v1-v85) and end of file # (eof) based on output of vaxNames MACRO;
	SET vaers.vaxnames NOBS=nobs;
	CALL SYMPUT("eof",left(put(nobs,best.)));		*vaxvar1-vaxvar85 is the VARIABLE vaxine name";
	CALL SYMPUT('vaxvar'||LEFT(_N_),NAME);			*vaxtxt1-vaxtxt85 is the VAX name to verify against vax1-vax8;
	CALL SYMPUT('vaxtxt'||LEFT(_N_),vax);
RUN;

 

 

Then, trying to create my dummy coded dataset, I've started with a simple scenario by just doing 1 single iteration:

 

DATA testdataDummy;
	SET testdata;
	IF vax1="&vaxtxt1." THEN &vaxvar1=1;
RUN;

This works well and produces a new variable with the name according to vaxvar and a 1 if vax1=vaxtxt1. Perfect.

 

 

Then, trying to expand this to a loop is where it all falls apart:

DATA testdataDummy;
	SET testdata;
	DO i=1 TO &eof;
		IF vax1="&vaxtxt&i." THEN &vaxvar&i=1;
	END;
RUN;

The &eof resolves fine and the loop runs in that regard. The way I reference vaxtxt&i also works. However, &vaxvar&i does not work and gives me the following error: 

133  DATA testdataDummy;
134      SET testdata;
135      DO i=1 TO &eof;
136          IF vax1="&vaxtxt&i." THEN &vaxvar&i=1;
                                       -
                                       180
WARNING: Apparent symbolic reference VAXTXT not resolved.
WARNING: Apparent symbolic reference I not resolved.
WARNING: Apparent symbolic reference VAXVAR not resolved.
WARNING: Apparent symbolic reference I not resolved.
ERROR 180-322: Statement is not valid or it is used out of proper order.

I'm still a bit new to macros, so perhaps I'm referencing it wrong in my loop. Any help would be so greatly appreciated. Perhaps there is a more efficient way to get at what I'm trying to do here.

 

Thank you!

 

 

 

 


Accepted Solutions
Solution
‎07-31-2017 07:45 AM
Super User
Super User
Posts: 7,042

Re: Issues looping over a series of macro variables (using call symput and _N_)

Posted in reply to mloiacono

You are confusing macro coding, which can be used to generate code, with data step coding, which can be used to generate data.

 

You are not using the right syntax to reference the elements in your "macro array" (as some like to call these large sets of macro variables with a common base name and a numeric suffix). &vaxvar&i will look for a macro variable named VAXVAR, but your macro variables are named VAXVAR1, etc. So you need to use &&vaxvar&i.

 

The easiest fix is to convert your DO loop to a %DO loop. (Note it will then need to be inside a macro definition since you cannot use %DO in open code).

DATA testdataDummy;
  SET testdata;
%DO i=1 %TO &eof;
    &&vaxvar&i =  vax1="&&vaxtxt&i.";
%END;
RUN;

So if you have 85 values then the %DO loop will generate 85 assignment statements to set your 85 boolean variables.

 

But you can skip the macro variables (and hence the %DO loop and the need to wrap the code in a macro definition) if you just use your data step to write the code instead of creating the macro variables.

filename dummy code;
DATA _NULL_;
  SET vaers.vaxnames end=eof ;
  file dummy;
  put name '= vax1=' vax :$quote. ';' ;
  if eof then call symputx('ndummy',_n_);
run;
DATA testdataDummy;
  SET testdata;
  %include dummy / source2 ;
RUN;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,913

Re: Issues looping over a series of macro variables (using call symput and _N_)

Posted in reply to mloiacono

Regarding the creation of DUMMY variables ... depending on what you are doing, some SAS procedures (like GLM, GLIMMIX, PLS, etc...) will create the DUMMY variables for you, behind the scenes, so you don't have to create the DUMMY variables yourself. So ... what PROC are you going to be using these DUMMY variables in?

 

And if your analysis will not be in one of those PROCs, then you can use PROC GLMMOD to create the DUMMY variables (and there was a recent thread here about "more modern" ways to create DUMMY variables).

 

Finally, regarding

 

IF vax1="&vaxtxt&i." THEN &vaxvar&i=1;

right now I'm thinking this isn't necessary,that the creation of DUMMY variables shouldn't be done in a loop or with macros, as explained above, nevertheless I think what will work for you here is

 

IF vax1="&&vaxtxt&i" THEN &&vaxvar&i=1;

but really, don't do it this way, use the built in SAS methods to create your DUMMY variables. 

 

Respected Advisor
Posts: 4,173

Re: Issues looping over a series of macro variables (using call symput and _N_)

[ Edited ]
Posted in reply to mloiacono

@mloiacono

If I understand your requirement then below code should give you what you're after.

I would use some prefix for the flag variables you want to create as this will make it much easier to reference them. I've chosen TT_ as prefix in below code.

/** create sample data **/
data have(drop=_:);

  length row_id 8;

  dcl hash h1();
  h1.defineKey('_item_no');
  h1.defineData('_vax');
  h1.defineDone();

  do _vax='aaax','1bb','ccc',' ','dd9d',' ';
    _item_no+1;
    h1.ref(key: _item_no ,data:_vax);
  end;

  array vax_ {8} $20;
  do row_id=1 to 10;
    do _j=1 to dim(vax_);
      _rc=h1.find(key: ceil(ranuni(1)*h1.num_items) );
      vax_[_j]=_vax;
    end;
    output;
  end;

  stop;
run;

/** 
  generate code
  - generate attrib statements for desired new variable names
  - write these attrib statements to temporary file
  - %include this temporary file in next data step which then creates the variables
**/
filename codegen temp;
data _null_;
  
  /* 
    declare and populate hash
    - hash will contain 1 entry per vaccine name
  */
  dcl hash h1();
  h1.defineKey('_vax');
  h1.defineData('_vax');
  h1.defineDone();
  dcl hiter hh1('h1');

  do until(done);
    set have(keep=vax_:) end=done;
    array vax_ {8} vax_:;
    do _j=1 to dim(vax_);
      if missing(vax_[_j]) then continue;
      _vax=vax_[_j];
      _rc=h1.ref();
    end;
  end;

  /* 
    create attrib statement and write to temporary file
    - Iterate through the hash object
    - write attrib statement for new variable per distinct vaccine name
  */
      
/*  file print;*/
  file codegen;
   _rc = hh1.first();
  do while (_rc = 0);
    _vax=substrn(_vax,1,29);
    put "attrib TT_" _vax "length=8 label='" _vax +(-1) "';";
    _rc = hh1.next();
  end;
  
  stop;
run;

/**
  create data want
  - %include temporary file with attrib statements
  - populate new flag variables if variable name matches with vaccine name
**/
data want(drop=_:);
  set have end=last;
  %include codegen /source2;
  array source_vax {*} vax:;
  array T_vax {*} TT_: ;

  if _n_=1 then
    do;
      /*
        declare and populate hash
        - each element contains the array element number for the array T_vax
        - allows to faster find the array element to be populated with a '1'
        - faster than having to always loop over the hash in order to find the
          matchin variable name
      */
      dcl hash h1();
      h1.defineKey('_el_name');
      h1.defineData('_el_num');
      h1.defineDone();
      
      do _el_num=1 to dim(T_vax);
        _el_name=substrn(vname(T_vax[_el_num]),4);
        h1.add();
      end;

    end;

  /* populate new variables */
  do _i=1 to dim(source_vax);
    if h1.find(key:substrn(source_vax[_i],1,29))=0 then
      do;
        T_vax[_el_num]=1;
      end;
  end;

run;

 Capture.JPG

 

 

Solution
‎07-31-2017 07:45 AM
Super User
Super User
Posts: 7,042

Re: Issues looping over a series of macro variables (using call symput and _N_)

Posted in reply to mloiacono

You are confusing macro coding, which can be used to generate code, with data step coding, which can be used to generate data.

 

You are not using the right syntax to reference the elements in your "macro array" (as some like to call these large sets of macro variables with a common base name and a numeric suffix). &vaxvar&i will look for a macro variable named VAXVAR, but your macro variables are named VAXVAR1, etc. So you need to use &&vaxvar&i.

 

The easiest fix is to convert your DO loop to a %DO loop. (Note it will then need to be inside a macro definition since you cannot use %DO in open code).

DATA testdataDummy;
  SET testdata;
%DO i=1 %TO &eof;
    &&vaxvar&i =  vax1="&&vaxtxt&i.";
%END;
RUN;

So if you have 85 values then the %DO loop will generate 85 assignment statements to set your 85 boolean variables.

 

But you can skip the macro variables (and hence the %DO loop and the need to wrap the code in a macro definition) if you just use your data step to write the code instead of creating the macro variables.

filename dummy code;
DATA _NULL_;
  SET vaers.vaxnames end=eof ;
  file dummy;
  put name '= vax1=' vax :$quote. ';' ;
  if eof then call symputx('ndummy',_n_);
run;
DATA testdataDummy;
  SET testdata;
  %include dummy / source2 ;
RUN;
Trusted Advisor
Posts: 1,913

Re: Issues looping over a series of macro variables (using call symput and _N_)

[ Edited ]

All of the above is probably correct, however ... as I said earlier ...

 

There's no need to compute dummy variables via either macros or data step code, as SAS has a number of built in methods to do this which are orders of magnitude more simple.

New Contributor
Posts: 2

Re: Issues looping over a series of macro variables (using call symput and _N_)

Posted in reply to mloiacono

Thank you@PaigeMiller@Tom@Patrick for the assitance! I've decided to just put my datastep into a macro and my originial solution works well now. However, all of your different solutions have been incredibly informative and have shown me new methods to learn more about and practice. All of your help has been greatly appreciated.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 254 views
  • 0 likes
  • 4 in conversation