Hello all. I'm working with a large dataset on vaccines currently and am having trouble looping over an ordered set of macro variables. I'm using SAS 9.4.
Goal: Create dummy coded variables for 85 vaccines that are indicated across 8 variables (vax1-vax8). Perhaps someone may have have an easier way.
I used the following code to obtain the list of vaccine names:
%MACRO vaxNames(data=, fileOut=); *MACRO that delivers a list of all vaccine names as variable NAME in a given dataset; PROC FREQ DATA=&data NOPRINT; *this code also drops ()_- and adds a _ when first character is numeric; TABLE vax1 / OUT=&fileOut; WHERE LENGTH(vax1)>1; RUN; DATA &fileOut; SET &fileOut; vax=vax1; NAME=vax1; NAME=COMPRESS(NAME,"()_-"); IF VERIFY(SUBSTR(NAME,1,1),'0123456789')=0 THEN NAME="_"||NAME; KEEP NAME vax; RUN; %MEND vaxNames;
The NAME variable is the vaccine codeword modified to be a variable name, while vax is the raw text that matches exactly to the text input for vax1-vax8.
To create the dummy codded variables, what I want to do is as follows:
1) Create 85 new variable that are named as the NAME variable in the previous dataset.
2) For each single vaccine, loop over vax1-vax8 to see if that variable = vax from the previous dataset.
3) Loop this 85 times to cover all possible vaccines
My method so far has been the following. I use this code to create a series of macro variables that match NAME (vaxvar1-vaxvar85), vax (vaxtxt1-vaxtxt85), and eof (# of vaccines):
DATA _NULL_; *creates macro for vaccine reference (v1-v85) and end of file # (eof) based on output of vaxNames MACRO; SET vaers.vaxnames NOBS=nobs; CALL SYMPUT("eof",left(put(nobs,best.))); *vaxvar1-vaxvar85 is the VARIABLE vaxine name"; CALL SYMPUT('vaxvar'||LEFT(_N_),NAME); *vaxtxt1-vaxtxt85 is the VAX name to verify against vax1-vax8; CALL SYMPUT('vaxtxt'||LEFT(_N_),vax); RUN;
Then, trying to create my dummy coded dataset, I've started with a simple scenario by just doing 1 single iteration:
DATA testdataDummy; SET testdata; IF vax1="&vaxtxt1." THEN &vaxvar1=1; RUN;
This works well and produces a new variable with the name according to vaxvar and a 1 if vax1=vaxtxt1. Perfect.
Then, trying to expand this to a loop is where it all falls apart:
DATA testdataDummy; SET testdata; DO i=1 TO &eof; IF vax1="&vaxtxt&i." THEN &vaxvar&i=1; END; RUN;
The &eof resolves fine and the loop runs in that regard. The way I reference vaxtxt&i also works. However, &vaxvar&i does not work and gives me the following error:
133 DATA testdataDummy; 134 SET testdata; 135 DO i=1 TO &eof; 136 IF vax1="&vaxtxt&i." THEN &vaxvar&i=1; - 180 WARNING: Apparent symbolic reference VAXTXT not resolved. WARNING: Apparent symbolic reference I not resolved. WARNING: Apparent symbolic reference VAXVAR not resolved. WARNING: Apparent symbolic reference I not resolved. ERROR 180-322: Statement is not valid or it is used out of proper order.
I'm still a bit new to macros, so perhaps I'm referencing it wrong in my loop. Any help would be so greatly appreciated. Perhaps there is a more efficient way to get at what I'm trying to do here.
Thank you!
You are confusing macro coding, which can be used to generate code, with data step coding, which can be used to generate data.
You are not using the right syntax to reference the elements in your "macro array" (as some like to call these large sets of macro variables with a common base name and a numeric suffix). &vaxvar&i will look for a macro variable named VAXVAR, but your macro variables are named VAXVAR1, etc. So you need to use &&vaxvar&i.
The easiest fix is to convert your DO loop to a %DO loop. (Note it will then need to be inside a macro definition since you cannot use %DO in open code).
DATA testdataDummy;
SET testdata;
%DO i=1 %TO &eof;
&&vaxvar&i = vax1="&&vaxtxt&i.";
%END;
RUN;
So if you have 85 values then the %DO loop will generate 85 assignment statements to set your 85 boolean variables.
But you can skip the macro variables (and hence the %DO loop and the need to wrap the code in a macro definition) if you just use your data step to write the code instead of creating the macro variables.
filename dummy code;
DATA _NULL_;
SET vaers.vaxnames end=eof ;
file dummy;
put name '= vax1=' vax :$quote. ';' ;
if eof then call symputx('ndummy',_n_);
run;
DATA testdataDummy;
SET testdata;
%include dummy / source2 ;
RUN;
Regarding the creation of DUMMY variables ... depending on what you are doing, some SAS procedures (like GLM, GLIMMIX, PLS, etc...) will create the DUMMY variables for you, behind the scenes, so you don't have to create the DUMMY variables yourself. So ... what PROC are you going to be using these DUMMY variables in?
And if your analysis will not be in one of those PROCs, then you can use PROC GLMMOD to create the DUMMY variables (and there was a recent thread here about "more modern" ways to create DUMMY variables).
Finally, regarding
IF vax1="&vaxtxt&i." THEN &vaxvar&i=1;
right now I'm thinking this isn't necessary,that the creation of DUMMY variables shouldn't be done in a loop or with macros, as explained above, nevertheless I think what will work for you here is
IF vax1="&&vaxtxt&i" THEN &&vaxvar&i=1;
but really, don't do it this way, use the built in SAS methods to create your DUMMY variables.
If I understand your requirement then below code should give you what you're after.
I would use some prefix for the flag variables you want to create as this will make it much easier to reference them. I've chosen TT_ as prefix in below code.
/** create sample data **/
data have(drop=_:);
length row_id 8;
dcl hash h1();
h1.defineKey('_item_no');
h1.defineData('_vax');
h1.defineDone();
do _vax='aaax','1bb','ccc',' ','dd9d',' ';
_item_no+1;
h1.ref(key: _item_no ,data:_vax);
end;
array vax_ {8} $20;
do row_id=1 to 10;
do _j=1 to dim(vax_);
_rc=h1.find(key: ceil(ranuni(1)*h1.num_items) );
vax_[_j]=_vax;
end;
output;
end;
stop;
run;
/**
generate code
- generate attrib statements for desired new variable names
- write these attrib statements to temporary file
- %include this temporary file in next data step which then creates the variables
**/
filename codegen temp;
data _null_;
/*
declare and populate hash
- hash will contain 1 entry per vaccine name
*/
dcl hash h1();
h1.defineKey('_vax');
h1.defineData('_vax');
h1.defineDone();
dcl hiter hh1('h1');
do until(done);
set have(keep=vax_:) end=done;
array vax_ {8} vax_:;
do _j=1 to dim(vax_);
if missing(vax_[_j]) then continue;
_vax=vax_[_j];
_rc=h1.ref();
end;
end;
/*
create attrib statement and write to temporary file
- Iterate through the hash object
- write attrib statement for new variable per distinct vaccine name
*/
/* file print;*/
file codegen;
_rc = hh1.first();
do while (_rc = 0);
_vax=substrn(_vax,1,29);
put "attrib TT_" _vax "length=8 label='" _vax +(-1) "';";
_rc = hh1.next();
end;
stop;
run;
/**
create data want
- %include temporary file with attrib statements
- populate new flag variables if variable name matches with vaccine name
**/
data want(drop=_:);
set have end=last;
%include codegen /source2;
array source_vax {*} vax:;
array T_vax {*} TT_: ;
if _n_=1 then
do;
/*
declare and populate hash
- each element contains the array element number for the array T_vax
- allows to faster find the array element to be populated with a '1'
- faster than having to always loop over the hash in order to find the
matchin variable name
*/
dcl hash h1();
h1.defineKey('_el_name');
h1.defineData('_el_num');
h1.defineDone();
do _el_num=1 to dim(T_vax);
_el_name=substrn(vname(T_vax[_el_num]),4);
h1.add();
end;
end;
/* populate new variables */
do _i=1 to dim(source_vax);
if h1.find(key:substrn(source_vax[_i],1,29))=0 then
do;
T_vax[_el_num]=1;
end;
end;
run;
You are confusing macro coding, which can be used to generate code, with data step coding, which can be used to generate data.
You are not using the right syntax to reference the elements in your "macro array" (as some like to call these large sets of macro variables with a common base name and a numeric suffix). &vaxvar&i will look for a macro variable named VAXVAR, but your macro variables are named VAXVAR1, etc. So you need to use &&vaxvar&i.
The easiest fix is to convert your DO loop to a %DO loop. (Note it will then need to be inside a macro definition since you cannot use %DO in open code).
DATA testdataDummy;
SET testdata;
%DO i=1 %TO &eof;
&&vaxvar&i = vax1="&&vaxtxt&i.";
%END;
RUN;
So if you have 85 values then the %DO loop will generate 85 assignment statements to set your 85 boolean variables.
But you can skip the macro variables (and hence the %DO loop and the need to wrap the code in a macro definition) if you just use your data step to write the code instead of creating the macro variables.
filename dummy code;
DATA _NULL_;
SET vaers.vaxnames end=eof ;
file dummy;
put name '= vax1=' vax :$quote. ';' ;
if eof then call symputx('ndummy',_n_);
run;
DATA testdataDummy;
SET testdata;
%include dummy / source2 ;
RUN;
All of the above is probably correct, however ... as I said earlier ...
There's no need to compute dummy variables via either macros or data step code, as SAS has a number of built in methods to do this which are orders of magnitude more simple.
Thank you@PaigeMiller@Tom@Patrick for the assitance! I've decided to just put my datastep into a macro and my originial solution works well now. However, all of your different solutions have been incredibly informative and have shown me new methods to learn more about and practice. All of your help has been greatly appreciated.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.