I am trying to loop through variable names in a list to determine if it meets a condition. If it does then I need to revise the list of variables in a variable named grp_lst. I would like to replace the variable name in the list with blank which by the end of the loop I will have a final variable list. Here is my sample code. I am using this in a macro so I created a little test macro around the code.
Let's assume that the &actual = 2 for Age and that the value in the drug1 dataset is 'age gender type' . After this code I would expect to see 'gender type' in the grp_lst.
Variables that are needed and that are created prior to this code:
%let var_lst = age gender type ;
%let varCount = 3 ;
%let expected = 3 3 6 ;
/*expected value for each variable in the var_lst */
%macro test();
	%DO k=1 %to &varCount;
		/*get the actual count for each variable in variable list*/
		proc sql NOPRINT;
			Select count(distinct %Scan(&var_lst, &k)) as %Scan(&var_lst, &k)_lvls 
				into :actual separated by ' ' from drug1;
		quit;
		/* create revised grp_lst by removing variables in the list if actual < expected */
		/* currently data in grp_lst is equal to var_lst for all observations - just the way it was created..*/
		DATA drug1;
			set drug1;
			if &actual. < %Scan(&expected., &k.) then
				grp_lst=%sysfunc(compress(%sysfunc(tranwrd(grp_lst, "%Scan(&var_lst,&k)", 
					" "))));
		run;
	%END;
%MEND;
%test();
This is what gets resolved in the log for the grp_list statement in the datastep:
MPRINT(test 😞 if 2 < 3 then grp_lst = grp_lst ;
Use an array - remember Base SAS programming is used for data processing, Macro only creates text!!
data drug1;
  set drug1;
  array vals{*} age gender type;
  do i=1 to dim(vals);
   <do your logic here>;
  end;
run;
thank you. I tried your solution (adding an addtional type variable from original post) but am getting errors.
code:
DATA drug1 ;
SET drug1 ;
ARRAY varlst[&varCount.] &var_lst. ;
array act{&varCount.} (&par_lvl.) ;
array exp{&varCount.} (&s_lvl.) ;
%do j = 1 %to &varCount. ;
if act[j] < exp[j] then grp_lst = tranwrd(grp_lst, arr_lst[j], " " ) ;
%end ;
run ;
log:
MPRINT(temp): DATA drug1 ;
MPRINT(temp): SET drug1 ;
MPRINT(temp): ARRAY varlst[ 4] AGE GENDER TYPE1 TYPE2  ;
MPRINT(temp): array act{ 4} (3 2 2 5) ;
MPRINT(temp): array exp{ 4} (3 3 3 6 ) ;
MPRINT(temp): if act[j] < exp[j] then grp_lst = tranwrd(grp_lst, var_lst[j], " " ) ;
MPRINT(temp): if act[j] < exp[j] then grp_lst = tranwrd(grp_lst, var_lst[j], " " ) ;
MPRINT(temp): if act[j] < exp[j] then grp_lst = tranwrd(grp_lst, var_lst[j], " " ) ;
MPRINT(temp): if act[j] < exp[j] then grp_lst = tranwrd(grp_lst, var_lst[j], " " ) ;
MPRINT(temp): run ;
NOTE: Variable j is uninitialized.
ERROR: Array subscript out of range at line 697 column 12.
You're still mixing macro and non-macro code. Remove the two % (i.e., just use do and end)
Art, CEO, AnalystFinder.com
I have to use the % because my code is within a macro.
The %do does not belong in this data step.
You want to use the data step DO command.
So removing the %s helped but the tranwrd is not working. can you see what I am doing wrong with the tranwrd command? the last line is from a put statement but clearly based on the values grp_name should only be age.
MPRINT(SURROGATE): DATA drug1 ;
MPRINT(SURROGATE): SET drug1 ;
MPRINT(SURROGATE): ARRAY var_lst[ 4] AGE GENDER TYPE1 TYPE2 ;
MPRINT(SURROGATE): array act{ 4} (3 2 2 5) ;
MPRINT(SURROGATE): array exp{ 4} (3 3 3 6 );
MPRINT(SURROGATE): do j = 1 to 4 ;
MPRINT(SURROGATE): if act[j] < exp[j] then grp_lst = tranwrd(grp_lst, var_lst[j], " " ) ;
MPRINT(SURROGATE): end ;
MPRINT(SURROGATE): run ;
NOTE: There were 19540 observations read from the data set WORK.DRUG1.
NOTE: The data set WORK.DRUG1 has 19540 observations and 26 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
MPRINT(SURROGATE): data _null_;
MPRINT(SURROGATE): set drug1 (obs=1);
MPRINT(SURROGATE): call symputx('grp_name' , grp_lst);
MPRINT(SURROGATE): run;
NOTE: There were 1 observations read from the data set WORK.DRUG1.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
17 The SAS System 10:44 Monday, July 17, 2017
grp_name = age gender type1 type2
Would help if you provide an example drug1 file
Art, CEO, AnalystFinder.com
The grp_lst probably should have been done all in macro language but I couldn't figure that out so I created a column that I later put into a macro list using proc sql into; So, now that I have this I am trying to work with it . If I can edit the column using tranwrd then I will eventually create a macro list from it.
here is a brief sample. Basicall, grp_lst is the same for each drug in the database when I loop through drugs I use a new list for each drug. This sample is for just drug1 .
data drug1_sample ;
length grp_lst $32. ;
input drug $ age $ gender $ type1 $ type2 $ grp_lst $ ;
infile datalines4 delimiter=',' ;
datalines;
drug1,<50,F,A,B,age gender type1 type2
drug1,<50,F,B,B,age gender type1 type2
drug1,<50,F,C,B,age gender type1 type2
drug1,<50,M,A,B,age gender type1 type2
drug1,<50,M,B,B,age gender type1 type2
drug1,<50,M,C,B,age gender type1 type2
;
run;
proc print ;run;
actually, I don't need the %. thanks
You are mixing up macro language and DATA step code in ways that have no chance of working properly.
Make this a macro language problem until you have your new list defined. Then (if desirable) you can replace GRP_LST in the data set.
Here are the basics of how to get started. (There's a %DO loop, so this takes place inside a macro definition.)
data _null_;
set drug1 (obs=1);
call symputx('original_list' , grp_lst);
run;
proc sql;
%do k=1 %to &varCount;
%let next_name = %scan(&var_lst, &k);
select count(distinct &next_name) into : &next_name from drug1;
%end;
quit;
This gives you macro variables:
&ORIGINAL_LIST is all of the variables (a copy of the DATA step variable GRP_LST)
&VAR_LST is a list of variables to test to see if they should be removed from &ORIGINAL_LIST
&AGE is the number of distinct levels for AGE that exist in the DRUG1
&GENDER is the number of distinct levels for GENDER that exist in DRUG1
&TYPE is the number of distinct levels for TYPE that exist in DRUG1
Using macro language only, construct the list that you would like to replace GRP_LST in DRUG1. There should not be another reference to DRUG1 until that list has been constructed as a macro variable.
Thanks but I still have the same problem with using a condition when actual levels are less than expected levels which will determine the final list of variables. i tried using %eval() in my originial code but that didn't work either.
The next step would be to create a new list that is a subset of the candidates for removal based on the expected levels ... using macro language only, create a list of those variable names that should be removed.
If you have trouble getting that to work, show what you have attempted and we can work with it.
Thanks. It is the macro programming that is causing my confusion so I'm not following, sorry. Basically I have
an EXPECTED variable list and associated EXPECTED values in two macro variables (&var_lst, &var_lvl) . I can create a macro variable for the grp_lst - your 'original_list' from my dataset drug1 . I was then trying to go variable by variable to count the levels in a sample dataset (drug1) and if that count was less than the EXPECTED count of that same variable, I would ideally remove it from the list. I see that I was trying to mix macro and real variables which was a problem.
Alos, I call out the individual items in the list by using the %scan(&var_lst, &k.) in a %DO loop so i'd rather not introduce more macro variables as your second proc sql does if I can avoid it.
We can work with your names, if you would prefer. Just based on personal preference, I chose a style that I find easier to read. For example, I actually created macro variables without the "_lvl" at the end, holding the same values as your _lvl versions. But it's no problem using your names. Here's the idea for the next step, constructing a list of those that should be removed:
%let to_be_removed=;
%do k=1 %to &varCount;
%if %scan(&var_lst, &k)_lvl < %scan(&expected, &k) %then %let to_be_removed = &to_be_removed %scan(&var_lst, &k);
%end;
Take a look, and confirm that you're getting the list of variables that don't have enough levels. That doesn't remove them from the original list, but it does isolate the names of those that should be removed. So another step will follow.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
