Let's say I have a list of 6 global macro variables that represent a 4 or 5-digit code (candidatecode_5digit1-candidatecode_5digit6) and a list of 3 global macro variables that represent the 3-digit unique versions of these 6 codes (candidatecode_3digit1_candidatecode_3digit3):
GLOBAL CANDIDATECODE_5DIGIT1 0030
GLOBAL CANDIDATECODE_5DIGIT2 0031
GLOBAL CANDIDATECODE_5DIGIT3 0039
GLOBAL CANDIDATECODE_5DIGIT4 0084
GLOBAL CANDIDATECODE_5DIGIT5 0085
GLOBAL CANDIDATECODE_5DIGIT6 0090
GLOBAL CANDIDATECODE_3DIGIT1 003
GLOBAL CANDIDATECODE_3DIGIT2 008
GLOBAL CANDIDATECODE_3DIGIT3 009
I have a dataset that has 3 variables for the four or five-digit codes and 3 variables for the three-digit version of these codes. What I'm trying to do is create a flag that equals 1 if, for example, the three-digit code "003" is present in any of the three-digit code variables AND if either "0030", "0031", or "0039" are present in any of the four or five-digit code variables. The flag would equal zero otherwise (e.g. it would be zero if a code like "0032" was present but not "0030, "0031", or "0039.").
Have:
Obs Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3 Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3
1 23231 8770 0030 232 877 003
2 035 0031 0032 035 003
3 0032 003
4 0239 023
Want:
Obs Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3 Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3 Flag
1 23231 8770 0030 232 877 003 1
2 035 0031 0032 035 003 1
3 0032 003 0
4 0239 023 0
How would I implement this using a macro? Thanks.
Add below line in bold to the code I've previously posted - this will fix the issue.
The reason why it didn't work with "00301" is that variable _search_string gets defined with the very first string length passed to it - which happens to be a 4 character string using your data sample. So 00301 got truncated to 0030 and therefore you didn't get a match.
%macro codegen;
length _search_string $5;
%let n=%sysfunc(coalesce(&max_n,0));
%do i=1 %to &n;
flag&i=0;
do _search_string=&&_search_string&i while(flag&i=0);
if whichc(_search_string, of digit5
end;
%end;
drop _search_string;
%mend;
If it's only about 3 similar blocks of logic then it's may be better to not wrap this into a macro but to keep the code easier to read and debug - even if it comes with a bit more typing.
See if below example does what you're after.
data Have(drop=_:);
infile datalines truncover dsd dlm=',';
input
(Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3) (:$5.)
(Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3) (:$3.)
;
array digit5 {*} Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3;
array digit3 {*} Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3;
flag=0;
/** start of setting the flag*/
if whichc('003',of digit3
do;
do _search_string="0030", "0031", "0039";
if whichc(_search_string, of digit5
do;
flag=1;
go to end_set_flag;
end;
end;
end;
/* similar blocks of logic for '008' and '009'*/
end_set_flag:;
/** end of setting the flag **/
/**
some other statements
**/
datalines;
23231,8770,0030,232,877,003
035,0031,0032,035,003
0032,003
0239,023
;
run;
Patrick, thanks for your answer. In truth, I have 900 4 or 5-digit codes and 321 unique 3-digit codes; I have tried to simplify the question as much as possible to avoid confusion and overwhelming potential readers. Is it possible to apply whichc to a global macro variable?
Hi,
Well, generally speaking , macros are already stored in a dataset in SASHELP.VMACRO. You could directly query that table where scope="GLOBAL" and name in (you global macro names).
You could read your macro variables into a dataset and then load the resulting dataset into a HASH TABLE.
OPTIONS SYMBOLGEN MPRINT MLOGIC;
/*MACRO VARIABLES*/
%LET CANDIDATECODE_5DIGIT1 = 0030;
%LET CANDIDATECODE_5DIGIT2 = 0031;
%LET CANDIDATECODE_5DIGIT3 = 0039;
%LET CANDIDATECODE_5DIGIT4 = 0084;
%LET CANDIDATECODE_5DIGIT5 = 0085;
%LET CANDIDATECODE_5DIGIT6 = 0090;
%LET CANDIDATECODE_3DIGIT1 = 003;
%LET CANDIDATECODE_3DIGIT2 = 008;
%LET CANDIDATECODE_3DIGIT3 = 009;
/*CONVERT MACRO VARIABLES INTO A DATASET*/
DATA MACVARS;
LENGTH VALUES $5;
DO H = 3,5;
DO I = 1 BY 1 WHILE (SYMEXIST(CATT("CANDIDATECODE_",H,"DIGIT",I)) = 1);
_LIST = CATT("CANDIDATECODE_",H,"DIGIT",I);
VALUES = SYMGET(CATT("CANDIDATECODE_",H,"DIGIT",I));
OUTPUT;
END;
END;
DROP H I;
RUN;
/*CREATE THE DATA YOU WANT TO COMPARE AGAINST*/
DATA HAVE;
INFILE DATALINES TRUNCOVER DSD DLM=',';
INPUT DX_CODE_5DIGIT1 :$5. DX_CODE_5DIGIT2 :$5. DX_CODE_5DIGIT3 :$5. DX_CODE_3DIGIT1 :$5. DX_CODE_3DIGIT2 :$5. DX_CODE_3DIGIT3 :$5.;
DATALINES;
23231,8770,0030,232,877,003
035,0031,0032,035,003
0032,,003
0239,,023
;
RUN;
/*CREATE A HASH TABLE LOADED WITH THE VALUES FROM YOUR MACRO VARIABLES*/
/*USE HASH ITERATOR TO CYCLE THROUGH THE VALUES IN THE HASH TABLE*/
DATA WANT;
LENGTH VALUES $5;
IF _N_ = 1 THEN DO;
DECLARE HASH H (DATASET:"MACVARS");
H.DEFINEKEY ("VALUES");
H.DEFINEDATA ("VALUES");
H.DEFINEDONE ();
DECLARE HITER ITER("H");
END;
/*INITIALIZE VARIABLE VALUES OTHERWISE YOU GET WARNING NOTES*/
CALL MISSING (VALUES);
/*CREATE AN ARRAY FOR STRINGS WITH 3 CHARACTERS AND ONE FOR THOSE WITH 5 CHARACTERS*/
ARRAY _DIGITS3 {*} $ DX_CODE_3DIGIT1 - DX_CODE_3DIGIT3;
ARRAY _DIGITS5 {*} $ DX_CODE_5DIGIT1 - DX_CODE_5DIGIT3;
/*READ THE DATASET IN CONTAINING THE VALUES */
SET HAVE;
FLAG = 0;
/*GO TO THE FIRST RECORD WITHIN THE HASH TABLE*/
RC = ITER.FIRST();
DO I = 1 BY 1 WHILE (RC = 0);
/*IF THE VALUE FROM THE HASH TABLE IS 3 CHARACTERS IN LENGTH THEN COMPARE THE VALUES WITH THE _DIGITS3 ARRAY*/
IF LENGTH(TRIM(VALUES)) = 3 THEN DO;
IF WHICHC(VALUES,OF _DIGITS3{*})> 0
THEN DO;
FLAG = 1;
LEAVE;
END;
RC = ITER.NEXT();
END;
/*IF THE VALUE FROM THE HASH TABLE IS GREATER THAN 3 CHARACTERS IN LENGTH THEN COMPARE THE VALUES WITH THE _DIGITS5 ARRAY*/
ELSE IF LENGTH(TRIM(VALUES)) > 3 THEN DO;
IF WHICHC(VALUES,OF _DIGITS5{*})> 0
THEN DO;
FLAG = 1;
LEAVE;
END;
RC = ITER.NEXT();
END;
END;
DROP I RC VALUES;
RUN;
Scott, thanks for the response. This got me on the right track, but I realized I was not sufficiently clear about what I'm trying to do. Your code took the length of the values in MACVARS and compared to the corresponding 3 or 5-digit arrays.
1) What I'm trying to do is to set a flag (let's call it "flag1"). that equals 1 if "0030", "0031", or "0039" are present in any of the five-digit code variables, but not if a code like "0032" or "0033" was the only "003" code present. If "0032" and "0030" were both present, I would want to set flag1 = 1.
2) Then, I want to create a flag (let's call it "flag2") that equals 1 if "0084" or "0085" are present in dx_code_5digit1-dx_code_5digit3, but not if a code like "0080" was the only "008" code present.
3) Finally, I want to create a flag (let's call it "flag3") that equals 1 if "0090" are present in dx_code_5digit1-dx_code_5digit3, but not if a code like "0093" was the only "009" code present.
Let me put together a different hypothetical dataset to illustrate:
Have:
Obs Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3 Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3
1 0030 003
2 0032 003
3 0032 0030 003 003
4 0030 0085 0091 003 008 009
5 0030 0085 0090 003 008 009
6 0031 0080 0085 003 008 008
Want:
Obs Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3 Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3 Flag1 Flag2 Flag3
1 0030 003 1 0 0
2 0032 003 0 0 0
3 0032 0030 003 003 1 0 0
4 0030 0085 0091 003 008 009 1 1 0
5 0030 0085 0090 003 008 009 1 1 1
6 0031 0080 0085 003 008 008 0 1 0
Basically, I think I would have to set up a do loop that goes through each of the three-digit candidate codes ("003", "008", and "009"), then iterate through the hash table. I'm not exactly sure how to do that, though.
[Note in reality, there are actually 321 three-digit candidate codes and 900 5-digit candidate codes, but I'm simplifying for readability).
Thanks.
Reading through your logic it looks to me as if we don't have to care about the 3 digit codes as all.
Can you please confirm if below logic is all what this is about?
flag1=1 in cases where at least one of the the 5 digit codes contains a value of "0030", "0031", or "0039"
flag2=1 in cases where at least one of the the 5 digit codes contains a value of "0084" or "0085"
flag3=1 in cases where at least one of the the 5 digit codes contains a value of "0090"
Patrick, I believe what you wrote is correct. I'm trying to aggregate all candidate 5-digit codes up to the 3-digit code. So flag1 refers to whether there is a "003" code that is also one of the 5-digit candidate codes (0031, 0031, and 0039 - but not something like 0032, etc.). So I do "care" about the 3-digit codes, but only conceptually as a grouping variable.
data Have(drop=_:);
infile datalines truncover dsd dlm=',';
input
(Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3) (:$5.)
(Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3) (:$3.)
;
array digit5 {*} Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3;
/** flag1=1 in cases where at least one of the the 5 digit codes contains a value of "0030", "0031", or "0039" **/
flag1=0;
do _search_string="0030", "0031", "0039" while(flag1=0);
if whichc(_search_string, of digit5
/** flag2=1 in cases where at least one of the the 5 digit codes contains a value of "0084" or "0085" **/
flag2=0;
do _search_string="0084","0085" while(flag2=0);
if whichc(_search_string, of digit5
/** flag3=1 in cases where at least one of the the 5 digit codes contains a value of "0090" **/
flag3=0;
do _search_string="0090" while(flag3=0);
if whichc(_search_string, of digit5
Thanks to both Patrick and slchen for their responses. I think both of these solutions would work, but I don't want to have to explicitly write out "0030", "0031", and "0039", etc. I'd like for SAS to automatically search my global macro variables instead of explicitly coding them since there are 900 5-digit codes and 321 3-digit codes. For instance, I'd like SAS to start with candidatecode_3digit1 (003) and then only search for candidatecode_5digit1-candidatecode_5digit3 (0030, 0031, 0039). Then SAS would go to 008 and search for 0084 and 0085, etc.
%LET CANDIDATECODE_5DIGIT1 = 0030;
%LET CANDIDATECODE_5DIGIT2 = 0031;
%LET CANDIDATECODE_5DIGIT3 = 0039;
%LET CANDIDATECODE_5DIGIT4 = 0084;
%LET CANDIDATECODE_5DIGIT5 = 0085;
%LET CANDIDATECODE_5DIGIT6 = 0090;
%LET CANDIDATECODE_3DIGIT1 = 003;
%LET CANDIDATECODE_3DIGIT2 = 008;
%LET CANDIDATECODE_3DIGIT3 = 009;
%let c11=0030, 0031, 0039;
%let c12=003;
%let c21=0084, 0085;
%let c22=008;
%let c31=0090;
%let c32=009;
data want;
set have;
array code_5 Dx_code_5digit1-Dx_code_5digit3;
array code_3 Dx_code_3digit1-Dx_code_3digit3;
do over code_5;
flag1=(code_5 in (%quote(&c11)) and code_3 in (%quote(&c12)));
if flag1=1 then leave;
end;
do over code_5;
flag2=(code_5 in (%quote(&c21)) and code_3 in (%quote(&c22)));
if flag2=1 then leave;
end;
do over code_5;
flag3=(code_5 in (%quote(&c31)) and code_3 in ( %quote(&c32)));
if flag3=1 then leave;
end;
run;
Thanks, though this still requires me to manually type in c11 = 0030, 0031, 0039, etc. Is there a way to have SAS automatically generate global macro variables like c11 by searching through my 5-digit global macro variables (candidatecode_5digit1-candidatecode5digit6) and appropriately grouping them together based on the first three digits?
GLOBAL CANDIDATECODE_5DIGIT1 0030
GLOBAL CANDIDATECODE_5DIGIT2 0031
GLOBAL CANDIDATECODE_5DIGIT3 0039
GLOBAL CANDIDATECODE_5DIGIT4 0084
GLOBAL CANDIDATECODE_5DIGIT5 0085
GLOBAL CANDIDATECODE_5DIGIT6 0090
You probably would need to show a bit more of your own coding effort to motivate people to help you find a solution. But here you go:
%LET CANDIDATECODE_5DIGIT1 = 0030;
%LET CANDIDATECODE_5DIGIT2 = 0031;
%LET CANDIDATECODE_5DIGIT3 = 0039;
%LET CANDIDATECODE_5DIGIT4 = 0084;
%LET CANDIDATECODE_5DIGIT5 = 0085;
%LET CANDIDATECODE_5DIGIT6 = 0090;
proc sql noprint;
create table codes as
select value as value5, substrn(value,1,3) as value3
from dictionary.macros
where scope='GLOBAL' and name like 'CANDIDATECODE_5DIGIT%'
order by value3
;
quit;
data _null_;
set codes;
by value3;
length list $1000;
retain list;
list=catx(',',list,cats('"',value5,'"'));
if last.value3 then
do;
n+1;
call symputx(cats('_search_string',n),list);
call missing(list);
end;
call symputx('max_n',n);
run;
%macro codegen;
%let n=%sysfunc(coalesce(&max_n,0));
%do i=1 %to &n;
flag&i=0;
do _search_string=&&_search_string&i while(flag&i=0);
if whichc(_search_string, of digit5
end;
%end;
drop _search_string;
%mend;
data Have;
infile datalines truncover dsd dlm=',';
input
(Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3) (:$5.)
(Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3) (:$3.)
;
array digit5 {*} Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3;
%codegen;
datalines;
0030,,,003
0032,,,003
0032,0030,,003,003
0030,0085,0091,003,008,009
0030,0085,0090,003,008,009
0031,0080,0085,003,008,008
;
run;
Patrick, thanks for this solution - this is very helpful. I am able to use this code to correctly set the flag for if the search string consists of candidate codes with four digits, but for some reason, the flag does not set correctly if the search string contains any candidate code with five digits. So for example, if I change your code to the following (see bold), the global macro _search_string1 resolves to "0039","00301","0031".
%LET CANDIDATECODE_5DIGIT1 = 00301;
%LET CANDIDATECODE_5DIGIT2 = 0031;
%LET CANDIDATECODE_5DIGIT3 = 0039;
%LET CANDIDATECODE_5DIGIT4 = 0084;
%LET CANDIDATECODE_5DIGIT5 = 0085;
%LET CANDIDATECODE_5DIGIT6 = 0090;
....
data Have;
...
datalines;
00301,,,003
0032,,,003
0032,0030,,003,003
0030,0085,0091,003,008,009
0030,0085,0090,003,008,009
0031,0080,0085,003,008,008
;
run;
flag1 sets to 0 for the first observation and sets to 1 for observations 3-6. However, flag1 should equal 1 for observation 1 (since 00301 is present) and observation 6 (since 0031 is present) but should equal zero for observations 3-5 (0030 is present and 0030 is not part of the search string). flag2 sets correctly to 1 (because the search string consists of two four-digit codes "0084", "0085") , as does flag 3 (because the search string consists of a four-digit code "0090."). It's a bit of a weird pattern - SAS appears to be ignoring the fact that 00301 (part of the search code) is present in observation 1 and treating 0030 (not part of the search code) in observations 3-5 as being equal to 00301.
Any ideas on why this might be the case? Thanks.
__________________________________________________________________________________________________________________________
To address your other comment, I do have code that correctly set the flags but it is terribly inefficient. This code searches each of the 3 five-digit code variables for any of the 900 five-digit candidate codes and takes about 2 hours to run (compared to 1.5 minutes for your code).
%let cc3d = candidatecode_3digit;
%let cc5d = candidatecode_5digit;
%macro flag;
%do j = 1 %to 321; /*Counter for the 321 3-digit codes*/
data newflag&j; set maindatafile;
array dx_5digit {*} $ dx_code_5digit1-dx_code_5digit3;
array dx_3digit {*} $ dx_code_3digit1-dx_code_3digit3;
flag&j = 0;
%do k = 1 %to &nobs_codes_5digit; /*Counter for 900 5-digit codes*/
%do i = 1 %to 3;
if dx_3digit(&i) = "&&&cc3d&j." then do;
if dx_5digit(&i) = "&&&cc5d&k." then flag&j = 1;
end;
%end;
%end;
run;
%end;
%mend flag;
%flag;
Add below line in bold to the code I've previously posted - this will fix the issue.
The reason why it didn't work with "00301" is that variable _search_string gets defined with the very first string length passed to it - which happens to be a 4 character string using your data sample. So 00301 got truncated to 0030 and therefore you didn't get a match.
%macro codegen;
length _search_string $5;
%let n=%sysfunc(coalesce(&max_n,0));
%do i=1 %to &n;
flag&i=0;
do _search_string=&&_search_string&i while(flag&i=0);
if whichc(_search_string, of digit5
end;
%end;
drop _search_string;
%mend;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.