BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chuakp
Obsidian | Level 7

Let's say I have a list of 6 global macro variables that represent a 4 or 5-digit code (candidatecode_5digit1-candidatecode_5digit6) and a list of 3 global macro variables that represent the 3-digit unique versions of these 6 codes (candidatecode_3digit1_candidatecode_3digit3):

GLOBAL CANDIDATECODE_5DIGIT1 0030

GLOBAL CANDIDATECODE_5DIGIT2 0031

GLOBAL CANDIDATECODE_5DIGIT3 0039

GLOBAL CANDIDATECODE_5DIGIT4 0084

GLOBAL CANDIDATECODE_5DIGIT5 0085

GLOBAL CANDIDATECODE_5DIGIT6 0090

GLOBAL CANDIDATECODE_3DIGIT1 003

GLOBAL CANDIDATECODE_3DIGIT2 008

GLOBAL CANDIDATECODE_3DIGIT3 009

I have a dataset that has 3 variables for the four or five-digit codes and 3 variables for the three-digit version of these codes.  What I'm trying to do is create a flag that equals 1 if, for example, the three-digit code "003" is present in any of the three-digit code variables AND if either "0030", "0031", or "0039" are present in any of the four or five-digit code variables.  The flag would equal zero otherwise (e.g. it would be zero if a code like "0032" was present but not "0030, "0031", or "0039.").

Have:

Obs    Dx_code_5digit1     Dx_code_5digit2          Dx_code_5digit3     Dx_code_3digit1      Dx_code_3digit2         Dx_code_3digit3    

1         23231                    8770                           0030                       232                         877                            003

2         035                       0031                           0032                        035                        003

3         0032                                                        003

4         0239                                                        023

Want:

Obs    Dx_code_5digit1     Dx_code_5digit2          Dx_code_5digit3     Dx_code_3digit1      Dx_code_3digit2         Dx_code_3digit3             Flag

1         23231                    8770                           0030                       232                         877                            003                              1

2         035                       0031                           0032                        035                        003                                                                1

3         0032                                                        003                                                                                                                            0

4         0239                                                        023                                                                                                                            0

How would I implement this using a macro?  Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Add below line in bold to the code I've previously posted - this will fix the issue.

The reason why it didn't work with "00301" is that variable _search_string gets defined with the very first string length passed to it - which happens to be a 4 character string using your data sample. So 00301 got truncated to 0030 and therefore you didn't get a match.

%macro codegen;

  length _search_string $5;

  %let n=%sysfunc(coalesce(&max_n,0));

  %do i=1 %to &n;

    flag&i=0;

    do _search_string=&&_search_string&i while(flag&i=0);

      if whichc(_search_string, of digit5

  • ) then flag&i=1;
  •     end;

      %end;

      drop _search_string;

    %mend;

    View solution in original post

    16 REPLIES 16
    Patrick
    Opal | Level 21

    If it's only about 3 similar blocks of logic then it's may be better to not wrap this into a macro but to keep the code easier to read and debug - even if it comes with a bit more typing.

    See if below example does what you're after.

    data Have(drop=_:);

    infile datalines truncover dsd dlm=',';

    input

      (Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3) (:$5.)

      (Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3) (:$3.)

      ;

      array digit5 {*} Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3;

      array digit3 {*} Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3;

      flag=0;

      /** start of setting the flag*/

      if whichc('003',of digit3

  • ) then
  •     do;

          do _search_string="0030", "0031", "0039";

            if whichc(_search_string, of digit5

  • ) then
  •           do;

                flag=1;

                go to end_set_flag;

              end;

          end;

        end;

        /* similar blocks of logic for '008' and '009'*/ 

      end_set_flag:;

      /** end of setting the flag **/

      /**

        some other statements

      **/

      datalines;

    23231,8770,0030,232,877,003

    035,0031,0032,035,003

    0032,003

    0239,023

    ;

    run;

    chuakp
    Obsidian | Level 7

    Patrick, thanks for your answer.  In truth, I have 900 4 or 5-digit codes and 321 unique 3-digit codes; I have tried to simplify the question as much as possible to avoid confusion and overwhelming potential readers.  Is it possible to apply whichc to a global macro variable?

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    Hi,

    Well, generally speaking , macros are already stored in a dataset in SASHELP.VMACRO.  You could directly query that table where scope="GLOBAL" and name in (you global macro names).

    Scott_Mitchell
    Quartz | Level 8

    You could read your macro variables into a dataset and then load the resulting dataset into a HASH TABLE.

    OPTIONS SYMBOLGEN MPRINT MLOGIC;

    /*MACRO VARIABLES*/

    %LET CANDIDATECODE_5DIGIT1 = 0030;

    %LET CANDIDATECODE_5DIGIT2 = 0031;

    %LET CANDIDATECODE_5DIGIT3 = 0039;

    %LET CANDIDATECODE_5DIGIT4 = 0084;

    %LET CANDIDATECODE_5DIGIT5 = 0085;

    %LET CANDIDATECODE_5DIGIT6 = 0090;

    %LET CANDIDATECODE_3DIGIT1 = 003;

    %LET CANDIDATECODE_3DIGIT2 = 008;

    %LET CANDIDATECODE_3DIGIT3 = 009;

    /*CONVERT MACRO VARIABLES INTO A DATASET*/

    DATA MACVARS;

    LENGTH VALUES $5;

    DO H = 3,5;

    DO I = 1  BY 1 WHILE (SYMEXIST(CATT("CANDIDATECODE_",H,"DIGIT",I)) = 1);

    _LIST = CATT("CANDIDATECODE_",H,"DIGIT",I);

    VALUES = SYMGET(CATT("CANDIDATECODE_",H,"DIGIT",I));

    OUTPUT;

    END;

    END;

    DROP H I;

    RUN;

    /*CREATE THE DATA YOU WANT TO COMPARE AGAINST*/

    DATA HAVE;

    INFILE DATALINES TRUNCOVER DSD DLM=',';

    INPUT DX_CODE_5DIGIT1 :$5. DX_CODE_5DIGIT2 :$5. DX_CODE_5DIGIT3 :$5. DX_CODE_3DIGIT1 :$5. DX_CODE_3DIGIT2 :$5. DX_CODE_3DIGIT3 :$5.;

    DATALINES;

    23231,8770,0030,232,877,003

    035,0031,0032,035,003

    0032,,003

    0239,,023

    ;

    RUN;

    /*CREATE A HASH TABLE LOADED WITH THE VALUES FROM YOUR MACRO VARIABLES*/

    /*USE HASH ITERATOR TO CYCLE THROUGH THE VALUES IN THE HASH TABLE*/

    DATA WANT;

      LENGTH VALUES $5;

    IF _N_ = 1 THEN DO;

      DECLARE HASH H (DATASET:"MACVARS");

      H.DEFINEKEY ("VALUES");

      H.DEFINEDATA ("VALUES");

      H.DEFINEDONE ();

      DECLARE HITER ITER("H");

    END;

    /*INITIALIZE VARIABLE VALUES OTHERWISE YOU GET WARNING NOTES*/

      CALL MISSING (VALUES);

    /*CREATE AN ARRAY FOR STRINGS WITH 3 CHARACTERS AND ONE FOR THOSE WITH 5 CHARACTERS*/

      ARRAY _DIGITS3 {*} $ DX_CODE_3DIGIT1 - DX_CODE_3DIGIT3;

      ARRAY _DIGITS5 {*} $ DX_CODE_5DIGIT1 - DX_CODE_5DIGIT3;

    /*READ THE DATASET IN CONTAINING THE VALUES */

      SET HAVE;

      FLAG = 0;

    /*GO TO THE FIRST RECORD WITHIN THE HASH TABLE*/

      RC = ITER.FIRST();

      DO I = 1 BY 1 WHILE (RC = 0);

      /*IF THE VALUE FROM THE HASH TABLE IS 3 CHARACTERS IN LENGTH THEN COMPARE THE VALUES WITH THE _DIGITS3 ARRAY*/

      IF LENGTH(TRIM(VALUES)) = 3 THEN DO;

      IF WHICHC(VALUES,OF _DIGITS3{*})> 0

      THEN DO;

      FLAG = 1;

      LEAVE;

      END;

      RC = ITER.NEXT();

      END;

      /*IF THE VALUE FROM THE HASH TABLE IS GREATER THAN 3 CHARACTERS IN LENGTH THEN COMPARE THE VALUES WITH THE _DIGITS5 ARRAY*/

      ELSE IF LENGTH(TRIM(VALUES)) > 3 THEN DO;

      IF WHICHC(VALUES,OF _DIGITS5{*})> 0

      THEN DO;

      FLAG = 1;

      LEAVE;

      END;

      RC = ITER.NEXT();

      END;

      END;

      DROP I RC VALUES;

    RUN;

    chuakp
    Obsidian | Level 7

    Scott, thanks for the response.  This got me on the right track, but I realized I was not sufficiently clear about what I'm trying to do.  Your code took the length of the values in MACVARS and compared to the corresponding 3 or 5-digit arrays.

    1) What I'm trying to do is to set a flag (let's call it "flag1"). that equals 1 if "0030", "0031", or "0039" are present in any of the five-digit code variables, but not if a code like "0032" or "0033" was the only "003" code present.  If "0032" and "0030" were both present, I would want to set flag1 = 1.

    2) Then, I want to create a flag (let's call it "flag2") that equals 1 if "0084" or "0085" are present in dx_code_5digit1-dx_code_5digit3, but not if a code like "0080" was the only "008" code present.

    3) Finally, I want to create a flag (let's call it "flag3") that equals 1 if "0090" are present in dx_code_5digit1-dx_code_5digit3, but not if a code like "0093" was the only "009" code present.


    Let me put together a different hypothetical dataset to illustrate:

    Have:

    Obs    Dx_code_5digit1   Dx_code_5digit2  Dx_code_5digit3     Dx_code_3digit1  Dx_code_3digit2  Dx_code_3digit3

    1         0030                                                                           003

    2         0032                                                                           003

    3         0032                      0030                                              003                    003

    4         0030                      0085                 0091                      003                    008                      009

    5         0030                      0085                 0090                      003                    008                      009

    6         0031                      0080                 0085                      003                    008                      008

    Want:

    Obs    Dx_code_5digit1   Dx_code_5digit2   Dx_code_5digit3     Dx_code_3digit1  Dx_code_3digit2  Dx_code_3digit3   Flag1     Flag2     Flag3

    1         0030                                                                           003                                                                             1            0           0

    2         0032                                                                           003                                                                             0            0           0

    3         0032                      0030                                              003                     003                                                   1            0           0

    4         0030                      0085                 0091                      003                     008                     009                         1            1           0

    5         0030                      0085                 0090                      003                     008                     009                         1            1           1

    6         0031                      0080                 0085                      003                     008                     008                         0            1           0



    Basically, I think I would have to set up a do loop that goes through each of the three-digit candidate codes ("003", "008", and "009"), then iterate through the hash table.  I'm not exactly sure how to do that, though.

    [Note in reality, there are actually 321 three-digit candidate codes and 900 5-digit candidate codes, but I'm simplifying for readability).

    Thanks.

    Patrick
    Opal | Level 21

    Reading through your logic it looks to me as if we don't have to care about the 3 digit codes as all.

    Can you please confirm if below logic is all what this is about?

    flag1=1 in cases where at least one of the the 5 digit codes contains a value of "0030", "0031", or "0039"

    flag2=1 in cases where at least one of the the 5 digit codes contains a value of "0084" or "0085"

    flag3=1 in cases where at least one of the the 5 digit codes contains a value of "0090"

    chuakp
    Obsidian | Level 7

    Patrick, I believe what you wrote is correct.  I'm trying to aggregate all candidate 5-digit codes up to the 3-digit code.  So flag1 refers to whether there is a "003" code that is also one of the 5-digit candidate codes (0031, 0031, and 0039 - but not something like 0032, etc.).  So I do "care" about the 3-digit codes, but only conceptually as a grouping variable.

    Patrick
    Opal | Level 21

    data Have(drop=_:);
      infile datalines truncover dsd dlm=',';
      input
        (Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3) (:$5.)
        (Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3) (:$3.)
      ;
      array digit5 {*} Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3;
     
      /** flag1=1 in cases where at least one of the the 5 digit codes contains a value of "0030", "0031", or "0039" **/
      flag1=0;
      do _search_string="0030", "0031", "0039" while(flag1=0);
        if whichc(_search_string, of digit5

  • ) then flag1=1;
      end;
  •   /** flag2=1 in cases where at least one of the the 5 digit codes contains a value of "0084" or "0085" **/
      flag2=0;
      do _search_string="0084","0085" while(flag2=0);
        if whichc(_search_string, of digit5

  • ) then flag2=1;
      end;
  •   /** flag3=1 in cases where at least one of the the 5 digit codes contains a value of "0090" **/
      flag3=0;
      do _search_string="0090" while(flag3=0);
        if whichc(_search_string, of digit5

  • ) then flag3=1;
      end;
      datalines;
    0030,,,003
    0032,,,003
    0032,0030,,003,003
    0030,0085,0091,003,008,009
    0030,0085,0090,003,008,009
    0031,0080,0085,003,008,008
    ;
    run;
  • chuakp
    Obsidian | Level 7

    Thanks to both Patrick and slchen for their responses.  I think both of these solutions would work, but I don't want to have to explicitly write out "0030", "0031", and "0039", etc.  I'd like for SAS to automatically search my global macro variables instead of explicitly coding them since there are 900 5-digit codes and 321 3-digit codes.  For instance, I'd like SAS to start with candidatecode_3digit1 (003) and then only search for candidatecode_5digit1-candidatecode_5digit3 (0030, 0031, 0039).  Then SAS would go to 008 and search for 0084 and 0085, etc. 

    %LET CANDIDATECODE_5DIGIT1 = 0030;

    %LET CANDIDATECODE_5DIGIT2 = 0031;

    %LET CANDIDATECODE_5DIGIT3 = 0039;

    %LET CANDIDATECODE_5DIGIT4 = 0084;

    %LET CANDIDATECODE_5DIGIT5 = 0085;

    %LET CANDIDATECODE_5DIGIT6 = 0090;

    %LET CANDIDATECODE_3DIGIT1 = 003;

    %LET CANDIDATECODE_3DIGIT2 = 008;

    %LET CANDIDATECODE_3DIGIT3 = 009;

    slchen
    Lapis Lazuli | Level 10


    %let c11=0030, 0031, 0039;
    %let c12=003;
    %let c21=0084, 0085;
    %let c22=008;
    %let c31=0090;
    %let c32=009;

    data want;

      set have;

      array code_5 Dx_code_5digit1-Dx_code_5digit3;

      array code_3 Dx_code_3digit1-Dx_code_3digit3;

       do over code_5;

          flag1=(code_5 in (%quote(&c11)) and code_3 in (%quote(&c12)));

          if flag1=1 then leave;

       end;

      do over code_5;

          flag2=(code_5 in (%quote(&c21)) and code_3 in (%quote(&c22)));

          if flag2=1 then leave;

      end;

      do over code_5;

          flag3=(code_5 in (%quote(&c31)) and code_3 in ( %quote(&c32)));

          if flag3=1 then leave;

      end;

      run;

    chuakp
    Obsidian | Level 7

    Thanks, though this still requires me to manually type in c11 = 0030, 0031, 0039, etc.  Is there a way to have SAS automatically generate global macro variables like c11 by searching through my 5-digit global macro variables (candidatecode_5digit1-candidatecode5digit6) and appropriately grouping them together based on the first three digits?

    GLOBAL CANDIDATECODE_5DIGIT1 0030

    GLOBAL CANDIDATECODE_5DIGIT2 0031

    GLOBAL CANDIDATECODE_5DIGIT3 0039

    GLOBAL CANDIDATECODE_5DIGIT4 0084

    GLOBAL CANDIDATECODE_5DIGIT5 0085

    GLOBAL CANDIDATECODE_5DIGIT6 0090

    Patrick
    Opal | Level 21

    You probably would need to show a bit more of your own coding effort to motivate people to help you find a solution. But here you go:

    %LET CANDIDATECODE_5DIGIT1 = 0030;

    %LET CANDIDATECODE_5DIGIT2 = 0031;

    %LET CANDIDATECODE_5DIGIT3 = 0039;

    %LET CANDIDATECODE_5DIGIT4 = 0084;

    %LET CANDIDATECODE_5DIGIT5 = 0085;

    %LET CANDIDATECODE_5DIGIT6 = 0090;

    proc sql noprint;

      create table codes as

      select value as value5, substrn(value,1,3) as value3

      from dictionary.macros

      where scope='GLOBAL' and name like 'CANDIDATECODE_5DIGIT%'

      order by value3

      ;

    quit;

    data _null_;

      set codes;

      by value3;

      length list $1000;

      retain list;

      list=catx(',',list,cats('"',value5,'"'));

      if last.value3 then

        do;

          n+1;

          call symputx(cats('_search_string',n),list);

          call missing(list);

        end;

      call symputx('max_n',n);

    run;

    %macro codegen;

      %let n=%sysfunc(coalesce(&max_n,0));

      %do i=1 %to &n;

        flag&i=0;

        do _search_string=&&_search_string&i while(flag&i=0);

          if whichc(_search_string, of digit5

  • ) then flag&i=1;
  •     end;

      %end;

      drop _search_string;

    %mend;

    data Have;

      infile datalines truncover dsd dlm=',';

      input

        (Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3) (:$5.)

        (Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3) (:$3.)

      ;

      array digit5 {*} Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3;

      %codegen;

      datalines;

    0030,,,003

    0032,,,003

    0032,0030,,003,003

    0030,0085,0091,003,008,009

    0030,0085,0090,003,008,009

    0031,0080,0085,003,008,008

    ;

    run;

    chuakp
    Obsidian | Level 7

    Patrick, thanks for this solution - this is very helpful.  I am able to use this code to correctly set the flag for if the search string consists of candidate codes with four digits, but for some reason, the flag does not set correctly if the search string contains any candidate code with five digits.  So for example, if I change your code to the following (see bold), the global macro _search_string1 resolves to "0039","00301","0031".

    %LET CANDIDATECODE_5DIGIT1 = 00301;

    %LET CANDIDATECODE_5DIGIT2 = 0031;

    %LET CANDIDATECODE_5DIGIT3 = 0039;

    %LET CANDIDATECODE_5DIGIT4 = 0084;

    %LET CANDIDATECODE_5DIGIT5 = 0085;

    %LET CANDIDATECODE_5DIGIT6 = 0090;

    ....

    data Have;

    ...

     

    datalines;

    00301,,,003

    0032,,,003

    0032,0030,,003,003

    0030,0085,0091,003,008,009

    0030,0085,0090,003,008,009

    0031,0080,0085,003,008,008

    ;

    run;

    flag1 sets to 0 for the first observation and sets to 1 for observations 3-6.  However, flag1 should equal 1 for observation 1 (since 00301 is present) and observation 6 (since 0031 is present) but should equal zero for observations 3-5 (0030 is present and 0030 is not part of the search string).  flag2 sets correctly to 1 (because the search string consists of two four-digit codes "0084", "0085") , as does flag 3 (because the search string consists of a four-digit code "0090.").  It's a bit of a weird pattern - SAS appears to be ignoring the fact that 00301 (part of the search code) is present in observation 1 and treating 0030 (not part of the search code) in observations 3-5 as being equal to 00301.

    Any ideas on why this might be the case?  Thanks.

    __________________________________________________________________________________________________________________________

    To address your other comment, I do have code that correctly set the flags but it is terribly inefficient.  This code searches each of the 3 five-digit code variables for any of the 900 five-digit candidate codes and takes about 2 hours to run (compared to 1.5 minutes for your code).

    %let cc3d = candidatecode_3digit;

    %let cc5d = candidatecode_5digit;

    %macro flag;

    %do j = 1 %to 321; /*Counter for the 321 3-digit codes*/

      data newflag&j; set maindatafile;

      array dx_5digit {*} $ dx_code_5digit1-dx_code_5digit3;

      array dx_3digit {*} $ dx_code_3digit1-dx_code_3digit3;

      flag&j = 0;

      %do k = 1 %to &nobs_codes_5digit; /*Counter for 900 5-digit codes*/

               %do i = 1 %to 3;

                     if dx_3digit(&i) = "&&&cc3d&j." then do;

                          if dx_5digit(&i) = "&&&cc5d&k." then flag&j = 1;

                     end;

                %end;

      %end;

      run;

    %end;

    %mend flag;

    %flag;

    Patrick
    Opal | Level 21

    Add below line in bold to the code I've previously posted - this will fix the issue.

    The reason why it didn't work with "00301" is that variable _search_string gets defined with the very first string length passed to it - which happens to be a 4 character string using your data sample. So 00301 got truncated to 0030 and therefore you didn't get a match.

    %macro codegen;

      length _search_string $5;

      %let n=%sysfunc(coalesce(&max_n,0));

      %do i=1 %to &n;

        flag&i=0;

        do _search_string=&&_search_string&i while(flag&i=0);

          if whichc(_search_string, of digit5

  • ) then flag&i=1;
  •     end;

      %end;

      drop _search_string;

    %mend;

    SAS Innovate 2025: Register Now

    Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
    Sign up by Dec. 31 to get the 2024 rate of just $495.
    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    SAS Training: Just a Click Away

     Ready to level-up your skills? Choose your own adventure.

    Browse our catalog!

    Discussion stats
    • 16 replies
    • 2275 views
    • 3 likes
    • 5 in conversation