DATA Step, Macro, Functions and more

Searching multiple lists of global macro variables

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 82
Accepted Solution

Searching multiple lists of global macro variables

Let's say I have a list of 6 global macro variables that represent a 4 or 5-digit code (candidatecode_5digit1-candidatecode_5digit6) and a list of 3 global macro variables that represent the 3-digit unique versions of these 6 codes (candidatecode_3digit1_candidatecode_3digit3):

GLOBAL CANDIDATECODE_5DIGIT1 0030

GLOBAL CANDIDATECODE_5DIGIT2 0031

GLOBAL CANDIDATECODE_5DIGIT3 0039

GLOBAL CANDIDATECODE_5DIGIT4 0084

GLOBAL CANDIDATECODE_5DIGIT5 0085

GLOBAL CANDIDATECODE_5DIGIT6 0090

GLOBAL CANDIDATECODE_3DIGIT1 003

GLOBAL CANDIDATECODE_3DIGIT2 008

GLOBAL CANDIDATECODE_3DIGIT3 009

I have a dataset that has 3 variables for the four or five-digit codes and 3 variables for the three-digit version of these codes.  What I'm trying to do is create a flag that equals 1 if, for example, the three-digit code "003" is present in any of the three-digit code variables AND if either "0030", "0031", or "0039" are present in any of the four or five-digit code variables.  The flag would equal zero otherwise (e.g. it would be zero if a code like "0032" was present but not "0030, "0031", or "0039.").

Have:

Obs    Dx_code_5digit1     Dx_code_5digit2          Dx_code_5digit3     Dx_code_3digit1      Dx_code_3digit2         Dx_code_3digit3    

1         23231                    8770                           0030                       232                         877                            003

2         035                       0031                           0032                        035                        003

3         0032                                                        003

4         0239                                                        023

Want:

Obs    Dx_code_5digit1     Dx_code_5digit2          Dx_code_5digit3     Dx_code_3digit1      Dx_code_3digit2         Dx_code_3digit3             Flag

1         23231                    8770                           0030                       232                         877                            003                              1

2         035                       0031                           0032                        035                        003                                                                1

3         0032                                                        003                                                                                                                            0

4         0239                                                        023                                                                                                                            0

How would I implement this using a macro?  Thanks.


Accepted Solutions
Solution
‎06-26-2014 02:45 AM
Respected Advisor
Posts: 3,894

Re: Searching multiple lists of global macro variables

Add below line in bold to the code I've previously posted - this will fix the issue.

The reason why it didn't work with "00301" is that variable _search_string gets defined with the very first string length passed to it - which happens to be a 4 character string using your data sample. So 00301 got truncated to 0030 and therefore you didn't get a match.

%macro codegen;

  length _search_string $5;

  %let n=%sysfunc(coalesce(&max_n,0));

  %do i=1 %to &n;

    flag&i=0;

    do _search_string=&&_search_string&i while(flag&i=0);

      if whichc(_search_string, of digit5

  • ) then flag&i=1;
  •     end;

      %end;

      drop _search_string;

    %mend;

    View solution in original post


    All Replies
    Respected Advisor
    Posts: 3,894

    Re: Searching lists of global macro variables

    If it's only about 3 similar blocks of logic then it's may be better to not wrap this into a macro but to keep the code easier to read and debug - even if it comes with a bit more typing.

    See if below example does what you're after.

    data Have(drop=_Smiley Happy;

    infile datalines truncover dsd dlm=',';

    input

      (Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3) (:$5.)

      (Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3) (:$3.)

      ;

      array digit5 {*} Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3;

      array digit3 {*} Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3;

      flag=0;

      /** start of setting the flag*/

      if whichc('003',of digit3

  • ) then
  •     do;

          do _search_string="0030", "0031", "0039";

            if whichc(_search_string, of digit5

  • ) then
  •           do;

                flag=1;

                go to end_set_flag;

              end;

          end;

        end;

        /* similar blocks of logic for '008' and '009'*/ 

      end_set_flag:;

      /** end of setting the flag **/

      /**

        some other statements

      **/

      datalines;

    23231,8770,0030,232,877,003

    035,0031,0032,035,003

    0032,003

    0239,023

    ;

    run;

    Frequent Contributor
    Posts: 82

    Re: Searching lists of global macro variables

    Patrick, thanks for your answer.  In truth, I have 900 4 or 5-digit codes and 321 unique 3-digit codes; I have tried to simplify the question as much as possible to avoid confusion and overwhelming potential readers.  Is it possible to apply whichc to a global macro variable?

    Super User
    Super User
    Posts: 7,401

    Re: Searching lists of global macro variables

    Hi,

    Well, generally speaking , macros are already stored in a dataset in SASHELP.VMACRO.  You could directly query that table where scope="GLOBAL" and name in (you global macro names).

    Super Contributor
    Posts: 297

    Re: Searching multiple lists of global macro variables

    You could read your macro variables into a dataset and then load the resulting dataset into a HASH TABLE.

    OPTIONS SYMBOLGEN MPRINT MLOGIC;

    /*MACRO VARIABLES*/

    %LET CANDIDATECODE_5DIGIT1 = 0030;

    %LET CANDIDATECODE_5DIGIT2 = 0031;

    %LET CANDIDATECODE_5DIGIT3 = 0039;

    %LET CANDIDATECODE_5DIGIT4 = 0084;

    %LET CANDIDATECODE_5DIGIT5 = 0085;

    %LET CANDIDATECODE_5DIGIT6 = 0090;

    %LET CANDIDATECODE_3DIGIT1 = 003;

    %LET CANDIDATECODE_3DIGIT2 = 008;

    %LET CANDIDATECODE_3DIGIT3 = 009;

    /*CONVERT MACRO VARIABLES INTO A DATASET*/

    DATA MACVARS;

    LENGTH VALUES $5;

    DO H = 3,5;

    DO I = 1  BY 1 WHILE (SYMEXIST(CATT("CANDIDATECODE_",H,"DIGIT",I)) = 1);

    _LIST = CATT("CANDIDATECODE_",H,"DIGIT",I);

    VALUES = SYMGET(CATT("CANDIDATECODE_",H,"DIGIT",I));

    OUTPUT;

    END;

    END;

    DROP H I;

    RUN;

    /*CREATE THE DATA YOU WANT TO COMPARE AGAINST*/

    DATA HAVE;

    INFILE DATALINES TRUNCOVER DSD DLM=',';

    INPUT DX_CODE_5DIGIT1 :$5. DX_CODE_5DIGIT2 :$5. DX_CODE_5DIGIT3 :$5. DX_CODE_3DIGIT1 :$5. DX_CODE_3DIGIT2 :$5. DX_CODE_3DIGIT3 :$5.;

    DATALINES;

    23231,8770,0030,232,877,003

    035,0031,0032,035,003

    0032,,003

    0239,,023

    ;

    RUN;

    /*CREATE A HASH TABLE LOADED WITH THE VALUES FROM YOUR MACRO VARIABLES*/

    /*USE HASH ITERATOR TO CYCLE THROUGH THE VALUES IN THE HASH TABLE*/

    DATA WANT;

      LENGTH VALUES $5;

    IF _N_ = 1 THEN DO;

      DECLARE HASH H (DATASET:"MACVARS");

      H.DEFINEKEY ("VALUES");

      H.DEFINEDATA ("VALUES");

      H.DEFINEDONE ();

      DECLARE HITER ITER("H");

    END;

    /*INITIALIZE VARIABLE VALUES OTHERWISE YOU GET WARNING NOTES*/

      CALL MISSING (VALUES);

    /*CREATE AN ARRAY FOR STRINGS WITH 3 CHARACTERS AND ONE FOR THOSE WITH 5 CHARACTERS*/

      ARRAY _DIGITS3 {*} $ DX_CODE_3DIGIT1 - DX_CODE_3DIGIT3;

      ARRAY _DIGITS5 {*} $ DX_CODE_5DIGIT1 - DX_CODE_5DIGIT3;

    /*READ THE DATASET IN CONTAINING THE VALUES */

      SET HAVE;

      FLAG = 0;

    /*GO TO THE FIRST RECORD WITHIN THE HASH TABLE*/

      RC = ITER.FIRST();

      DO I = 1 BY 1 WHILE (RC = 0);

      /*IF THE VALUE FROM THE HASH TABLE IS 3 CHARACTERS IN LENGTH THEN COMPARE THE VALUES WITH THE _DIGITS3 ARRAY*/

      IF LENGTH(TRIM(VALUES)) = 3 THEN DO;

      IF WHICHC(VALUES,OF _DIGITS3{*})> 0

      THEN DO;

      FLAG = 1;

      LEAVE;

      END;

      RC = ITER.NEXT();

      END;

      /*IF THE VALUE FROM THE HASH TABLE IS GREATER THAN 3 CHARACTERS IN LENGTH THEN COMPARE THE VALUES WITH THE _DIGITS5 ARRAY*/

      ELSE IF LENGTH(TRIM(VALUES)) > 3 THEN DO;

      IF WHICHC(VALUES,OF _DIGITS5{*})> 0

      THEN DO;

      FLAG = 1;

      LEAVE;

      END;

      RC = ITER.NEXT();

      END;

      END;

      DROP I RC VALUES;

    RUN;

    Frequent Contributor
    Posts: 82

    Re: Searching multiple lists of global macro variables

    Scott, thanks for the response.  This got me on the right track, but I realized I was not sufficiently clear about what I'm trying to do.  Your code took the length of the values in MACVARS and compared to the corresponding 3 or 5-digit arrays.

    1) What I'm trying to do is to set a flag (let's call it "flag1"). that equals 1 if "0030", "0031", or "0039" are present in any of the five-digit code variables, but not if a code like "0032" or "0033" was the only "003" code present.  If "0032" and "0030" were both present, I would want to set flag1 = 1.

    2) Then, I want to create a flag (let's call it "flag2") that equals 1 if "0084" or "0085" are present in dx_code_5digit1-dx_code_5digit3, but not if a code like "0080" was the only "008" code present.

    3) Finally, I want to create a flag (let's call it "flag3") that equals 1 if "0090" are present in dx_code_5digit1-dx_code_5digit3, but not if a code like "0093" was the only "009" code present.


    Let me put together a different hypothetical dataset to illustrate:

    Have:

    Obs    Dx_code_5digit1   Dx_code_5digit2  Dx_code_5digit3     Dx_code_3digit1  Dx_code_3digit2  Dx_code_3digit3

    1         0030                                                                           003

    2         0032                                                                           003

    3         0032                      0030                                              003                    003

    4         0030                      0085                 0091                      003                    008                      009

    5         0030                      0085                 0090                      003                    008                      009

    6         0031                      0080                 0085                      003                    008                      008

    Want:

    Obs    Dx_code_5digit1   Dx_code_5digit2   Dx_code_5digit3     Dx_code_3digit1  Dx_code_3digit2  Dx_code_3digit3   Flag1     Flag2     Flag3

    1         0030                                                                           003                                                                             1            0           0

    2         0032                                                                           003                                                                             0            0           0

    3         0032                      0030                                              003                     003                                                   1            0           0

    4         0030                      0085                 0091                      003                     008                     009                         1            1           0

    5         0030                      0085                 0090                      003                     008                     009                         1            1           1

    6         0031                      0080                 0085                      003                     008                     008                         0            1           0



    Basically, I think I would have to set up a do loop that goes through each of the three-digit candidate codes ("003", "008", and "009"), then iterate through the hash table.  I'm not exactly sure how to do that, though.

    [Note in reality, there are actually 321 three-digit candidate codes and 900 5-digit candidate codes, but I'm simplifying for readability).

    Thanks.

    Respected Advisor
    Posts: 3,894

    Re: Searching multiple lists of global macro variables

    Reading through your logic it looks to me as if we don't have to care about the 3 digit codes as all.

    Can you please confirm if below logic is all what this is about?

    flag1=1 in cases where at least one of the the 5 digit codes contains a value of "0030", "0031", or "0039"

    flag2=1 in cases where at least one of the the 5 digit codes contains a value of "0084" or "0085"

    flag3=1 in cases where at least one of the the 5 digit codes contains a value of "0090"

    Frequent Contributor
    Posts: 82

    Re: Searching multiple lists of global macro variables

    Patrick, I believe what you wrote is correct.  I'm trying to aggregate all candidate 5-digit codes up to the 3-digit code.  So flag1 refers to whether there is a "003" code that is also one of the 5-digit candidate codes (0031, 0031, and 0039 - but not something like 0032, etc.).  So I do "care" about the 3-digit codes, but only conceptually as a grouping variable.

    Respected Advisor
    Posts: 3,894

    Re: Searching multiple lists of global macro variables

    data Have(drop=_Smiley Happy;
      infile datalines truncover dsd dlm=',';
      input
        (Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3) (:$5.)
        (Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3) (:$3.)
      ;
      array digit5 {*} Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3;
     
      /** flag1=1 in cases where at least one of the the 5 digit codes contains a value of "0030", "0031", or "0039" **/
      flag1=0;
      do _search_string="0030", "0031", "0039" while(flag1=0);
        if whichc(_search_string, of digit5

  • ) then flag1=1;
      end;
  •   /** flag2=1 in cases where at least one of the the 5 digit codes contains a value of "0084" or "0085" **/
      flag2=0;
      do _search_string="0084","0085" while(flag2=0);
        if whichc(_search_string, of digit5

  • ) then flag2=1;
      end;
  •   /** flag3=1 in cases where at least one of the the 5 digit codes contains a value of "0090" **/
      flag3=0;
      do _search_string="0090" while(flag3=0);
        if whichc(_search_string, of digit5

  • ) then flag3=1;
      end;
      datalines;
    0030,,,003
    0032,,,003
    0032,0030,,003,003
    0030,0085,0091,003,008,009
    0030,0085,0090,003,008,009
    0031,0080,0085,003,008,008
    ;
    run;
  • Frequent Contributor
    Posts: 82

    Re: Searching multiple lists of global macro variables

    Thanks to both Patrick and slchen for their responses.  I think both of these solutions would work, but I don't want to have to explicitly write out "0030", "0031", and "0039", etc.  I'd like for SAS to automatically search my global macro variables instead of explicitly coding them since there are 900 5-digit codes and 321 3-digit codes.  For instance, I'd like SAS to start with candidatecode_3digit1 (003) and then only search for candidatecode_5digit1-candidatecode_5digit3 (0030, 0031, 0039).  Then SAS would go to 008 and search for 0084 and 0085, etc. 

    %LET CANDIDATECODE_5DIGIT1 = 0030;

    %LET CANDIDATECODE_5DIGIT2 = 0031;

    %LET CANDIDATECODE_5DIGIT3 = 0039;

    %LET CANDIDATECODE_5DIGIT4 = 0084;

    %LET CANDIDATECODE_5DIGIT5 = 0085;

    %LET CANDIDATECODE_5DIGIT6 = 0090;

    %LET CANDIDATECODE_3DIGIT1 = 003;

    %LET CANDIDATECODE_3DIGIT2 = 008;

    %LET CANDIDATECODE_3DIGIT3 = 009;

    Super Contributor
    Posts: 275

    Re: Searching multiple lists of global macro variables


    %let c11=0030, 0031, 0039;
    %let c12=003;
    %let c21=0084, 0085;
    %let c22=008;
    %let c31=0090;
    %let c32=009;

    data want;

      set have;

      array code_5 Dx_code_5digit1-Dx_code_5digit3;

      array code_3 Dx_code_3digit1-Dx_code_3digit3;

       do over code_5;

          flag1=(code_5 in (%quote(&c11)) and code_3 in (%quote(&c12)));

          if flag1=1 then leave;

       end;

      do over code_5;

          flag2=(code_5 in (%quote(&c21)) and code_3 in (%quote(&c22)));

          if flag2=1 then leave;

      end;

      do over code_5;

          flag3=(code_5 in (%quote(&c31)) and code_3 in ( %quote(&c32)));

          if flag3=1 then leave;

      end;

      run;

    Frequent Contributor
    Posts: 82

    Re: Searching multiple lists of global macro variables

    Thanks, though this still requires me to manually type in c11 = 0030, 0031, 0039, etc.  Is there a way to have SAS automatically generate global macro variables like c11 by searching through my 5-digit global macro variables (candidatecode_5digit1-candidatecode5digit6) and appropriately grouping them together based on the first three digits?

    GLOBAL CANDIDATECODE_5DIGIT1 0030

    GLOBAL CANDIDATECODE_5DIGIT2 0031

    GLOBAL CANDIDATECODE_5DIGIT3 0039

    GLOBAL CANDIDATECODE_5DIGIT4 0084

    GLOBAL CANDIDATECODE_5DIGIT5 0085

    GLOBAL CANDIDATECODE_5DIGIT6 0090

    Respected Advisor
    Posts: 3,894

    Re: Searching multiple lists of global macro variables

    You probably would need to show a bit more of your own coding effort to motivate people to help you find a solution. But here you go:

    %LET CANDIDATECODE_5DIGIT1 = 0030;

    %LET CANDIDATECODE_5DIGIT2 = 0031;

    %LET CANDIDATECODE_5DIGIT3 = 0039;

    %LET CANDIDATECODE_5DIGIT4 = 0084;

    %LET CANDIDATECODE_5DIGIT5 = 0085;

    %LET CANDIDATECODE_5DIGIT6 = 0090;

    proc sql noprint;

      create table codes as

      select value as value5, substrn(value,1,3) as value3

      from dictionary.macros

      where scope='GLOBAL' and name like 'CANDIDATECODE_5DIGIT%'

      order by value3

      ;

    quit;

    data _null_;

      set codes;

      by value3;

      length list $1000;

      retain list;

      list=catx(',',list,cats('"',value5,'"'));

      if last.value3 then

        do;

          n+1;

          call symputx(cats('_search_string',n),list);

          call missing(list);

        end;

      call symputx('max_n',n);

    run;

    %macro codegen;

      %let n=%sysfunc(coalesce(&max_n,0));

      %do i=1 %to &n;

        flag&i=0;

        do _search_string=&&_search_string&i while(flag&i=0);

          if whichc(_search_string, of digit5

  • ) then flag&i=1;
  •     end;

      %end;

      drop _search_string;

    %mend;

    data Have;

      infile datalines truncover dsd dlm=',';

      input

        (Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3) (:$5.)

        (Dx_code_3digit1 Dx_code_3digit2 Dx_code_3digit3) (:$3.)

      ;

      array digit5 {*} Dx_code_5digit1 Dx_code_5digit2 Dx_code_5digit3;

      %codegen;

      datalines;

    0030,,,003

    0032,,,003

    0032,0030,,003,003

    0030,0085,0091,003,008,009

    0030,0085,0090,003,008,009

    0031,0080,0085,003,008,008

    ;

    run;

    Frequent Contributor
    Posts: 82

    Re: Searching multiple lists of global macro variables

    Patrick, thanks for this solution - this is very helpful.  I am able to use this code to correctly set the flag for if the search string consists of candidate codes with four digits, but for some reason, the flag does not set correctly if the search string contains any candidate code with five digits.  So for example, if I change your code to the following (see bold), the global macro _search_string1 resolves to "0039","00301","0031".

    %LET CANDIDATECODE_5DIGIT1 = 00301;

    %LET CANDIDATECODE_5DIGIT2 = 0031;

    %LET CANDIDATECODE_5DIGIT3 = 0039;

    %LET CANDIDATECODE_5DIGIT4 = 0084;

    %LET CANDIDATECODE_5DIGIT5 = 0085;

    %LET CANDIDATECODE_5DIGIT6 = 0090;

    ....

    data Have;

    ...

     

    datalines;

    00301,,,003

    0032,,,003

    0032,0030,,003,003

    0030,0085,0091,003,008,009

    0030,0085,0090,003,008,009

    0031,0080,0085,003,008,008

    ;

    run;

    flag1 sets to 0 for the first observation and sets to 1 for observations 3-6.  However, flag1 should equal 1 for observation 1 (since 00301 is present) and observation 6 (since 0031 is present) but should equal zero for observations 3-5 (0030 is present and 0030 is not part of the search string).  flag2 sets correctly to 1 (because the search string consists of two four-digit codes "0084", "0085") , as does flag 3 (because the search string consists of a four-digit code "0090.").  It's a bit of a weird pattern - SAS appears to be ignoring the fact that 00301 (part of the search code) is present in observation 1 and treating 0030 (not part of the search code) in observations 3-5 as being equal to 00301.

    Any ideas on why this might be the case?  Thanks.

    __________________________________________________________________________________________________________________________

    To address your other comment, I do have code that correctly set the flags but it is terribly inefficient.  This code searches each of the 3 five-digit code variables for any of the 900 five-digit candidate codes and takes about 2 hours to run (compared to 1.5 minutes for your code).

    %let cc3d = candidatecode_3digit;

    %let cc5d = candidatecode_5digit;

    %macro flag;

    %do j = 1 %to 321; /*Counter for the 321 3-digit codes*/

      data newflag&j; set maindatafile;

      array dx_5digit {*} $ dx_code_5digit1-dx_code_5digit3;

      array dx_3digit {*} $ dx_code_3digit1-dx_code_3digit3;

      flag&j = 0;

      %do k = 1 %to &nobs_codes_5digit; /*Counter for 900 5-digit codes*/

               %do i = 1 %to 3;

                     if dx_3digit(&i) = "&&&cc3d&j." then do;

                          if dx_5digit(&i) = "&&&cc5d&k." then flag&j = 1;

                     end;

                %end;

      %end;

      run;

    %end;

    %mend flag;

    %flag;

    Solution
    ‎06-26-2014 02:45 AM
    Respected Advisor
    Posts: 3,894

    Re: Searching multiple lists of global macro variables

    Add below line in bold to the code I've previously posted - this will fix the issue.

    The reason why it didn't work with "00301" is that variable _search_string gets defined with the very first string length passed to it - which happens to be a 4 character string using your data sample. So 00301 got truncated to 0030 and therefore you didn't get a match.

    %macro codegen;

      length _search_string $5;

      %let n=%sysfunc(coalesce(&max_n,0));

      %do i=1 %to &n;

        flag&i=0;

        do _search_string=&&_search_string&i while(flag&i=0);

          if whichc(_search_string, of digit5

  • ) then flag&i=1;
  •     end;

      %end;

      drop _search_string;

    %mend;

    ☑ This topic is SOLVED.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 16 replies
    • 515 views
    • 3 likes
    • 5 in conversation