BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
AlexPaezSilva
Fluorite | Level 6

Hi Gents, 

I am trying to flag and count matches across 2 sets of variables (KOL1-KOL4 compared to MT1-MT4). I think this is considered an many-to-many array match, but I am not advanced to know what to look for? 

 

WANT: if ANY value in KOL1 or KOL2 or KOL3 or KOL4 is found in ANY value from variables MT1 or MT2 or MT3 or MT4

then create new variable FLAG=1 and count the number of matches as COUNT=n. 

 

So for example, 

if KOL1=MT2 then Flag=1 and Count=1;

if KOL1=MT3 and KOL2=MT1 then Flag=1 and Count=2;

et cetera. 

 

I tried coding the combinations in an if...else if... logic but there's 41 possible values for each of the variables and there's 8 variables...so 328 combinations is just too much to hard code.

 

Any help is much appreciated. Thanks!

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Assuming

  1. There are no duplicates in KOL1-KOL4 (but there could be in MT1-MT4), and
  2. Missing values present in both arrays are not eligible to be counted as matches

then:

 

data want (drop=m);
  set have;
  array ko{*} kol1-kol4;
  array mt{*} mt1-mt4;
  do m=1 to dim(mt);
    count=sum(count,whichn(mt{m},of ko{*})>0);
  end;
  flag=(count>0);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
ballardw
Super User

No example data so this is a guess for a place to start.

 

data want;
   set have;
   array k(*) kol1-kol4;
   array m(*) mt1-mt4;
   do i=1 to dim(k);
      do j=1 to dim(m);
         if k[i]=m[j] then do;
            flag=1;
            count=sum(count,1);
         end;
      end;
   end;
   drop i j;
run;

Unless you are trying to do a count per specific value then the actual value of the variables in question isn't really pertinent to the flag and match count.

 

If the above doesn't provide a moderately good start then 1) provide example data in the form of data step code and 2) what you expect for the given data

 


@AlexPaezSilva wrote:

Hi Gents, 

I am trying to flag and count matches across 2 sets of variables (KOL1-KOL4 compared to MT1-MT4). I think this is considered an many-to-many array match, but I am not advanced to know what to look for? 

 

WANT: if ANY value in KOL1 or KOL2 or KOL3 or KOL4 is found in ANY value from variables MT1 or MT2 or MT3 or MT4

then create new variable FLAG=1 and count the number of matches as COUNT=n. 

 

So for example, 

if KOL1=MT2 then Flag=1 and Count=1;

if KOL1=MT3 and KOL2=MT1 then Flag=1 and Count=2;

et cetera. 

 

I tried coding the combinations in an if...else if... logic but there's 41 possible values for each of the variables and there's 8 variables...so 328 combinations is just too much to hard code.

 

Any help is much appreciated. Thanks!

 


 

mkeintz
PROC Star

Within the array KOL1-KOl4, are there duplicates?   

And within the array MT1-MT4, are there duplicates?   

 

This would influence the construction of the loops over the array elements.

 

How about some sample data, in the form of a working DATA step?

 

Oh yes,  And do the KOL or MT variables ever have a missing value?  And if so, can missing values in both arrays constitute a match?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Assuming

  1. There are no duplicates in KOL1-KOL4 (but there could be in MT1-MT4), and
  2. Missing values present in both arrays are not eligible to be counted as matches

then:

 

data want (drop=m);
  set have;
  array ko{*} kol1-kol4;
  array mt{*} mt1-mt4;
  do m=1 to dim(mt);
    count=sum(count,whichn(mt{m},of ko{*})>0);
  end;
  flag=(count>0);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
erfr225
Calcite | Level 5
I came across this thread after having a similar problem. The code you provided works but it seems to be counting the missing as a match. Am I overlooking something?

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1632 views
  • 0 likes
  • 4 in conversation