Solved: Flag and count matches across 2 sets of arrays (4 variables each set)

AlexPaezSilva · Posted 08-11-2023 01:57 PM

Hi Gents,

I am trying to flag and count matches across 2 sets of variables (KOL1-KOL4 compared to MT1-MT4). I think this is considered an many-to-many array match, but I am not advanced to know what to look for?

WANT: if ANY value in KOL1 or KOL2 or KOL3 or KOL4 is found in ANY value from variables MT1 or MT2 or MT3 or MT4

then create new variable FLAG=1 and count the number of matches as COUNT=n.

So for example,

if KOL1=MT2 then Flag=1 and Count=1;

if KOL1=MT3 and KOL2=MT1 then Flag=1 and Count=2;

et cetera.

I tried coding the combinations in an if...else if... logic but there's 41 possible values for each of the variables and there's 8 variables...so 328 combinations is just too much to hard code.

Any help is much appreciated. Thanks!

mkeintz · Posted 08-11-2023 11:05 PM

Assuming

There are no duplicates in KOL1-KOL4 (but there could be in MT1-MT4), and
Missing values present in both arrays are not eligible to be counted as matches

then:

data want (drop=m);
  set have;
  array ko{*} kol1-kol4;
  array mt{*} mt1-mt4;
  do m=1 to dim(mt);
    count=sum(count,whichn(mt{m},of ko{*})>0);
  end;
  flag=(count>0);
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

ballardw · Posted 08-11-2023 02:05 PM

No example data so this is a guess for a place to start.

data want;
   set have;
   array k(*) kol1-kol4;
   array m(*) mt1-mt4;
   do i=1 to dim(k);
      do j=1 to dim(m);
         if k[i]=m[j] then do;
            flag=1;
            count=sum(count,1);
         end;
      end;
   end;
   drop i j;
run;

Unless you are trying to do a count per specific value then the actual value of the variables in question isn't really pertinent to the flag and match count.

If the above doesn't provide a moderately good start then 1) provide example data in the form of data step code and 2) what you expect for the given data

@AlexPaezSilva wrote:

Hi Gents,

I am trying to flag and count matches across 2 sets of variables (KOL1-KOL4 compared to MT1-MT4). I think this is considered an many-to-many array match, but I am not advanced to know what to look for?

WANT: if ANY value in KOL1 or KOL2 or KOL3 or KOL4 is found in ANY value from variables MT1 or MT2 or MT3 or MT4

then create new variable FLAG=1 and count the number of matches as COUNT=n.

So for example,

if KOL1=MT2 then Flag=1 and Count=1;

if KOL1=MT3 and KOL2=MT1 then Flag=1 and Count=2;

et cetera.

I tried coding the combinations in an if...else if... logic but there's 41 possible values for each of the variables and there's 8 variables...so 328 combinations is just too much to hard code.

Any help is much appreciated. Thanks!

mkeintz · Posted 08-11-2023 06:14 PM

Within the array KOL1-KOl4, are there duplicates?

And within the array MT1-MT4, are there duplicates?

This would influence the construction of the loops over the array elements.

How about some sample data, in the form of a working DATA step?

Oh yes, And do the KOL or MT variables ever have a missing value? And if so, can missing values in both arrays constitute a match?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 08-11-2023 11:05 PM

Assuming

There are no duplicates in KOL1-KOL4 (but there could be in MT1-MT4), and
Missing values present in both arrays are not eligible to be counted as matches

then:

data want (drop=m);
  set have;
  array ko{*} kol1-kol4;
  array mt{*} mt1-mt4;
  do m=1 to dim(mt);
    count=sum(count,whichn(mt{m},of ko{*})>0);
  end;
  flag=(count>0);
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

erfr225 · Posted 09-12-2023 12:32 PM

I came across this thread after having a similar problem. The code you provided works but it seems to be counting the missing as a match. Am I overlooking something?

Flag and count matches across 2 sets of arrays (4 variables each set)

Re: Flag and count matches across 2 sets of arrays (4 variables each set)

Re: Flag and count matches across 2 sets of arrays (4 variables each set)

Re: Flag and count matches across 2 sets of arrays (4 variables each set)

Re: Flag and count matches across 2 sets of arrays (4 variables each set)

Re: Flag and count matches across 2 sets of arrays (4 variables each set)

Flag and count matches across 2 sets of arrays (4 variables each set)

Re: Flag and count matches across 2 sets of arrays (4 variables each set)

Re: Flag and count matches across 2 sets of arrays (4 variables each set)

Re: Flag and count matches across 2 sets of arrays (4 variables each set)

Re: Flag and count matches across 2 sets of arrays (4 variables each set)

Re: Flag and count matches across 2 sets of arrays (4 variables each set)

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!