data reference;
infile cards dlm='|';
Input NUM $ COMB1 $ COMB2 : $200.;
datalines;
1| value7 | value999, value10, value2309, value17
2| value200 | value24, value3, value883, value223
3| value12 | value1, value234, value914, value981
4| value8 | value782, value999, value888
5| value116 |value26, value1
6| value900 | value2, value9
;
run;
data dataset;
input ID A $ B $ C $ D $ E $ F $;
datalines;
1 value27 value31 value101 value999 value1 value7
2 value999 value1 value101 value8 value1 value886
3 value200 value31 value101 value49 value1 value3
4 value78 value7 value531 value2309 value112 value1
5 value200 value3 value2141 value919 value981 value234
;
run;
proc sql noprint;
select count(*) into :num_flags from reference;
quit;
%put &num_flags.;
data flagged;
*create temporary array to hold the reference data set;
array M(&num_flags, 3) $ _temporary_;
if _n_=1 then
do j=1 to &num_flags;
set Reference;
M(j, 1)=NUM;
M(j, 2)=COMB1;
M(j, 3)=COMB2;
end;
set dataset;
*loop over reference dataset rows;
do i=1 to &num_flags;
*determine number of words in COMB2;
nwords=countw(M(i, 3));
*if COMB1 is list of A-F then search for values in COMB2;
if whichc(M(i, 2), of A--F) then
do k=1 to nwords;
*isolate each word;
word=scan(M(i, 3), k, ",");
*if found Comb2 item, then assign flag and output that line;
if whichc(word, of A--F)>0 then
do;
flag=M(i, 1);
output;
end;
end;
end;
drop NUM COMB1 COMB2 i j k nwords word;
run; Unfortunately I’m finding the same issue I was having with my original array, where combinations are only being identified if the COMB2 value occurs first in the list in the reference file. In the above example, the following combinations (below in bold) are present in the dataset: data dataset; input ID A $ B $ C $ D $ E $ F $; datalines; 1 value27 value31 value101 value999 value1 value7 /* VALUE 7 & 999 (#1 in ref file) */ 2 value999 value1 value101 value8 value1 value886 /* VALUE 8 & 999 (#4 in ref file) */ 3 value200 value31 value101 value49 value1 value3 /* VALUE 200 & 3 (#2 in ref file) */ 4 value78 value7 value531 value2309 value112 value1 /* VALUE 7 & 2309 (#1 in ref file) */ 5 value200 value3 value2141 value919 value981 value234 /* VALUE 200 & 3 (#2 in ref file) */ ; run; However, when I run the code, only ID #1 from the reference file is flagged (as value999 appears as the first value in the COMB2 list. For the other combinations, the COMB2 value appears somewhere in the list after position 1, e.g. the 2nd row in the dataset has combination VALUE 999 & 8, this is ID 4 in the reference file where 999 appears as the 2nd value in COMB2 col). How can I make the array read through all values in the COMB2 list for a match?
... View more