I have a list of 53 variables. And I need to flag if any of them equal each other when they aren't missing. Does anyone know an efficient way to do this? I could do this 1 at time, but this would mean potentially 1431 statements of
if Pathogen_1_961712 not = "" and Pathogen_1_961712 = Pathogen_2_961712 then poscult = 1;
the variable names are listed
Pathogen_1_242396_10 |
Pathogen_1_242396_11 |
Pathogen_1_242396_2 |
Pathogen_1_242396_3 |
Pathogen_1_242396_4 |
Pathogen_1_242396_5 |
Pathogen_1_242396_6 |
Pathogen_1_242396_7 |
Pathogen_1_242396_8 |
Pathogen_1_242396_9 |
Pathogen_2_242396 |
Pathogen_2_242396_10 |
Pathogen_2_242396_11 |
Pathogen_2_242396_2 |
Pathogen_2_242396_3 |
Pathogen_2_242396_4 |
Pathogen_2_242396_5 |
Pathogen_2_242396_6 |
Pathogen_2_242396_7 |
Pathogen_2_242396_8 |
Pathogen_2_242396_9 |
APathogen_1_242396 |
APathogen_1_242396_10 |
APathogen_1_242396_11 |
APathogen_1_242396_2 |
APathogen_1_242396_3 |
APathogen_1_242396_4 |
APathogen_1_242396_5 |
APathogen_1_242396_6 |
APathogen_1_242396_7 |
APathogen_1_242396_8 |
APathogen_1_242396_9 |
BPathogen_1_242396 |
BPathogen_1_242396_10 |
BPathogen_1_242396_11 |
BPathogen_1_242396_2 |
BPathogen_1_242396_3 |
BPathogen_1_242396_4 |
BPathogen_1_242396_5 |
BPathogen_1_242396_6 |
BPathogen_1_242396_7 |
BPathogen_1_242396_8 |
BPathogen_1_242396_9 |
CPathogen_1_242396 |
CPathogen_1_242396_10 |
CPathogen_1_242396_11 |
CPathogen_1_242396_2 |
CPathogen_1_242396_3 |
CPathogen_1_242396_4 |
CPathogen_1_242396_5 |
CPathogen_1_242396_6 |
CPathogen_1_242396_7 |
CPathogen_1_242396_8 |
CPathogen_1_242396_9 |
Something like the following may get you started:
data have;
input (t1-t4) (:$1.);
cards;
1 2 3 4
1 2 3 2
run;
data want;
set have;
dup_flag=0;
array t(4) t1-t4;
array temp(4) $ 1 _temporary_;
do i=1 to 4;
temp(i)=t(i);
end;
do i=1 to 4;
call missing (temp(i));
if t(i) in temp then do;
dup_flag=1;
leave;
end;
end;
drop i;
run;
Update: To end the loop once getting a hit.
If you have a big table and troubled by the efficiency, then following code can be considered 'being on steroid':
data have;
input (t1-t4) (:$2.);
cards;
1 2 3 4
1 2 3 2
run;
data want_fast;
set have;
length _cat $ 100;
dup_flag=0;
array t(4) t1-t4;
_cat=peekclong(addrlong(t(1)), 8);
do i=1 to 4;
if count(_cat, t(i), 't')>1 then
do;
dup_flag=1;
leave;
end;
end;
drop _cat i;
run;
This is not so difficult if you use arrays. But would it be useful? All you get is a flag that is 0 or 1. When it is 1 you still don't know where the match occurred. At any rate, you may have to type all 53 names once:
data want;
set have;
array paths {53} list of pathogen names goes here;
postcult=0;
do i=1 to 52 until (postcult=1);
if paths{i} ne ' ' then do j=i+1 to 53;
if paths{i} = paths{j} then postcult=1;
end;
end;
drop i j;
run;
Good luck.
Hash Table.
data have; input (t1-t4) (:$1.); cards; 1 2 3 4 1 2 3 2 . 3 . 6 . 4 4 5 ; run; data want; if _n_ eq 1 then do; length k $ 40; declare hash h(); h.definekey('k'); h.definedone(); end; set have; array x{*} $ t1-t4; do i=1 to dim(x); if not missing(x{i}) then do;k=x{i};rc=h.add();end; end; if dim(x)-cmiss(of x{*})=h.num_items then poscult=0; else poscult=1; h.clear(); drop i k rc; run;
Xia Keshan
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.