I have a list of 53 variables. And I need to flag if any of them equal each other when they aren't missing. Does anyone know an efficient way to do this? I could do this 1 at time, but this would mean potentially 1431 statements of
if Pathogen_1_961712 not = "" and Pathogen_1_961712 = Pathogen_2_961712 then poscult = 1;
the variable names are listed
Pathogen_1_242396_10 |
Pathogen_1_242396_11 |
Pathogen_1_242396_2 |
Pathogen_1_242396_3 |
Pathogen_1_242396_4 |
Pathogen_1_242396_5 |
Pathogen_1_242396_6 |
Pathogen_1_242396_7 |
Pathogen_1_242396_8 |
Pathogen_1_242396_9 |
Pathogen_2_242396 |
Pathogen_2_242396_10 |
Pathogen_2_242396_11 |
Pathogen_2_242396_2 |
Pathogen_2_242396_3 |
Pathogen_2_242396_4 |
Pathogen_2_242396_5 |
Pathogen_2_242396_6 |
Pathogen_2_242396_7 |
Pathogen_2_242396_8 |
Pathogen_2_242396_9 |
APathogen_1_242396 |
APathogen_1_242396_10 |
APathogen_1_242396_11 |
APathogen_1_242396_2 |
APathogen_1_242396_3 |
APathogen_1_242396_4 |
APathogen_1_242396_5 |
APathogen_1_242396_6 |
APathogen_1_242396_7 |
APathogen_1_242396_8 |
APathogen_1_242396_9 |
BPathogen_1_242396 |
BPathogen_1_242396_10 |
BPathogen_1_242396_11 |
BPathogen_1_242396_2 |
BPathogen_1_242396_3 |
BPathogen_1_242396_4 |
BPathogen_1_242396_5 |
BPathogen_1_242396_6 |
BPathogen_1_242396_7 |
BPathogen_1_242396_8 |
BPathogen_1_242396_9 |
CPathogen_1_242396 |
CPathogen_1_242396_10 |
CPathogen_1_242396_11 |
CPathogen_1_242396_2 |
CPathogen_1_242396_3 |
CPathogen_1_242396_4 |
CPathogen_1_242396_5 |
CPathogen_1_242396_6 |
CPathogen_1_242396_7 |
CPathogen_1_242396_8 |
CPathogen_1_242396_9 |
Something like the following may get you started:
data have;
input (t1-t4) (:$1.);
cards;
1 2 3 4
1 2 3 2
run;
data want;
set have;
dup_flag=0;
array t(4) t1-t4;
array temp(4) $ 1 _temporary_;
do i=1 to 4;
temp(i)=t(i);
end;
do i=1 to 4;
call missing (temp(i));
if t(i) in temp then do;
dup_flag=1;
leave;
end;
end;
drop i;
run;
Update: To end the loop once getting a hit.
If you have a big table and troubled by the efficiency, then following code can be considered 'being on steroid':
data have;
input (t1-t4) (:$2.);
cards;
1 2 3 4
1 2 3 2
run;
data want_fast;
set have;
length _cat $ 100;
dup_flag=0;
array t(4) t1-t4;
_cat=peekclong(addrlong(t(1)), 8);
do i=1 to 4;
if count(_cat, t(i), 't')>1 then
do;
dup_flag=1;
leave;
end;
end;
drop _cat i;
run;
This is not so difficult if you use arrays. But would it be useful? All you get is a flag that is 0 or 1. When it is 1 you still don't know where the match occurred. At any rate, you may have to type all 53 names once:
data want;
set have;
array paths {53} list of pathogen names goes here;
postcult=0;
do i=1 to 52 until (postcult=1);
if paths{i} ne ' ' then do j=i+1 to 53;
if paths{i} = paths{j} then postcult=1;
end;
end;
drop i j;
run;
Good luck.
Hash Table.
data have; input (t1-t4) (:$1.); cards; 1 2 3 4 1 2 3 2 . 3 . 6 . 4 4 5 ; run; data want; if _n_ eq 1 then do; length k $ 40; declare hash h(); h.definekey('k'); h.definedone(); end; set have; array x{*} $ t1-t4; do i=1 to dim(x); if not missing(x{i}) then do;k=x{i};rc=h.add();end; end; if dim(x)-cmiss(of x{*})=h.num_items then poscult=0; else poscult=1; h.clear(); drop i k rc; run;
Xia Keshan
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.