I have a list of 53 variables. And I need to flag if any of them equal each other when they aren't missing. Does anyone know an efficient way to do this? I could do this 1 at time, but this would mean potentially 1431 statements of
if Pathogen_1_961712 not = "" and Pathogen_1_961712 = Pathogen_2_961712 then poscult = 1;
the variable names are listed
| Pathogen_1_242396_10 |
| Pathogen_1_242396_11 |
| Pathogen_1_242396_2 |
| Pathogen_1_242396_3 |
| Pathogen_1_242396_4 |
| Pathogen_1_242396_5 |
| Pathogen_1_242396_6 |
| Pathogen_1_242396_7 |
| Pathogen_1_242396_8 |
| Pathogen_1_242396_9 |
| Pathogen_2_242396 |
| Pathogen_2_242396_10 |
| Pathogen_2_242396_11 |
| Pathogen_2_242396_2 |
| Pathogen_2_242396_3 |
| Pathogen_2_242396_4 |
| Pathogen_2_242396_5 |
| Pathogen_2_242396_6 |
| Pathogen_2_242396_7 |
| Pathogen_2_242396_8 |
| Pathogen_2_242396_9 |
| APathogen_1_242396 |
| APathogen_1_242396_10 |
| APathogen_1_242396_11 |
| APathogen_1_242396_2 |
| APathogen_1_242396_3 |
| APathogen_1_242396_4 |
| APathogen_1_242396_5 |
| APathogen_1_242396_6 |
| APathogen_1_242396_7 |
| APathogen_1_242396_8 |
| APathogen_1_242396_9 |
| BPathogen_1_242396 |
| BPathogen_1_242396_10 |
| BPathogen_1_242396_11 |
| BPathogen_1_242396_2 |
| BPathogen_1_242396_3 |
| BPathogen_1_242396_4 |
| BPathogen_1_242396_5 |
| BPathogen_1_242396_6 |
| BPathogen_1_242396_7 |
| BPathogen_1_242396_8 |
| BPathogen_1_242396_9 |
| CPathogen_1_242396 |
| CPathogen_1_242396_10 |
| CPathogen_1_242396_11 |
| CPathogen_1_242396_2 |
| CPathogen_1_242396_3 |
| CPathogen_1_242396_4 |
| CPathogen_1_242396_5 |
| CPathogen_1_242396_6 |
| CPathogen_1_242396_7 |
| CPathogen_1_242396_8 |
| CPathogen_1_242396_9 |
Something like the following may get you started:
data have;
input (t1-t4) (:$1.);
cards;
1 2 3 4
1 2 3 2
run;
data want;
set have;
dup_flag=0;
array t(4) t1-t4;
array temp(4) $ 1 _temporary_;
do i=1 to 4;
temp(i)=t(i);
end;
do i=1 to 4;
call missing (temp(i));
if t(i) in temp then do;
dup_flag=1;
leave;
end;
end;
drop i;
run;
Update: To end the loop once getting a hit.
If you have a big table and troubled by the efficiency, then following code can be considered 'being on steroid':
data have;
input (t1-t4) (:$2.);
cards;
1 2 3 4
1 2 3 2
run;
data want_fast;
set have;
length _cat $ 100;
dup_flag=0;
array t(4) t1-t4;
_cat=peekclong(addrlong(t(1)), 8);
do i=1 to 4;
if count(_cat, t(i), 't')>1 then
do;
dup_flag=1;
leave;
end;
end;
drop _cat i;
run;
This is not so difficult if you use arrays. But would it be useful? All you get is a flag that is 0 or 1. When it is 1 you still don't know where the match occurred. At any rate, you may have to type all 53 names once:
data want;
set have;
array paths {53} list of pathogen names goes here;
postcult=0;
do i=1 to 52 until (postcult=1);
if paths{i} ne ' ' then do j=i+1 to 53;
if paths{i} = paths{j} then postcult=1;
end;
end;
drop i j;
run;
Good luck.
Hash Table.
data have;
input (t1-t4) (:$1.);
cards;
1 2 3 4
1 2 3 2
. 3 . 6
. 4 4 5
;
run;
data want;
if _n_ eq 1 then do;
length k $ 40;
declare hash h();
h.definekey('k');
h.definedone();
end;
set have;
array x{*} $ t1-t4;
do i=1 to dim(x);
if not missing(x{i}) then do;k=x{i};rc=h.add();end;
end;
if dim(x)-cmiss(of x{*})=h.num_items then poscult=0;
else poscult=1;
h.clear();
drop i k rc;
run;
Xia Keshan
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.