I received participant data from 3 independent research sites. If an individual was seen at multiple sites, they will have received a different ID from each site. In order to identify which participants are the same individuals across sites, I ran their demographic info through a probabilistic matching software that compares two records at a time. The software outputs a file like this:
data have;
input subjectid_1 subjectid_2;
cards;
1 5
2 7
3 6
4 5
7 8
;
run;
*For example, subjects 1 and 5 are the same person.;
*Notes:
1) The sites do not use each other's IDs (e.g. SUBJECTID_1=7 is definitely the same person as SUBJECTID_2=7)
2) Assume the matching software results are correct.
;
I now want any given individual to only be counted in one row, with all their IDs listed in the columns:
data want;
input subjectid_1 subjectid_2 subjectid_3;
cards;
1 5 4
2 7 8
3 6
;
run;
Is there a way to do this? After it's organized this way, I can assign them my own standardized ID and know how many unique individuals there are.
After searching the forum a bit I found code @Ksharp proposed for a similar problem here that should return what you're after.
data have;
input from to;
cards;
1 5
2 7
3 6
4 5
7 8
;
data full;
set have end=last;
if _n_ eq 1 then
do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from;
h.replace();
from=to;
to=node;
output;
node=from;
h.replace();
if last then
h.output(dataset:'node');
drop node;
run;
data want(keep=node group_id);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
group_id+1;
output;
count=1;
key=node;
_ha.add();
last=node;
ha.add();
rc=hi.first();
do while(rc=0);
from=last;
rx=from_to.find();
do while(rx=0);
key=to;
ry=_ha.check();
if ry ne 0 then
do;
node=to;
output;
rr=no.remove(key:node);
key=to;
_ha.add();
count+1;
last=to;
ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();
_ha.clear();
end;
stop;
run;
proc print data=want;
run;
After searching the forum a bit I found code @Ksharp proposed for a similar problem here that should return what you're after.
data have;
input from to;
cards;
1 5
2 7
3 6
4 5
7 8
;
data full;
set have end=last;
if _n_ eq 1 then
do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from;
h.replace();
from=to;
to=node;
output;
node=from;
h.replace();
if last then
h.output(dataset:'node');
drop node;
run;
data want(keep=node group_id);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
group_id+1;
output;
count=1;
key=node;
_ha.add();
last=node;
ha.add();
rc=hi.first();
do while(rc=0);
from=last;
rx=from_to.find();
do while(rx=0);
key=to;
ry=_ha.check();
if ry ne 0 then
do;
node=to;
output;
rr=no.remove(key:node);
key=to;
_ha.add();
count+1;
last=to;
ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();
_ha.clear();
end;
stop;
run;
proc print data=want;
run;
This is awesome - thank you, @Patrick and @Ksharp!
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.