BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
bkq32
Quartz | Level 8

I received participant data from 3 independent research sites. If an individual was seen at multiple sites, they will have received a different ID from each site. In order to identify which participants are the same individuals across sites, I ran their demographic info through a probabilistic matching software that compares two records at a time. The software outputs a file like this:

data have;
 input subjectid_1 subjectid_2;
 cards;
1 5
2 7
3 6
4 5
7 8
;
run; 
*For example, subjects 1 and 5 are the same person.;
*Notes:
1) The sites do not use each other's IDs (e.g. SUBJECTID_1=7 is definitely the same person as SUBJECTID_2=7)
2) Assume the matching software results are correct.
;

I now want any given individual to only be counted in one row, with all their IDs listed in the columns:

data want;
 input subjectid_1 subjectid_2 subjectid_3;
 cards;
1 5 4
2 7 8
3 6
;
run; 

Is there a way to do this? After it's organized this way, I can assign them my own standardized ID and know how many unique individuals there are.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

After searching the forum a bit I found code @Ksharp proposed for a similar problem here that should return what you're after.

data have;
  input from to;
  cards;
1 5
2 7
3 6
4 5
7 8
;

data full;
  set have end=last;

  if _n_ eq 1 then
    do;
      declare hash h();
      h.definekey('node');
      h.definedata('node');
      h.definedone();
    end;

  output;
  node=from;
  h.replace();
  from=to;
  to=node;
  output;
  node=from;
  h.replace();

  if last then
    h.output(dataset:'node');
  drop node;
run;

data want(keep=node group_id);
  declare hash ha(ordered:'a');
  declare hiter hi('ha');
  ha.definekey('count');
  ha.definedata('last');
  ha.definedone();
  declare hash _ha(hashexp: 20);
  _ha.definekey('key');
  _ha.definedone();

  if 0 then set full;
  declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
  from_to.definekey('from');
  from_to.definedata('to');
  from_to.definedone();

  if 0 then set node;
  declare hash no(dataset:'node');
  declare hiter hi_no('no');
  no.definekey('node');
  no.definedata('node');
  no.definedone();

  do while(hi_no.next()=0);
    group_id+1;
    output;
    count=1;
    key=node;
    _ha.add();
    last=node;
    ha.add();
    rc=hi.first();

    do while(rc=0);
      from=last;
      rx=from_to.find();

      do while(rx=0);
        key=to;
        ry=_ha.check();

        if ry ne 0 then
          do;
            node=to;
            output;
            rr=no.remove(key:node);
            key=to;
            _ha.add();
            count+1;
            last=to;
            ha.add();
          end;

        rx=from_to.find_next();
      end;

      rc=hi.next();
    end;

    ha.clear();
    _ha.clear();
  end;

  stop;
run;

proc print data=want;
run;

Patrick_0-1706667091315.png

 

 

View solution in original post

2 REPLIES 2
Patrick
Opal | Level 21

After searching the forum a bit I found code @Ksharp proposed for a similar problem here that should return what you're after.

data have;
  input from to;
  cards;
1 5
2 7
3 6
4 5
7 8
;

data full;
  set have end=last;

  if _n_ eq 1 then
    do;
      declare hash h();
      h.definekey('node');
      h.definedata('node');
      h.definedone();
    end;

  output;
  node=from;
  h.replace();
  from=to;
  to=node;
  output;
  node=from;
  h.replace();

  if last then
    h.output(dataset:'node');
  drop node;
run;

data want(keep=node group_id);
  declare hash ha(ordered:'a');
  declare hiter hi('ha');
  ha.definekey('count');
  ha.definedata('last');
  ha.definedone();
  declare hash _ha(hashexp: 20);
  _ha.definekey('key');
  _ha.definedone();

  if 0 then set full;
  declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
  from_to.definekey('from');
  from_to.definedata('to');
  from_to.definedone();

  if 0 then set node;
  declare hash no(dataset:'node');
  declare hiter hi_no('no');
  no.definekey('node');
  no.definedata('node');
  no.definedone();

  do while(hi_no.next()=0);
    group_id+1;
    output;
    count=1;
    key=node;
    _ha.add();
    last=node;
    ha.add();
    rc=hi.first();

    do while(rc=0);
      from=last;
      rx=from_to.find();

      do while(rx=0);
        key=to;
        ry=_ha.check();

        if ry ne 0 then
          do;
            node=to;
            output;
            rr=no.remove(key:node);
            key=to;
            _ha.add();
            count+1;
            last=to;
            ha.add();
          end;

        rx=from_to.find_next();
      end;

      rc=hi.next();
    end;

    ha.clear();
    _ha.clear();
  end;

  stop;
run;

proc print data=want;
run;

Patrick_0-1706667091315.png

 

 

bkq32
Quartz | Level 8

This is awesome - thank you, @Patrick and @Ksharp!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 994 views
  • 3 likes
  • 2 in conversation