BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
bkq32
Quartz | Level 8

I received participant data from 3 independent research sites. If an individual was seen at multiple sites, they will have received a different ID from each site. In order to identify which participants are the same individuals across sites, I ran their demographic info through a probabilistic matching software that compares two records at a time. The software outputs a file like this:

data have;
 input subjectid_1 subjectid_2;
 cards;
1 5
2 7
3 6
4 5
7 8
;
run; 
*For example, subjects 1 and 5 are the same person.;
*Notes:
1) The sites do not use each other's IDs (e.g. SUBJECTID_1=7 is definitely the same person as SUBJECTID_2=7)
2) Assume the matching software results are correct.
;

I now want any given individual to only be counted in one row, with all their IDs listed in the columns:

data want;
 input subjectid_1 subjectid_2 subjectid_3;
 cards;
1 5 4
2 7 8
3 6
;
run; 

Is there a way to do this? After it's organized this way, I can assign them my own standardized ID and know how many unique individuals there are.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

After searching the forum a bit I found code @Ksharp proposed for a similar problem here that should return what you're after.

data have;
  input from to;
  cards;
1 5
2 7
3 6
4 5
7 8
;

data full;
  set have end=last;

  if _n_ eq 1 then
    do;
      declare hash h();
      h.definekey('node');
      h.definedata('node');
      h.definedone();
    end;

  output;
  node=from;
  h.replace();
  from=to;
  to=node;
  output;
  node=from;
  h.replace();

  if last then
    h.output(dataset:'node');
  drop node;
run;

data want(keep=node group_id);
  declare hash ha(ordered:'a');
  declare hiter hi('ha');
  ha.definekey('count');
  ha.definedata('last');
  ha.definedone();
  declare hash _ha(hashexp: 20);
  _ha.definekey('key');
  _ha.definedone();

  if 0 then set full;
  declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
  from_to.definekey('from');
  from_to.definedata('to');
  from_to.definedone();

  if 0 then set node;
  declare hash no(dataset:'node');
  declare hiter hi_no('no');
  no.definekey('node');
  no.definedata('node');
  no.definedone();

  do while(hi_no.next()=0);
    group_id+1;
    output;
    count=1;
    key=node;
    _ha.add();
    last=node;
    ha.add();
    rc=hi.first();

    do while(rc=0);
      from=last;
      rx=from_to.find();

      do while(rx=0);
        key=to;
        ry=_ha.check();

        if ry ne 0 then
          do;
            node=to;
            output;
            rr=no.remove(key:node);
            key=to;
            _ha.add();
            count+1;
            last=to;
            ha.add();
          end;

        rx=from_to.find_next();
      end;

      rc=hi.next();
    end;

    ha.clear();
    _ha.clear();
  end;

  stop;
run;

proc print data=want;
run;

Patrick_0-1706667091315.png

 

 

View solution in original post

2 REPLIES 2
Patrick
Opal | Level 21

After searching the forum a bit I found code @Ksharp proposed for a similar problem here that should return what you're after.

data have;
  input from to;
  cards;
1 5
2 7
3 6
4 5
7 8
;

data full;
  set have end=last;

  if _n_ eq 1 then
    do;
      declare hash h();
      h.definekey('node');
      h.definedata('node');
      h.definedone();
    end;

  output;
  node=from;
  h.replace();
  from=to;
  to=node;
  output;
  node=from;
  h.replace();

  if last then
    h.output(dataset:'node');
  drop node;
run;

data want(keep=node group_id);
  declare hash ha(ordered:'a');
  declare hiter hi('ha');
  ha.definekey('count');
  ha.definedata('last');
  ha.definedone();
  declare hash _ha(hashexp: 20);
  _ha.definekey('key');
  _ha.definedone();

  if 0 then set full;
  declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
  from_to.definekey('from');
  from_to.definedata('to');
  from_to.definedone();

  if 0 then set node;
  declare hash no(dataset:'node');
  declare hiter hi_no('no');
  no.definekey('node');
  no.definedata('node');
  no.definedone();

  do while(hi_no.next()=0);
    group_id+1;
    output;
    count=1;
    key=node;
    _ha.add();
    last=node;
    ha.add();
    rc=hi.first();

    do while(rc=0);
      from=last;
      rx=from_to.find();

      do while(rx=0);
        key=to;
        ry=_ha.check();

        if ry ne 0 then
          do;
            node=to;
            output;
            rr=no.remove(key:node);
            key=to;
            _ha.add();
            count+1;
            last=to;
            ha.add();
          end;

        rx=from_to.find_next();
      end;

      rc=hi.next();
    end;

    ha.clear();
    _ha.clear();
  end;

  stop;
run;

proc print data=want;
run;

Patrick_0-1706667091315.png

 

 

bkq32
Quartz | Level 8

This is awesome - thank you, @Patrick and @Ksharp!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 525 views
  • 3 likes
  • 2 in conversation