topic How do I create a wide file of IDs? in SAS Programming

How do I create a wide file of IDs?

bkq32 — Tue, 30 Jan 2024 23:38:49 GMT

I received participant data from 3 independent research sites. If an individual was seen at multiple sites, they will have received a different ID from each site. In order to identify which participants are the same individuals across sites, I ran their demographic info through a probabilistic matching software that compares two records at a time. The software outputs a file like this:

data have;
 input subjectid_1 subjectid_2;
 cards;
1 5
2 7
3 6
4 5
7 8
;
run; 
*For example, subjects 1 and 5 are the same person.;
*Notes: 
 1) The sites do not use each other's IDs (e.g. SUBJECTID_1=7 is definitely the same person as SUBJECTID_2=7)
 2) Assume the matching software results are correct.
;

I now want any given individual to only be counted in one row, with all their IDs listed in the columns:

data want;
 input subjectid_1 subjectid_2 subjectid_3;
 cards;
1 5 4
2 7 8
3 6
;
run;

Is there a way to do this? After it's organized this way, I can assign them my own standardized ID and know how many unique individuals there are.

Re: How do I create a wide file of IDs?

Patrick — Wed, 31 Jan 2024 02:11:38 GMT

After searching the forum a bit I found code @Ksharp proposed for a similar problem here that should return what you're after.

data have;
  input from to;
  cards;
1 5
2 7
3 6
4 5
7 8
;

data full;
  set have end=last;

  if _n_ eq 1 then
    do;
      declare hash h();
      h.definekey('node');
      h.definedata('node');
      h.definedone();
    end;

  output;
  node=from;
  h.replace();
  from=to;
  to=node;
  output;
  node=from;
  h.replace();

  if last then
    h.output(dataset:'node');
  drop node;
run;

data want(keep=node group_id);
  declare hash ha(ordered:'a');
  declare hiter hi('ha');
  ha.definekey('count');
  ha.definedata('last');
  ha.definedone();
  declare hash _ha(hashexp: 20);
  _ha.definekey('key');
  _ha.definedone();

  if 0 then set full;
  declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
  from_to.definekey('from');
  from_to.definedata('to');
  from_to.definedone();

  if 0 then set node;
  declare hash no(dataset:'node');
  declare hiter hi_no('no');
  no.definekey('node');
  no.definedata('node');
  no.definedone();

  do while(hi_no.next()=0);
    group_id+1;
    output;
    count=1;
    key=node;
    _ha.add();
    last=node;
    ha.add();
    rc=hi.first();

    do while(rc=0);
      from=last;
      rx=from_to.find();

      do while(rx=0);
        key=to;
        ry=_ha.check();

        if ry ne 0 then
          do;
            node=to;
            output;
            rr=no.remove(key:node);
            key=to;
            _ha.add();
            count+1;
            last=to;
            ha.add();
          end;

        rx=from_to.find_next();
      end;

      rc=hi.next();
    end;

    ha.clear();
    _ha.clear();
  end;

  stop;
run;

proc print data=want;
run;

Re: How do I create a wide file of IDs?

bkq32 — Wed, 31 Jan 2024 21:09:12 GMT

This is awesome - thank you, @Patrick and @Ksharp!