DATA Step, Macro, Functions and more

Outputing a dataset of unique pairs of observations by groups

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Outputing a dataset of unique pairs of observations by groups

 

I want to create a data set of unique pairs of IDs by groups. So I am selecting 2 IDs from up to 9. In this example it only shows up to ID5. They are siblings  (ID1-ID9) within families (ahhrhid).        

I have tried the macro COMBO and some SQL commands but I can't crack it.

Any help would be much appreciated. 

 

I start with this (ignore _NAME_ & _LABEL_) 

 

Obs

ahhrhid

_NAME_

_LABEL_

ID_1

ID_2

ID_3

ID4

ID5

1

000003

xwaveid

XW Cross wave ID

0100005

0100006

0100007

 

 

2

000006

xwaveid

XW Cross wave ID

0100012

0100013

 

 

 

3

000012

xwaveid

XW Cross wave ID

0100020

0100021

 

 

 

4

000015

xwaveid

XW Cross wave ID

0100025

0100026

0100027

 

 

 

        

 

And what I wish to get is the following:”

 

Ahhrhid  _Label_             Sib_ID1                     Sib_ID2           Age1      Age2

 

000003  Pair 1 & 2            0100005                0100006                 23        20

000003  Pair 1 & 3            0100005                0100007                 23        18

000003  Pair 2 & 3            0100020                0100021                 20        18

000006  Pair 1 & 2            0100012                0100013                 12       14

000012  Pair 1 & 2            0100020                0100021                 18        16

000015  Pair 1 & 2            0100025                0100026                 25        23

000015  Pair 1 & 3            0100025                0100027                 25        24

000015  Pair 2 & 3            0100026                0100027                 23        24  

 

I would like to include other variables beside ID like Age if possible.


Accepted Solutions
Solution
‎07-19-2017 01:29 AM
Respected Advisor
Posts: 4,173

Re: Outputing a dataset of unique pairs of observations by groups

Posted in reply to garymarks

@garymarks

I totally agree with @KurtBremser: You want something from us so please try and make our life easy.

 

You haven't provided the Age variable so here how to get to the rows with the ID pairs:

data have;
  infile datalines truncover dlm=',' dsd;
input ahhrhid NAME:$10. LABEL:$30. ID_1 - ID_5;
datalines;
3,xwaveid,XW Cross wave ID,100005,100006,100007,,
6,xwaveid,XW Cross wave ID,100012,100013,,,
12,xwaveid,XW Cross wave ID,100020,100021,,,
15,xwaveid,XW Cross wave ID,100025,100026,100027,100028,
;
run;

data want(keep=ahhrhid NAME LABEL sib_id_:);
  set have;
  array ids {*} id_:;
  do _i=1 to dim(ids)-1;
    if missing(ids[_i]) then 
      do;
        _i=dim(ids);
        leave;
      end;
    do _j=_i+1 to dim(ids);
      if missing(ids[_j]) then leave;
      sib_id_1=ids[_i];
      sib_id_2=ids[_j];
      output;
    end;
  end;
run;

View solution in original post


All Replies
Super User
Posts: 7,771

Re: Outputing a dataset of unique pairs of observations by groups

Posted in reply to garymarks

Please post your example data in a data step similar to this:

data have;
input ahhrhid $ id_1 $ id_2 $ id3 $;
cards;
000003 0100005 0100006 0100007
;
run;

This shows us the exact structure of the data we should be working with, and we only need to do a copy/paste and run to recreate your dataset.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Solution
‎07-19-2017 01:29 AM
Respected Advisor
Posts: 4,173

Re: Outputing a dataset of unique pairs of observations by groups

Posted in reply to garymarks

@garymarks

I totally agree with @KurtBremser: You want something from us so please try and make our life easy.

 

You haven't provided the Age variable so here how to get to the rows with the ID pairs:

data have;
  infile datalines truncover dlm=',' dsd;
input ahhrhid NAME:$10. LABEL:$30. ID_1 - ID_5;
datalines;
3,xwaveid,XW Cross wave ID,100005,100006,100007,,
6,xwaveid,XW Cross wave ID,100012,100013,,,
12,xwaveid,XW Cross wave ID,100020,100021,,,
15,xwaveid,XW Cross wave ID,100025,100026,100027,100028,
;
run;

data want(keep=ahhrhid NAME LABEL sib_id_:);
  set have;
  array ids {*} id_:;
  do _i=1 to dim(ids)-1;
    if missing(ids[_i]) then 
      do;
        _i=dim(ids);
        leave;
      end;
    do _j=_i+1 to dim(ids);
      if missing(ids[_j]) then leave;
      sib_id_1=ids[_i];
      sib_id_2=ids[_j];
      output;
    end;
  end;
run;
New Contributor
Posts: 3

Re: Outputing a dataset of unique pairs of observations by groups

Yes it works very well and I am able to modify it include age and other variables. Thank you very much
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 103 views
  • 0 likes
  • 3 in conversation