DATA Step, Macro, Functions and more

create pairwise combinations from longitudinal dataset

Reply
Occasional Contributor
Posts: 13

create pairwise combinations from longitudinal dataset

Hi,

I have a longitudinal dataset (hypothetical) as below with 4 variables (id, gender, time and y);

id gender time y
1 0 1 0.2
1 0 2 1.4
1 0 3 0.9
2 1 1 2.3
2 1 2 7.8
2 1 3 0.1
2 1 4 1.7;

I would like to EFFICIENTLY (because the datasets can get very large) create pairwise combinations, in SAS/IML or using a datastep, within 'id' for the variables 'gender' and 'y' (number of variables can vary) so that I have a final dataset with variables 'id', 'pair' (which is the combination based on the variable time. E.g., id=1 has time=(1,2,3), then the possible pair combinations (s,t) are 1,2; 1,3; and 2,3), 'gender_s', 'gender_t', 'y_s' and 'y_t'. The variables 'gender_s', 'gender_t', 'y_s' and 'y_t' are formed by simply copying the value for 'gender' and 'y' at the s and t time points. The final dataset would look as
follows:

id pair(s,t) gender_s gender_t y_s y_t;
1 (1,2) 0 0 0.2 1.4
1 (1,3) 0 0 0.2 0.9
1 (2,3) 0 0 1.4 0.9
2 (1,2) 1 1 2.3 7.8
2 (1,3) 1 1 2.3 0.1
2 (1,4) 1 1 2.3 1.7
2 (2,3) 1 1 7.8 0.1
2 (2,4) 1 1 7.8 1.7
2 (3,4) 1 1 0.1 1.7

Kindly advise on how best (in terms of efficiency) I can program this?
Many thanks in advance
Regular Contributor
Posts: 241

Re: create pairwise combinations from longitudinal dataset

In terms of coding efficiency, proc sql seems hard to beat for this one. Below assumes that there is no duplicated time within an id.



   /* test data */


   data one;


     input id time x $ y $;


   cards;


   1 1 a b 


   1 2 c d 


   1 3 e f 


   2 1 g h 


   2 2 i j 


   2 3 k l


   2 4 m n


   ;


   run;


 


   /* create time pairs data within id */


   proc sql;


     select d1.id, d1.time as t1, d2.time as t2 


          , d1.x as x1, d1.y as y1


          , d2.x as x2, d2.y as y2


     from   one as d1, one as d2


     where  d1.id = d2.id and d1.time < d2.time


     order by id, t1, t2;


   quit;


   /* on lst


         id  t1  t2  x1  y1  x2  y2


   --------------------------------


          1   1   2   a   b   c   d


          1   1   3   a   b   e   f


          1   2   3   c   d   e   f


          2   1   2   g   h   i   j


          2   1   3   g   h   k   l


          2   1   4   g   h   m   n


          2   2   3   i   j   k   l


          2   2   4   i   j   m   n


          2   3   4   k   l   m   n


   */

Valued Guide
Posts: 2,175

Re: create pairwise combinations from longitudinal dataset

sounds like SQL cartesian
what I call "self join" (is that reflex join?)
select s.ID
, s.time as s_time
, t.time as t_time
, s.gender as gender_s
, t.gender as gender_t,

etc

from
longi_data S ,
longi_data T
where
s.id eq t.id and
s.time LT t.time /* upper triangle of cartesian only */

but I'm cautious about efficiency posting overlapping a better one {ignore mine}
Message was edited by: Peter.C
Super User
Posts: 9,691

Re: create pairwise combinations from longitudinal dataset

[pre]
data temp;
input id gender $ time y;
datalines;
1 0 1 0.2
1 0 2 1.4
1 0 3 0.9
2 1 1 2.3
2 1 2 7.8
2 1 3 0.1
2 1 4 1.7
run;
proc sort data=temp;
by id time;
run;
data result;
set temp;
by id;
length str_time str_y $ 100;
retain str_time str_y;
if first.id then do;
call missing(str_time,str_y);
id_count=0;
end;
id_count+1;
str_time=catx(',',str_time,time);
str_y=catx(',',str_y,y);
if last.id then do;
do i=1 to id_count;
do j=i+1 to id_count;
pairs=cats( '(' ,scan(str_time,i,','), ',' ,scan(str_time,j,','), ')' );
y_s=scan(str_y,i,','); y_t=scan(str_y,j,',');
gender_s=gender;gender_t=gender;
output;
end;
end;
end;
keep id pairs gender_s gender_t y_s y_t;
run;
[/pre]


Ksharp

Message was edited by: Ksharp
Valued Guide
Posts: 2,175

Re: create pairwise combinations from longitudinal dataset

as a side-issue, Ksharp,
please would you change the option in your SAS code editor which will replace tab-characters with spaces.
Then the code you paste into forum replies might look better on a browser screen (like mine)

peter
Super User
Posts: 9,691

Re: create pairwise combinations from longitudinal dataset

Hi.
Peter.I copy these code from my SAS program editor directly.
These code looks fine in my SAS,But I do not know why this forum will distort it.

Thanks Peter.C
Ksharp Message was edited by: Ksharp
Valued Guide
Posts: 632

Re: create pairwise combinations from longitudinal dataset

A TAB character does not necessarily move the same number of spaces in the Enhanced Editor as it will in the forum or, for that matter in my Enhanced Editor. So what i do is in SAS DM with the Enhanced Editor window active.
tools>options>Enhanced Editor>general
check insert spaces for tabs and replace tabs with spaces
Super User
Posts: 9,691

Re: create pairwise combinations from longitudinal dataset

Thanks. ArtC.
Super User
Posts: 9,691

Re: create pairwise combinations from longitudinal dataset

I think it again.There is no need to sort dataset.If dataset just like it.


[pre]
data temp;
input id gender time y;
datalines;
1 0 1 0.2
1 0 2 1.4
1 0 3 0.9
2 1 1 2.3
2 1 2 7.8
2 1 3 0.1
2 1 4 1.7
run;

data result;
set temp end=last;
length str_time str_y $ 100;
retain str_time str_y _gender _id ;
if id ne lag(id) and _n_ ge 2 then do;
do i=1 to id_count;
do j=i+1 to id_count;
pairs=cats( '(' ,scan(str_time,i,','), ',' ,scan(str_time,j,','), ')' );
y_s=scan(str_y,i,','); y_t=scan(str_y,j,',');
id=_id;gender_s=_gender;gender_t=_gender;
output;
end;
end;
call missing(str_time,str_y);
id_count=0;
end;
id_count+1;
str_time=catx(',',str_time,time);
str_y=catx(',',str_y,y);
_id=id;
_gender=gender;
if last then do;
do i=1 to id_count;
do j=i+1 to id_count;
pairs=cats( '(' ,scan(str_time,i,','), ',' ,scan(str_time,j,','), ')' );
y_s=scan(str_y,i,','); y_t=scan(str_y,j,',');
id=_id;gender_s=_gender;gender_t=_gender;
output;
end;
end;
end;
keep _id pairs gender_s gender_t y_s y_t;
run;
[/pre]


Ksharp
Ask a Question
Discussion stats
  • 8 replies
  • 439 views
  • 0 likes
  • 5 in conversation