BookmarkSubscribeRSS Feed
KGeorge
Calcite | Level 5
Hi,

I have a longitudinal dataset (hypothetical) as below with 4 variables (id, gender, time and y);

id gender time y
1 0 1 0.2
1 0 2 1.4
1 0 3 0.9
2 1 1 2.3
2 1 2 7.8
2 1 3 0.1
2 1 4 1.7;

I would like to EFFICIENTLY (because the datasets can get very large) create pairwise combinations, in SAS/IML or using a datastep, within 'id' for the variables 'gender' and 'y' (number of variables can vary) so that I have a final dataset with variables 'id', 'pair' (which is the combination based on the variable time. E.g., id=1 has time=(1,2,3), then the possible pair combinations (s,t) are 1,2; 1,3; and 2,3), 'gender_s', 'gender_t', 'y_s' and 'y_t'. The variables 'gender_s', 'gender_t', 'y_s' and 'y_t' are formed by simply copying the value for 'gender' and 'y' at the s and t time points. The final dataset would look as
follows:

id pair(s,t) gender_s gender_t y_s y_t;
1 (1,2) 0 0 0.2 1.4
1 (1,3) 0 0 0.2 0.9
1 (2,3) 0 0 1.4 0.9
2 (1,2) 1 1 2.3 7.8
2 (1,3) 1 1 2.3 0.1
2 (1,4) 1 1 2.3 1.7
2 (2,3) 1 1 7.8 0.1
2 (2,4) 1 1 7.8 1.7
2 (3,4) 1 1 0.1 1.7

Kindly advise on how best (in terms of efficiency) I can program this?
Many thanks in advance
8 REPLIES 8
chang_y_chung_hotmail_com
Obsidian | Level 7
In terms of coding efficiency, proc sql seems hard to beat for this one. Below assumes that there is no duplicated time within an id.



   /* test data */


   data one;


     input id time x $ y $;


   cards;


   1 1 a b 


   1 2 c d 


   1 3 e f 


   2 1 g h 


   2 2 i j 


   2 3 k l


   2 4 m n


   ;


   run;


 


   /* create time pairs data within id */


   proc sql;


     select d1.id, d1.time as t1, d2.time as t2 


          , d1.x as x1, d1.y as y1


          , d2.x as x2, d2.y as y2


     from   one as d1, one as d2


     where  d1.id = d2.id and d1.time < d2.time


     order by id, t1, t2;


   quit;


   /* on lst


         id  t1  t2  x1  y1  x2  y2


   --------------------------------


          1   1   2   a   b   c   d


          1   1   3   a   b   e   f


          1   2   3   c   d   e   f


          2   1   2   g   h   i   j


          2   1   3   g   h   k   l


          2   1   4   g   h   m   n


          2   2   3   i   j   k   l


          2   2   4   i   j   m   n


          2   3   4   k   l   m   n


   */

Peter_C
Rhodochrosite | Level 12
sounds like SQL cartesian
what I call "self join" (is that reflex join?)
select s.ID
, s.time as s_time
, t.time as t_time
, s.gender as gender_s
, t.gender as gender_t,

etc

from
longi_data S ,
longi_data T
where
s.id eq t.id and
s.time LT t.time /* upper triangle of cartesian only */

but I'm cautious about efficiency posting overlapping a better one {ignore mine}
Message was edited by: Peter.C
Ksharp
Super User
[pre]
data temp;
input id gender $ time y;
datalines;
1 0 1 0.2
1 0 2 1.4
1 0 3 0.9
2 1 1 2.3
2 1 2 7.8
2 1 3 0.1
2 1 4 1.7
run;
proc sort data=temp;
by id time;
run;
data result;
set temp;
by id;
length str_time str_y $ 100;
retain str_time str_y;
if first.id then do;
call missing(str_time,str_y);
id_count=0;
end;
id_count+1;
str_time=catx(',',str_time,time);
str_y=catx(',',str_y,y);
if last.id then do;
do i=1 to id_count;
do j=i+1 to id_count;
pairs=cats( '(' ,scan(str_time,i,','), ',' ,scan(str_time,j,','), ')' );
y_s=scan(str_y,i,','); y_t=scan(str_y,j,',');
gender_s=gender;gender_t=gender;
output;
end;
end;
end;
keep id pairs gender_s gender_t y_s y_t;
run;
[/pre]


Ksharp

Message was edited by: Ksharp
Peter_C
Rhodochrosite | Level 12
as a side-issue, Ksharp,
please would you change the option in your SAS code editor which will replace tab-characters with spaces.
Then the code you paste into forum replies might look better on a browser screen (like mine)

peter
Ksharp
Super User
Hi.
Peter.I copy these code from my SAS program editor directly.
These code looks fine in my SAS,But I do not know why this forum will distort it.

Thanks Peter.C
Ksharp Message was edited by: Ksharp
ArtC
Rhodochrosite | Level 12
A TAB character does not necessarily move the same number of spaces in the Enhanced Editor as it will in the forum or, for that matter in my Enhanced Editor. So what i do is in SAS DM with the Enhanced Editor window active.
tools>options>Enhanced Editor>general
check insert spaces for tabs and replace tabs with spaces
Ksharp
Super User
I think it again.There is no need to sort dataset.If dataset just like it.


[pre]
data temp;
input id gender time y;
datalines;
1 0 1 0.2
1 0 2 1.4
1 0 3 0.9
2 1 1 2.3
2 1 2 7.8
2 1 3 0.1
2 1 4 1.7
run;

data result;
set temp end=last;
length str_time str_y $ 100;
retain str_time str_y _gender _id ;
if id ne lag(id) and _n_ ge 2 then do;
do i=1 to id_count;
do j=i+1 to id_count;
pairs=cats( '(' ,scan(str_time,i,','), ',' ,scan(str_time,j,','), ')' );
y_s=scan(str_y,i,','); y_t=scan(str_y,j,',');
id=_id;gender_s=_gender;gender_t=_gender;
output;
end;
end;
call missing(str_time,str_y);
id_count=0;
end;
id_count+1;
str_time=catx(',',str_time,time);
str_y=catx(',',str_y,y);
_id=id;
_gender=gender;
if last then do;
do i=1 to id_count;
do j=i+1 to id_count;
pairs=cats( '(' ,scan(str_time,i,','), ',' ,scan(str_time,j,','), ')' );
y_s=scan(str_y,i,','); y_t=scan(str_y,j,',');
id=_id;gender_s=_gender;gender_t=_gender;
output;
end;
end;
end;
keep _id pairs gender_s gender_t y_s y_t;
run;
[/pre]


Ksharp

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2327 views
  • 0 likes
  • 5 in conversation