DATA Step, Macro, Functions and more

Cartesian type product within a group

Reply
Contributor
Posts: 50

Cartesian type product within a group

[ Edited ]

Hi all,

 here's my dataset

 

data have;
input id event$ start end;
informat start end ddmmyy10.;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017 4 Nausea 20-11-2017 31-12-2017 4 Nausea 03-11-2017 05-11-2017 ; run;

i am trying to get cartesian type product within a group i.e combination of all rows within a group(BY id event;).

 

 

Expecting output by using DATA STEP, NOT BY PROC SQL

 

id

Event

Start

End

Start_date

End_date

1

Headache

30JAN2017

08FEB2017

30JAN2017

08FEB2017

1

Headache

30JAN2017

08FEB2017

25JAN2017

03FEB2017

1

Headache

25JAN2017

03FEB2017

30JAN2017

08FEB2017

1

Headache

25JAN2017

03FEB2017

25JAN2017

03FEB2017

1

Vomiting 

04FEB2017

05FEB2017

04FEB2017

05FEB2017

4

Nausea

20NOV2017

31DEC2017

20NOV2017

31DEC2017

4

Nausea

20NOV2017

31DEC2017

03NOV2017

05NOV2017

4

Nausea

03NOV2017

05NOV2017

20NOV2017

31DEC2017

4

Nausea

03NOV2017

05NOV2017

03NOV2017

05NOV2017

 

Thank you in advance,

 

 

Super User
Posts: 13,947

Re: Cartesian type product within a group

Why do you want to specify the tool when Proc SQL does this much easier?

Super User
Posts: 8,220

Re: Cartesian type product within a group

[ Edited ]

I agree with @ballardw! It would be much easier using proc sql. However, here is one way you could do it using a datastep:

 

data have;
  input id event $ start end;
  informat start end ddmmyy10.;
  format start end date10.;
  cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;

data every_combination;
  do until (last.event);
    set have;
    by id event;
    i+1;
    if first.event then j=i;
    if last.event then n=i;
  end;
  do until (last.event);
    set have;
    by id event;
    output;
  end;
run;

data every_combination (drop=i j n);
  set every_combination;
  do k=j to n;
    set have (rename=(start=start_date end=end_date)) point=k;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Super User
Posts: 2,073

Re: Cartesian type product within a group

data have;
infile cards truncover;
informat id $1. event $15. start end ddmmyy10.;
input id event  start end;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017 
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;

data want;
set have;
do _n_=1 to nobs;
set have(rename=(id=_id event=_event start=start_date end=end_date)) nobs=nobs point=_n_;
if id=_id and event=_event then output;
end;
drop _:;
run;
Super User
Posts: 8,220

Re: Cartesian type product within a group

Posted in reply to novinosrin

@novinosrin: Methinks: bad suggestion! While your suggested solution is definitely less complex than the one I offered, it doesn't meet an acceptable performance level (unless the file only contains the five example records posted).

On a relatively small file (50,000) records your code runs 5,053 times slower than the code I proposed. That time grows exponentially with even more records in the dataset.

 

Art, CEO, AnalystFinder.com

 

Super User
Posts: 2,073

Re: Cartesian type product within a group

Thank you Sir @art297, will this one compete with yours?

 

data have;
infile cards truncover;
informat id $1. event $15. start end ddmmyy10.;
input id event  start end;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017 
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(start=start_date end=end_date));
 dcl hash h(dataset:'have(rename=(start=start_date end=end_date))', multidata: 'y', ordered: 'y');
 h.definekey('id','event');
 h.definedata(all:'y');
 h.definedone();
 end;
 set have;
 by id event;
 do while(h.do_over(key:id,key:event) eq 0);
  output;
end;
run;
Super User
Posts: 8,220

Re: Cartesian type product within a group

Posted in reply to novinosrin

Much better! Runs slightly slower than the DOW approach (.06 seconds vs .05 seconds).

 

FWIW: Here is the code I ran to compare the three methods:

data have;
  input id event $ start end;
  informat start end ddmmyy10.;
  format start end date10.;
  if id=1 then do id=1 to 10000;
    output;
  end;
  else if id=4 then do id=10001 to 20000;
    output;
  end;
  cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;

proc sort data=have;
  by id event;
run;

data every_combination;
  do until (last.event);
    set have;
    by id event;
    i+1;
    if first.event then j=i;
    if last.event then n=i;
  end;
  do until (last.event);
    set have;
    by id event;
    output;
  end;
run;

data every_combination (drop=i j n);
  set every_combination;
  do k=j to n;
    set have (rename=(start=start_date end=end_date)) point=k;
    output;
  end;
run;

data want;
  set have;
  do _n_=1 to nobs;
  set have(rename=(id=_id event=_event start=start_date end=end_date)) nobs=nobs point=_n_;
  if id=_id and event=_event then output;
  end;
  drop _:;
run;

data want;
if _n_=1 then do;
if 0 then set have(rename=(start=start_date end=end_date));
 dcl hash h(dataset:'have(rename=(start=start_date end=end_date))', multidata: 'y', ordered: 'y');
 h.definekey('id','event');
 h.definedata(all:'y');
 h.definedone();
 end;
 set have;
 by id event;
 do while(h.do_over(key:id,key:event) eq 0);
  output;
end;
run;

Art, CEO, AnalystFinder.com

 

Super User
Posts: 2,073

Re: Cartesian type product within a group

@art297 Thank you Sir for the interactive feedback. I am afraid  at Depaul college lab, we can't operate(test) with large datasets and so your inputs(if and when you have time) does matter big time.  But I think 50,000 records should be possible for me to test which I will do so in future.  Thanks as always. I can't appreciate enough

Ask a Question
Discussion stats
  • 7 replies
  • 117 views
  • 1 like
  • 4 in conversation