Cartesian type product within a group

s_manoj · Posted 07-23-2018 12:45 PM

Hi all,

here's my dataset

data have;
input id event$ start end;
informat start end ddmmyy10.;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;

i am trying to get cartesian type product within a group i.e combination of all rows within a group(BY id event;).

Expecting output by using DATA STEP, NOT BY PROC SQL

id	Event	Start	End	Start_date	End_date
1	Headache	30JAN2017	08FEB2017	30JAN2017	08FEB2017
1	Headache	30JAN2017	08FEB2017	25JAN2017	03FEB2017
1	Headache	25JAN2017	03FEB2017	30JAN2017	08FEB2017
1	Headache	25JAN2017	03FEB2017	25JAN2017	03FEB2017
1	Vomiting	04FEB2017	05FEB2017	04FEB2017	05FEB2017
4	Nausea	20NOV2017	31DEC2017	20NOV2017	31DEC2017
4	Nausea	20NOV2017	31DEC2017	03NOV2017	05NOV2017
4	Nausea	03NOV2017	05NOV2017	20NOV2017	31DEC2017
4	Nausea	03NOV2017	05NOV2017	03NOV2017	05NOV2017

Thank you in advance,

ballardw · Posted 07-23-2018 01:20 PM

Why do you want to specify the tool when Proc SQL does this much easier?

art297 · Posted 07-23-2018 01:41 PM

I agree with @ballardw! It would be much easier using proc sql. However, here is one way you could do it using a datastep:

data have;
  input id event $ start end;
  informat start end ddmmyy10.;
  format start end date10.;
  cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;

data every_combination;
  do until (last.event);
    set have;
    by id event;
    i+1;
    if first.event then j=i;
    if last.event then n=i;
  end;
  do until (last.event);
    set have;
    by id event;
    output;
  end;
run;

data every_combination (drop=i j n);
  set every_combination;
  do k=j to n;
    set have (rename=(start=start_date end=end_date)) point=k;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

novinosrin · Posted 07-23-2018 01:52 PM

data have;
infile cards truncover;
informat id $1. event $15. start end ddmmyy10.;
input id event  start end;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017 
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;

data want;
set have;
do _n_=1 to nobs;
set have(rename=(id=_id event=_event start=start_date end=end_date)) nobs=nobs point=_n_;
if id=_id and event=_event then output;
end;
drop _:;
run;

art297 · Posted 07-23-2018 02:35 PM

@novinosrin: Methinks: bad suggestion! While your suggested solution is definitely less complex than the one I offered, it doesn't meet an acceptable performance level (unless the file only contains the five example records posted).

On a relatively small file (50,000) records your code runs 5,053 times slower than the code I proposed. That time grows exponentially with even more records in the dataset.

Art, CEO, AnalystFinder.com

novinosrin · Posted 07-23-2018 02:43 PM

Thank you Sir @art297, will this one compete with yours?

data have;
infile cards truncover;
informat id $1. event $15. start end ddmmyy10.;
input id event  start end;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017 
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(start=start_date end=end_date));
 dcl hash h(dataset:'have(rename=(start=start_date end=end_date))', multidata: 'y', ordered: 'y');
 h.definekey('id','event');
 h.definedata(all:'y');
 h.definedone();
 end;
 set have;
 by id event;
 do while(h.do_over(key:id,key:event) eq 0);
  output;
end;
run;

art297 · Posted 07-23-2018 03:00 PM

Much better! Runs slightly slower than the DOW approach (.06 seconds vs .05 seconds).

FWIW: Here is the code I ran to compare the three methods:

data have;
  input id event $ start end;
  informat start end ddmmyy10.;
  format start end date10.;
  if id=1 then do id=1 to 10000;
    output;
  end;
  else if id=4 then do id=10001 to 20000;
    output;
  end;
  cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;

proc sort data=have;
  by id event;
run;

data every_combination;
  do until (last.event);
    set have;
    by id event;
    i+1;
    if first.event then j=i;
    if last.event then n=i;
  end;
  do until (last.event);
    set have;
    by id event;
    output;
  end;
run;

data every_combination (drop=i j n);
  set every_combination;
  do k=j to n;
    set have (rename=(start=start_date end=end_date)) point=k;
    output;
  end;
run;

data want;
  set have;
  do _n_=1 to nobs;
  set have(rename=(id=_id event=_event start=start_date end=end_date)) nobs=nobs point=_n_;
  if id=_id and event=_event then output;
  end;
  drop _:;
run;

data want;
if _n_=1 then do;
if 0 then set have(rename=(start=start_date end=end_date));
 dcl hash h(dataset:'have(rename=(start=start_date end=end_date))', multidata: 'y', ordered: 'y');
 h.definekey('id','event');
 h.definedata(all:'y');
 h.definedone();
 end;
 set have;
 by id event;
 do while(h.do_over(key:id,key:event) eq 0);
  output;
end;
run;

Art, CEO, AnalystFinder.com

novinosrin · Posted 07-23-2018 03:06 PM

@art297 Thank you Sir for the interactive feedback. I am afraid at Depaul college lab, we can't operate(test) with large datasets and so your inputs(if and when you have time) does matter big time. But I think 50,000 records should be possible for me to test which I will do so in future. Thanks as always. I can't appreciate enough

Cartesian type product within a group

Re: Cartesian type product within a group

Re: Cartesian type product within a group

Re: Cartesian type product within a group

Re: Cartesian type product within a group

Re: Cartesian type product within a group

Re: Cartesian type product within a group

Re: Cartesian type product within a group

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away