Hi all,
here's my dataset
data have;
input id event$ start end;
informat start end ddmmyy10.;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;
i am trying to get cartesian type product within a group i.e combination of all rows within a group(BY id event;).
Expecting output by using DATA STEP, NOT BY PROC SQL
id | Event | Start | End | Start_date | End_date |
1 | Headache | 30JAN2017 | 08FEB2017 | 30JAN2017 | 08FEB2017 |
1 | Headache | 30JAN2017 | 08FEB2017 | 25JAN2017 | 03FEB2017 |
1 | Headache | 25JAN2017 | 03FEB2017 | 30JAN2017 | 08FEB2017 |
1 | Headache | 25JAN2017 | 03FEB2017 | 25JAN2017 | 03FEB2017 |
1 | Vomiting | 04FEB2017 | 05FEB2017 | 04FEB2017 | 05FEB2017 |
4 | Nausea | 20NOV2017 | 31DEC2017 | 20NOV2017 | 31DEC2017 |
4 | Nausea | 20NOV2017 | 31DEC2017 | 03NOV2017 | 05NOV2017 |
4 | Nausea | 03NOV2017 | 05NOV2017 | 20NOV2017 | 31DEC2017 |
4 | Nausea | 03NOV2017 | 05NOV2017 | 03NOV2017 | 05NOV2017 |
Thank you in advance,
Why do you want to specify the tool when Proc SQL does this much easier?
I agree with @ballardw! It would be much easier using proc sql. However, here is one way you could do it using a datastep:
data have;
input id event $ start end;
informat start end ddmmyy10.;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
data every_combination;
do until (last.event);
set have;
by id event;
i+1;
if first.event then j=i;
if last.event then n=i;
end;
do until (last.event);
set have;
by id event;
output;
end;
run;
data every_combination (drop=i j n);
set every_combination;
do k=j to n;
set have (rename=(start=start_date end=end_date)) point=k;
output;
end;
run;
Art, CEO, AnalystFinder.com
data have;
infile cards truncover;
informat id $1. event $15. start end ddmmyy10.;
input id event start end;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;
data want;
set have;
do _n_=1 to nobs;
set have(rename=(id=_id event=_event start=start_date end=end_date)) nobs=nobs point=_n_;
if id=_id and event=_event then output;
end;
drop _:;
run;
@novinosrin: Methinks: bad suggestion! While your suggested solution is definitely less complex than the one I offered, it doesn't meet an acceptable performance level (unless the file only contains the five example records posted).
On a relatively small file (50,000) records your code runs 5,053 times slower than the code I proposed. That time grows exponentially with even more records in the dataset.
Art, CEO, AnalystFinder.com
Thank you Sir @art297, will this one compete with yours?
data have;
infile cards truncover;
informat id $1. event $15. start end ddmmyy10.;
input id event start end;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(start=start_date end=end_date));
dcl hash h(dataset:'have(rename=(start=start_date end=end_date))', multidata: 'y', ordered: 'y');
h.definekey('id','event');
h.definedata(all:'y');
h.definedone();
end;
set have;
by id event;
do while(h.do_over(key:id,key:event) eq 0);
output;
end;
run;
Much better! Runs slightly slower than the DOW approach (.06 seconds vs .05 seconds).
FWIW: Here is the code I ran to compare the three methods:
data have;
input id event $ start end;
informat start end ddmmyy10.;
format start end date10.;
if id=1 then do id=1 to 10000;
output;
end;
else if id=4 then do id=10001 to 20000;
output;
end;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
proc sort data=have;
by id event;
run;
data every_combination;
do until (last.event);
set have;
by id event;
i+1;
if first.event then j=i;
if last.event then n=i;
end;
do until (last.event);
set have;
by id event;
output;
end;
run;
data every_combination (drop=i j n);
set every_combination;
do k=j to n;
set have (rename=(start=start_date end=end_date)) point=k;
output;
end;
run;
data want;
set have;
do _n_=1 to nobs;
set have(rename=(id=_id event=_event start=start_date end=end_date)) nobs=nobs point=_n_;
if id=_id and event=_event then output;
end;
drop _:;
run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(start=start_date end=end_date));
dcl hash h(dataset:'have(rename=(start=start_date end=end_date))', multidata: 'y', ordered: 'y');
h.definekey('id','event');
h.definedata(all:'y');
h.definedone();
end;
set have;
by id event;
do while(h.do_over(key:id,key:event) eq 0);
output;
end;
run;
Art, CEO, AnalystFinder.com
@art297 Thank you Sir for the interactive feedback. I am afraid at Depaul college lab, we can't operate(test) with large datasets and so your inputs(if and when you have time) does matter big time. But I think 50,000 records should be possible for me to test which I will do so in future. Thanks as always. I can't appreciate enough
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.