Hi all,
here's my dataset
data have;
input id event$ start end;
informat start end ddmmyy10.;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;
i am trying to get cartesian type product within a group i.e combination of all rows within a group(BY id event;).
Expecting output by using DATA STEP, NOT BY PROC SQL
id | Event | Start | End | Start_date | End_date |
1 | Headache | 30JAN2017 | 08FEB2017 | 30JAN2017 | 08FEB2017 |
1 | Headache | 30JAN2017 | 08FEB2017 | 25JAN2017 | 03FEB2017 |
1 | Headache | 25JAN2017 | 03FEB2017 | 30JAN2017 | 08FEB2017 |
1 | Headache | 25JAN2017 | 03FEB2017 | 25JAN2017 | 03FEB2017 |
1 | Vomiting | 04FEB2017 | 05FEB2017 | 04FEB2017 | 05FEB2017 |
4 | Nausea | 20NOV2017 | 31DEC2017 | 20NOV2017 | 31DEC2017 |
4 | Nausea | 20NOV2017 | 31DEC2017 | 03NOV2017 | 05NOV2017 |
4 | Nausea | 03NOV2017 | 05NOV2017 | 20NOV2017 | 31DEC2017 |
4 | Nausea | 03NOV2017 | 05NOV2017 | 03NOV2017 | 05NOV2017 |
Thank you in advance,
Why do you want to specify the tool when Proc SQL does this much easier?
I agree with @ballardw! It would be much easier using proc sql. However, here is one way you could do it using a datastep:
data have;
input id event $ start end;
informat start end ddmmyy10.;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
data every_combination;
do until (last.event);
set have;
by id event;
i+1;
if first.event then j=i;
if last.event then n=i;
end;
do until (last.event);
set have;
by id event;
output;
end;
run;
data every_combination (drop=i j n);
set every_combination;
do k=j to n;
set have (rename=(start=start_date end=end_date)) point=k;
output;
end;
run;
Art, CEO, AnalystFinder.com
data have;
infile cards truncover;
informat id $1. event $15. start end ddmmyy10.;
input id event start end;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;
data want;
set have;
do _n_=1 to nobs;
set have(rename=(id=_id event=_event start=start_date end=end_date)) nobs=nobs point=_n_;
if id=_id and event=_event then output;
end;
drop _:;
run;
@novinosrin: Methinks: bad suggestion! While your suggested solution is definitely less complex than the one I offered, it doesn't meet an acceptable performance level (unless the file only contains the five example records posted).
On a relatively small file (50,000) records your code runs 5,053 times slower than the code I proposed. That time grows exponentially with even more records in the dataset.
Art, CEO, AnalystFinder.com
Thank you Sir @art297, will this one compete with yours?
data have;
infile cards truncover;
informat id $1. event $15. start end ddmmyy10.;
input id event start end;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(start=start_date end=end_date));
dcl hash h(dataset:'have(rename=(start=start_date end=end_date))', multidata: 'y', ordered: 'y');
h.definekey('id','event');
h.definedata(all:'y');
h.definedone();
end;
set have;
by id event;
do while(h.do_over(key:id,key:event) eq 0);
output;
end;
run;
Much better! Runs slightly slower than the DOW approach (.06 seconds vs .05 seconds).
FWIW: Here is the code I ran to compare the three methods:
data have;
input id event $ start end;
informat start end ddmmyy10.;
format start end date10.;
if id=1 then do id=1 to 10000;
output;
end;
else if id=4 then do id=10001 to 20000;
output;
end;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
proc sort data=have;
by id event;
run;
data every_combination;
do until (last.event);
set have;
by id event;
i+1;
if first.event then j=i;
if last.event then n=i;
end;
do until (last.event);
set have;
by id event;
output;
end;
run;
data every_combination (drop=i j n);
set every_combination;
do k=j to n;
set have (rename=(start=start_date end=end_date)) point=k;
output;
end;
run;
data want;
set have;
do _n_=1 to nobs;
set have(rename=(id=_id event=_event start=start_date end=end_date)) nobs=nobs point=_n_;
if id=_id and event=_event then output;
end;
drop _:;
run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(start=start_date end=end_date));
dcl hash h(dataset:'have(rename=(start=start_date end=end_date))', multidata: 'y', ordered: 'y');
h.definekey('id','event');
h.definedata(all:'y');
h.definedone();
end;
set have;
by id event;
do while(h.do_over(key:id,key:event) eq 0);
output;
end;
run;
Art, CEO, AnalystFinder.com
@art297 Thank you Sir for the interactive feedback. I am afraid at Depaul college lab, we can't operate(test) with large datasets and so your inputs(if and when you have time) does matter big time. But I think 50,000 records should be possible for me to test which I will do so in future. Thanks as always. I can't appreciate enough
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.