BookmarkSubscribeRSS Feed
s_manoj
Quartz | Level 8

Hi all,

 here's my dataset

 

data have;
input id event$ start end;
informat start end ddmmyy10.;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017 4 Nausea 20-11-2017 31-12-2017 4 Nausea 03-11-2017 05-11-2017 ; run;

i am trying to get cartesian type product within a group i.e combination of all rows within a group(BY id event;).

 

 

Expecting output by using DATA STEP, NOT BY PROC SQL

 

id

Event

Start

End

Start_date

End_date

1

Headache

30JAN2017

08FEB2017

30JAN2017

08FEB2017

1

Headache

30JAN2017

08FEB2017

25JAN2017

03FEB2017

1

Headache

25JAN2017

03FEB2017

30JAN2017

08FEB2017

1

Headache

25JAN2017

03FEB2017

25JAN2017

03FEB2017

1

Vomiting 

04FEB2017

05FEB2017

04FEB2017

05FEB2017

4

Nausea

20NOV2017

31DEC2017

20NOV2017

31DEC2017

4

Nausea

20NOV2017

31DEC2017

03NOV2017

05NOV2017

4

Nausea

03NOV2017

05NOV2017

20NOV2017

31DEC2017

4

Nausea

03NOV2017

05NOV2017

03NOV2017

05NOV2017

 

Thank you in advance,

 

 

7 REPLIES 7
ballardw
Super User

Why do you want to specify the tool when Proc SQL does this much easier?

art297
Opal | Level 21

I agree with @ballardw! It would be much easier using proc sql. However, here is one way you could do it using a datastep:

 

data have;
  input id event $ start end;
  informat start end ddmmyy10.;
  format start end date10.;
  cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;

data every_combination;
  do until (last.event);
    set have;
    by id event;
    i+1;
    if first.event then j=i;
    if last.event then n=i;
  end;
  do until (last.event);
    set have;
    by id event;
    output;
  end;
run;

data every_combination (drop=i j n);
  set every_combination;
  do k=j to n;
    set have (rename=(start=start_date end=end_date)) point=k;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

novinosrin
Tourmaline | Level 20
data have;
infile cards truncover;
informat id $1. event $15. start end ddmmyy10.;
input id event  start end;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017 
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;

data want;
set have;
do _n_=1 to nobs;
set have(rename=(id=_id event=_event start=start_date end=end_date)) nobs=nobs point=_n_;
if id=_id and event=_event then output;
end;
drop _:;
run;
art297
Opal | Level 21

@novinosrin: Methinks: bad suggestion! While your suggested solution is definitely less complex than the one I offered, it doesn't meet an acceptable performance level (unless the file only contains the five example records posted).

On a relatively small file (50,000) records your code runs 5,053 times slower than the code I proposed. That time grows exponentially with even more records in the dataset.

 

Art, CEO, AnalystFinder.com

 

novinosrin
Tourmaline | Level 20

Thank you Sir @art297, will this one compete with yours?

 

data have;
infile cards truncover;
informat id $1. event $15. start end ddmmyy10.;
input id event  start end;
format start end date10.;
cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017 
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;
run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(start=start_date end=end_date));
 dcl hash h(dataset:'have(rename=(start=start_date end=end_date))', multidata: 'y', ordered: 'y');
 h.definekey('id','event');
 h.definedata(all:'y');
 h.definedone();
 end;
 set have;
 by id event;
 do while(h.do_over(key:id,key:event) eq 0);
  output;
end;
run;
art297
Opal | Level 21

Much better! Runs slightly slower than the DOW approach (.06 seconds vs .05 seconds).

 

FWIW: Here is the code I ran to compare the three methods:

data have;
  input id event $ start end;
  informat start end ddmmyy10.;
  format start end date10.;
  if id=1 then do id=1 to 10000;
    output;
  end;
  else if id=4 then do id=10001 to 20000;
    output;
  end;
  cards;
1 Headache 30-01-2017 08-02-2017
1 Headache 25-01-2017 03-02-2017
1 vomiting 04-02-2017 05-02-2017
4 Nausea 20-11-2017 31-12-2017
4 Nausea 03-11-2017 05-11-2017
;

proc sort data=have;
  by id event;
run;

data every_combination;
  do until (last.event);
    set have;
    by id event;
    i+1;
    if first.event then j=i;
    if last.event then n=i;
  end;
  do until (last.event);
    set have;
    by id event;
    output;
  end;
run;

data every_combination (drop=i j n);
  set every_combination;
  do k=j to n;
    set have (rename=(start=start_date end=end_date)) point=k;
    output;
  end;
run;

data want;
  set have;
  do _n_=1 to nobs;
  set have(rename=(id=_id event=_event start=start_date end=end_date)) nobs=nobs point=_n_;
  if id=_id and event=_event then output;
  end;
  drop _:;
run;

data want;
if _n_=1 then do;
if 0 then set have(rename=(start=start_date end=end_date));
 dcl hash h(dataset:'have(rename=(start=start_date end=end_date))', multidata: 'y', ordered: 'y');
 h.definekey('id','event');
 h.definedata(all:'y');
 h.definedone();
 end;
 set have;
 by id event;
 do while(h.do_over(key:id,key:event) eq 0);
  output;
end;
run;

Art, CEO, AnalystFinder.com

 

novinosrin
Tourmaline | Level 20

@art297 Thank you Sir for the interactive feedback. I am afraid  at Depaul college lab, we can't operate(test) with large datasets and so your inputs(if and when you have time) does matter big time.  But I think 50,000 records should be possible for me to test which I will do so in future.  Thanks as always. I can't appreciate enough

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 764 views
  • 1 like
  • 4 in conversation