BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
MM88
Calcite | Level 5

Hi experts,

 

I have a dataset which have columns Event and Time. I need to create columns Group and Cumulative. What I need to measure is the duration of the Event 'Event1_Stop' until an 'Event1_Start' appears. Last group should sum the time meaning that the stop is ongoing and no start for the event has entered.
My data sample is:

data have;
length Event $15;
input Event $ Time;
datalines;
Event3_Start 0.2
Event2_Start 0.4
Event2_End 0.2
Event1_Stop 0.2
Event3_Start 0
Event4_Start 0.5
Event3_Stop 0.2
Event1_Start 0
Event4_Stop 0
Event4_Stop 0
Event1_Stop 0.3
Event3_Start 0.3
Event1_Start 0
Event3_Start 0.4
Event3_Stop 0
Event1_Stop 0.2
Event3_Start 0.2
Event2_Start 0.4
run;

The result dataset that I need to obtain is:

data have;
length Event $15;
input Event $ Time Group Cumulative;
datalines;
Event3_Start 0.2 0 0
Event2_Start 0.4 0 0
Event2_End 0.2 0 0
Event1_Stop 0.2 1 0.9
Event3_Start 0 1 0
Event4_Start 0.5 1 0
Event3_Stop 0.2 1 0
Event1_Start 0 0 0
Event4_Stop 0 0 0
Event4_Stop 0 0 0
Event1_Stop 0.3 2 0.6
Event3_Start 0.3 2 0
Event1_Start 0 0 0
Event3_Start 0.4 0 0
Event3_Stop 0 0 0
Event1_Stop 0.2 3 0.8
Event3_Start 0.2 3 0
Event2_Start 0.4 3 0
run;

Thanks for your suggestions.
Regards.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star
First, thank you for posting the sample starting data and expected results in working data steps.  This made it possible for me to understand the objective, and to test my suggestion, which follows:
 
data have;
length Event $15;
input Event $ Time;
datalines;
Event3_Start 0.2
Event2_Start 0.4
Event2_End 0.2
Event1_Stop 0.2
Event3_Start 0
Event4_Start 0.5
Event3_Stop 0.2
Event1_Start 0
Event4_Stop 0
Event4_Stop 0
Event1_Stop 0.3
Event3_Start 0.3
Event1_Start 0
Event3_Start 0.4
Event3_Stop 0
Event1_Stop 0.2
Event3_Start 0.2
Event2_Start 0.4
run;
data stop_to_start (keep=group cumulative);
set have end=end_of_have;
group+(event='Event1_Stop');
if event='Event1_Stop' then cumulative=0;
cumulative+time;
if end_of_have or event='Event1_Start' ;
run;

data want;
set have;
if _n_=1 or event='Event1_Start' then group=0;
cumulative=0;
if event='Event1_Stop' then set stop_to_start;
run;
The strategy here is to make an intermediate dataset, stop_to_start with one observation per time span , containing two variables: cumulative and group.
 
Then just read HAVE, set cumulative to 0, and read a stop_to_start observation only when you encounter an Event1_Stop record from have.  This updates cumulative and group.
 
Because the group variable is read by a SET stop_to_start statement, it is automatically retained  So resetting it to zero when you encounter an Event1_Start observation, it will be propagated as zero until the next Event1_Stop.
 
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star
First, thank you for posting the sample starting data and expected results in working data steps.  This made it possible for me to understand the objective, and to test my suggestion, which follows:
 
data have;
length Event $15;
input Event $ Time;
datalines;
Event3_Start 0.2
Event2_Start 0.4
Event2_End 0.2
Event1_Stop 0.2
Event3_Start 0
Event4_Start 0.5
Event3_Stop 0.2
Event1_Start 0
Event4_Stop 0
Event4_Stop 0
Event1_Stop 0.3
Event3_Start 0.3
Event1_Start 0
Event3_Start 0.4
Event3_Stop 0
Event1_Stop 0.2
Event3_Start 0.2
Event2_Start 0.4
run;
data stop_to_start (keep=group cumulative);
set have end=end_of_have;
group+(event='Event1_Stop');
if event='Event1_Stop' then cumulative=0;
cumulative+time;
if end_of_have or event='Event1_Start' ;
run;

data want;
set have;
if _n_=1 or event='Event1_Start' then group=0;
cumulative=0;
if event='Event1_Stop' then set stop_to_start;
run;
The strategy here is to make an intermediate dataset, stop_to_start with one observation per time span , containing two variables: cumulative and group.
 
Then just read HAVE, set cumulative to 0, and read a stop_to_start observation only when you encounter an Event1_Stop record from have.  This updates cumulative and group.
 
Because the group variable is read by a SET stop_to_start statement, it is automatically retained  So resetting it to zero when you encounter an Event1_Start observation, it will be propagated as zero until the next Event1_Stop.
 
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
MM88
Calcite | Level 5
Thanks @mkeintz, your answer was very helpful. Sorry for the late reply, I was modifying your code to fullfill other restrictions on my data.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 374 views
  • 0 likes
  • 2 in conversation