brucehughw

Hello,

I have data that includes a time variable and weather descriptions from rater1 and rater2, e.g.,

data have;
  infile datalines truncover;   /* lets the 2:15 row end with a blank rater2 */
  input time :time5. rater1 :$8. rater2 :$8.;
  format time time5.;
datalines;
1:01 RA DZ
1:02 RA DZ
1:03 RA DZ
2:06 DZ PL
2:07 DZ PL
2:15 PL
;
run;

These sequences can run quite long (not just three minutes, but perhaps three hours), and there are many of them. What I'd like is a one-row summary of each sequence, something like:

data want;
  infile datalines truncover;
  input startTime :time5. duration rater1 :$8. rater2 :$8.;   /* duration in minutes */
  format startTime time5.;
datalines;
1:01 3 RA DZ
2:06 2 DZ PL
2:15 1 PL
;
run;

rater1 and rater2 values include RA, DZ, PL, ' ', SN, RAPL, SNPL, and RASN. Any suggestions?

 

Thanks, Bruce

Astounding

Bruce,

 

Your question leaves several points open to interpretation. Perhaps you could narrow down the problem by addressing a few of these:

 

What is the definition of a sequence? 

 

If enough time passes, but the raters stay the same, would that begin a new sequence?

 

Within the same sequence could the raters switch positions (so that rater #1 becomes rater #2 and vice versa)?

 

Is duration a count of records, or does it represent a calculation based on the first and last TIME value?

 

Can two sequences overlap?

 

What is the order to the incoming data records? 

 

Three of these questions are really intertwined:  the definition of a sequence, overlapping sequences, the order to the incoming records.  They are all ways of looking at how the data identifies a sequence.

 

The program might be as simple as:

 

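proc summary data=have nway missing;   /* MISSING keeps rows with a blank rater, such as 2:15 */
  class rater1 rater2;                 /* groups by rater pair only; gaps in time are ignored */
  var time;
  output out=want (drop=_type_ rename=(_freq_=duration)) min=start_time;
run;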

 

But I feel like I'm guessing at what needs to be done.

brucehughw

Hi,

 

Thanks for looking into my question. To answer your questions:

 

What is the definition of a sequence? A sequence is a contiguous block of time (e.g., 1:01, 1:02, 1:03) with the same value for rater1 and the same value for rater2. If the time skips a minute or either rater's value changes, the current sequence ends. Typically, a new sequence begins when the time skips a value, and I'd be satisfied with a solution that splits sequences on these skips alone. But code that watches for both skipped minutes and changes in a rater's value would be very nice.

 

If enough time passes, but the raters stay the same, would that begin a new sequence? It depends: any gap larger than one minute breaks the sequence. If the time does not skip any minutes, the sequence continues.

 

Within the same sequence could the raters switch positions (so that rater #1 becomes rater #2 and vice versa)? No. If either rater changes their "report," e.g., one switches from PL to DZ, this begins a new sequence. 

 

Is duration a count of records, or does it represent a calculation based on the first and last TIME value? Duration is last time minus first time, plus 1 minute.
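As a quick check of that definition (a small sketch, not code from the thread): SAS stores time values as seconds since midnight, so the span between two times has to be divided by 60 before the final minute is added. For the 1:01 to 1:03 sequence:

```sas
/* Worked example of duration = (last - first) + 1 minute.
   '1:01't is a SAS time constant: 1 hour 1 minute = 3660 seconds. */
data _null_;
  first_time   = '1:01't;                              /* 3660 seconds */
  last_time    = '1:03't;                              /* 3780 seconds */
  duration_min = (last_time - first_time) / 60 + 1;    /* 3 minutes    */
  put first_time= last_time= duration_min=;
run;
```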

 

Can two sequences overlap? No. Time is strictly increasing.

 

What is the order to the incoming data records? Sorted by time.

 


 

Your PROC SUMMARY worked on my toy data set. But if I add another RA DZ row at 1:08, PROC SUMMARY does not return the correct result: it counts the 1:08 row as part of the 1:01 RA DZ group, whereas the gap means this row should start a new sequence.

 

Thanks very much, Bruce
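To reproduce the case described above, here is a sketch of the test data with the extra RA DZ row at 1:08. The data set name HAVE_GAP is hypothetical. Under the sequence definition in this reply, the 1:08 row should form its own one-minute sequence, for four sequences in total.

```sas
/* Test data: the original rows plus an RA DZ row at 1:08.
   HAVE_GAP is a hypothetical name chosen so the original HAVE is untouched.
   TRUNCOVER keeps the 2:15 row from reading the next line when rater2 is blank. */
data have_gap;
  infile datalines truncover;
  input time :time5. rater1 :$8. rater2 :$8.;
  format time time5.;
datalines;
1:01 RA DZ
1:02 RA DZ
1:03 RA DZ
1:08 RA DZ
2:06 DZ PL
2:07 DZ PL
2:15 PL
;
run;

proc print data=have_gap;
run;
```

Running any candidate approach against this data set shows whether a gap with unchanged raters is treated as a break.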

Astounding (accepted solution)

Aha!  Thanks for the answers.

 

I suggest adding a variable to number the sequences.

 

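data with_sequence;
  set have;
  by rater1 rater2 notsorted;                 /* groups consecutive rows with the same rater pair */
  time_dif = dif(time);                       /* seconds since the previous row */
  if first.rater2 then sequence + 1;          /* a rater's value changed */
  else if time_dif > 60 then sequence + 1;    /* more than one minute skipped */
run;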

 

You can decide whether the cutoff of > 60 needs adjusting. SAS time values are in seconds, so > 60 means a gap of more than one minute.
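If you expect to tune the cutoff, one option is to keep it in a single macro variable. This is a minimal sketch: the name MAX_GAP is hypothetical, the logic is otherwise the same as the step above, and it assumes HAVE is sorted by time.

```sas
/* Hypothetical variant: the gap cutoff lives in one macro variable.
   Time values are seconds, so 60 means "more than one minute". */
%let max_gap = 60;

data with_sequence;
  set have;
  by rater1 rater2 notsorted;

  /* DIF runs on every row, so it always compares with the previous row;
     calling it only inside an IF would compare with the last row on
     which that IF was true */
  time_dif = dif(time);

  if first.rater2 then sequence + 1;              /* a rater's value changed */
  else if time_dif > &max_gap then sequence + 1;  /* a minute was skipped    */
run;
```

For example, %let max_gap = 120; would let a single missing minute (1:01 followed by 1:03) continue the same sequence.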

 

Then summarize:

 

proc summary data=with_sequence;
  by sequence rater1 rater2 notsorted;
  var time;
  output out=summarized (drop=_type_ _freq_) min=time_begins max=time_ends;
run;

 

You would still need to read the summary back in to compute duration. Something along these lines:

 

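data want;
  set summarized;                              /* the PROC SUMMARY output */
  duration = time_ends - time_begins + 60;     /* in seconds, including the final minute */
  format time_begins time_ends time5.;
run;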

 

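SAS time values are stored as seconds since midnight, so time_begins and time_ends are in seconds and you need to add 60 to count the final minute. If you would rather report duration in minutes, as in the desired output, use duration = (time_ends - time_begins) / 60 + 1; instead.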
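If you would rather skip PROC SUMMARY, a DATA step with BY-group processing can write the summary rows directly. This is a swapped-in alternative, not the approach from the thread. It is a minimal sketch that assumes WITH_SEQUENCE was created by the sequence-numbering step above and is still in time order, and it reports duration in minutes to match the desired output.

```sas
/* Alternative summary step: one DATA step with BY-group processing
   instead of PROC SUMMARY plus a second DATA step.
   Assumes WITH_SEQUENCE exists and is in time order, so each
   sequence's rows are adjacent and SEQUENCE ascends. */
data want;
  set with_sequence;
  by sequence;
  retain startTime;

  /* remember the first time stamp of each sequence */
  if first.sequence then startTime = time;

  /* on the last row of the sequence, write one summary row;
     time values are seconds, so convert the span to minutes
     and add one minute for the final row */
  if last.sequence then do;
    duration = (time - startTime) / 60 + 1;
    output;
  end;

  format startTime time5.;
  keep startTime duration rater1 rater2;
run;
```

This avoids the intermediate SUMMARIZED data set. Like the BY statement in PROC SUMMARY, it relies on the rows of each sequence being adjacent.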

 

Good luck.

 

Oops!  Added NOTSORTED a second time.

 

brucehughw
Very nice! Thanks! Note: the final data step should read "set summarized;" rather than "set with_sequence;". Also, keeping _FREQ_ in the PROC SUMMARY output gives the duration in minutes without the additional data step. Thanks a lot for the help. I often have data with skips in the time and have struggled with numbering contiguous sequences. Now I know how.
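A sketch of the _FREQ_ shortcut, assuming WITH_SEQUENCE was created by the accepted answer's first DATA step. _FREQ_ counts rows, so it equals the duration in minutes only when there is exactly one row per minute (no duplicate time stamps), which is the case for the data shown in this thread.

```sas
/* _FREQ_ shortcut: the row count per sequence doubles as the duration.
   Assumes WITH_SEQUENCE exists and has exactly one row per minute. */
proc summary data=with_sequence;
  by sequence rater1 rater2 notsorted;
  var time;
  /* keep _FREQ_ (rows per sequence) as DURATION; MIN gives the start */
  output out=want (drop=_type_ rename=(_freq_=duration)) min=startTime;
run;

/* display start times as hh:mm */
proc print data=want noobs;
  format startTime time5.;
run;
```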

