DATA Step, Macro, Functions and more

Matching Time variables plus or minus 5 seconds

Reply
Frequent Contributor
Posts: 78

Matching Time variables plus or minus 5 seconds

I have two groups of 37 time variables (formatted time8.):

id time1-time37 timefinal1-timefinal37

I need to match up the times for the two groups of variables, but within + or - 5 seconds.

Also, time1 may not actually be the same as timefinal1, but all times are in consecutive order. I need to line up the times in timefinal1-timefinal37 to match time1-time37 while skipping over times that do not match.

Thanks!
Super Contributor
Super Contributor
Posts: 3,174

Re: Matching Time variables plus or minus 5 seconds

A DATA step using one or more ARRAY definitions (with your two sets of variables, maybe combined as one array - not sure) and a DO/END code approach to compare consecutive variables in the array(s) using IF / THEN comparison technique should work.

Suggest searching the SAS support http://support.sas.com/ website for DOC and topic-related conference/technical papers on "array processing" - here is a Google advanced search argument to get you started:

array processing numeric comparison site:sas.com

Scott Barry
SBBWorks, Inc.
N/A
Posts: 0

Re: Matching Time variables plus or minus 5 seconds

I assume you know that times are stored as the number of seconds, so you only have to compare abs(t_one - t_two) to 5.

Is it possible for there to be more than one match (within 5 seconds)? This could make a big difference in the array logic, as you'd have to worry about picking the best of multiple matches.
PROC Star
Posts: 1,760

Re: Matching Time variables plus or minus 5 seconds

You need to provide a bit more info.
In particular, is there a reference? i.e if there are times every 3 seconds (say t1=3, t2=6, t3=9), where do you cut off? the first 2 could go together, the last 2 too, but not all 3. How do pick the prefered reference to form a group around?
Frequent Contributor
Posts: 78

Re: Matching Time variables plus or minus 5 seconds

The separate times should be no closer than 15 seconds apart and if they are , there is a problem. However, the seconds may not be precise due to reader variability, this is why I want to account for plus or minus 5 seconds so that I capture the correct times.

I'm still working on this, but I haven't yet figured out how to put it all together. I can match the times using an array, but I would have to ignore seconds and then I can end up having two times within the same minute. So, I have to figure out how to match the times within a range of 10 seconds.

Hope this makes sense.

Thanks for your suggestions, I will give some a try.
N/A
Posts: 0

Re: Matching Time variables plus or minus 5 seconds

Why don't you post (at least) a single example row of (real) data? That way, people can get a better feel for what is actually involved.
Frequent Contributor
Posts: 78

Re: Matching Time variables plus or minus 5 seconds

Posted in reply to deleted_user
id time1 time2 time3 time4 time5 timef1 timef2 timef3 timef4 timef5
1 4:56:00 5:00:35 5:03:10 5:09:00 5:13:40 4:56:04 5:00:38 5:03:12 5:09:01 5:13:37
2 3:09:00 3:09:55 3:11:00 3:13:05 3:14:45 3:09:00 3:10:25 3:11:02 3:13:05 3:14:43

Here are two samples of real data looking at the first 5 times.
Where time1-time5 should match timef1-timef5 within a 10 second range.

Also, the original variables were entered as separate hours minutes and seconds. I took this data and created a complete time variable using:

Time1 = HMS(time1_HR,time1_MIN,time1_SEC);

If it would be easier to match using the separate variables, I could do this as well. Message was edited by: statadm
N/A
Posts: 0

Re: Matching Time variables plus or minus 5 seconds

In these two examples, though, the match is perfect -- correct?
Frequent Contributor
Posts: 78

Re: Matching Time variables plus or minus 5 seconds

Posted in reply to deleted_user
Almost.

In example 2:

2 3:09:00 3:09:55 3:11:00 3:13:05 3:14:45 3:09:00 3:10:25 3:11:02 3:13:05 3:14:43

The second times would not match, so I would need to skip over this and move all other times forward, or even better, to output the non-matching times to another dataset.
N/A
Posts: 0

Re: Matching Time variables plus or minus 5 seconds

Ah, yes, I missed that. The program below is a very rough first attempt. It doesn't attempt to move anything "forward" or output the mis-matched times (though that would be easy to do). It also handles the possibility that there might be more than one match, which I'm not sure is true!


data test1;

informat t1-t5 ft1-ft5 time8.;

input id t1-t5 ft1-ft5;

array _a {*} t1-t5;
array _b {*} ft1-ft5;
array _match(5) _temporary_;

do i=1 to dim(_a);
bestmatch=6;
do j=1 to dim(_b);
deltatime=abs(_a(i) - _b(j));
if deltatime le 5 then do;
if deltatime lt bestmatch then do;
bestmatch = deltatime;
_match(i) = j;
end;
end;
end;
if bestmatch le 5 then
put id= i= _match(i)= bestmatch=;
else
put id= i= +1 "no match for this time point";
end;


cards;
1 4:56:00 5:00:35 5:03:10 5:09:00 5:13:40 4:56:04 5:00:38 5:03:12 5:09:01 5:13:37
2 3:09:00 3:09:55 3:11:00 3:13:05 3:14:45 3:09:00 3:10:25 3:11:02 3:13:05 3:14:43
;
run;
Frequent Contributor
Posts: 78

Re: Matching Time variables plus or minus 5 seconds

Posted in reply to deleted_user
Thank you, I will try to modify this for my program and see if I can find away to move everything forward.

I appreciate the help with doing the plus or minus 5 seconds.
Super Contributor
Posts: 359

Re: Matching Time variables plus or minus 5 seconds

This will compare the times and move them forward.

do i = 1 to 5;
if abs(itime(i) - ftime(i)) > 5 then do j = i to 4;
itime(j) = itime(j+1);
ftime(j) = ftime(J+1);
itime(j+1) = .;
ftime(J+1) = .;
if i=j then i=i-1;
end;
end;
Frequent Contributor
Posts: 78

Re: Matching Time variables plus or minus 5 seconds

You are all great! Thanks for the code, this program is working great for what I need.
Frequent Contributor
Posts: 78

Re: Matching Time variables plus or minus 5 seconds

I am still having trouble getting this to work and some parts have changed.

I have to match times within 5 seconds the first time around, and take those that didn't match and match them within 15 seconds. Anything that does not match the second time, will be dropped from the final dataset for those variables (missing data) .

I also have 13 additional variables that are associated with the times, so when I match up the times, those associated variables also have to stay with those times.

So, i would have ID, time1-time37, timef1-timef37, a1-a37, b1-b37, c1-c37, etc.

If I take real data:

id time1 time2 time3 time4 time5 timef1 timef2 timef3 timef4 timef5
1 5:56:33 6:01:44 6:10:47 6:16:00 6:17:55 5:56:17 6:01:45 6:10:35 6:15:58 6:17:54
2 2:05:03 2:14:05 2:18:48 2:31:15 . 2:05:04 2:08:55 2:14:00 2:18:44 2:23:35
3 8:16:19 8:18:18 8:21:57 8:25:04 8:27:41 8:16:18 8:18:07 8:20:17 8:25:04 8:27:20

Some match within 5 seconds, some within 15 seconds and some not at all.

In the end, I need to have the times ordered 1-37 in time order, but only if they match and with associated time descriptive variables (a1-a37, b1-b37, etc).

So there will be the same amount of time1-time37 values as timef1-timef37 all matching one another where Time1=timef1 time2=timef2, etc. within 5 or 15 seconds depending on how it was selected.

What I envision doing with SAS is matching up the times within 5 seconds, output to a dataset, match up the leftover times within 15 seconds (not including those that are already matched within 5) and ouput them to another dataset. Remove all other times, merge back the other two datasets and sort and reorder the variables by time. The times should all fall within an ~30 minute time range.

I've been mulling over this for a week and writing code with no success other than to determine which times match and which do not.

Thanks again!
PROC Star
Posts: 1,760

Re: Matching Time variables plus or minus 5 seconds

Still not sure I fully understand what you want, but it seems to me that you don't care where times are stored (time1, time2, timef25, etc), you just want to find times which are close to each other, and match them.
How about you put all times in one variable (one record per time stamp), sort, and then derive the difference with the previous record.
So your table will be:
TIME DIF
2:08:55 231
2:14:00 305
2:14:05 5
2:14:05 0
2:18:44 279
2:18:48 4

This way you can more easily use logic to define groups of time as you like. Once you have defined groups, go back to your data and apply the groups to the times.

Another even easier way, if time groups are well separated as you first told me, is just to round all times to the nearest x seconds and get your groups from the rounded values.
Ask a Question
Discussion stats
  • 17 replies
  • 324 views
  • 0 likes
  • 5 in conversation