BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
davidsmarch
Calcite | Level 5
I saw some previous solutions that don't quite fit, so I though I'd try to see if I could get some help. 

 

I have two datasets. One dataset (time.dat) is very long (millions of rows) and contains three variables: subject, column, and time_n in this format:

 

subj col   time_n
2      1    1564623266038
2      2    1564623266044
2      3    1564623266058
2      4    1564623266073
2      5    1564623266090
2      6    1564623266106

.

.

.

 

Each subject has 1000's of columns that each have a unique time_n. There are also many subjects.

 

The second set (onloadtime.dat) has all of the same subjects and three variables, subject, trial, and onloadtime_n in this format:

 

subj trial onloadtime_n
2      1    1564623265779
2      2    1564623272356
2      3    1564623275373
2      4    1564623278728

2      5    1564623282137

 

Each subject only has 180 onloadtime's. What I need to do is find the closest time_n to each onloadtime_n and to indicate which trial corresponds to that time. Such that the resultant data looks like:

 

subj col    time_n                 onloadtime_n       trial
2      1   1564623266038    1564623265779     1  
2      2   1564623266044
2      3   1564623266058
2      4   1564623266073
2      5   1564623266090
2      6   1564623266106

.

.

.

2    450  1564623272324    1564623272356   2

 

And to repeat this per subject. Essentially I need to know what trial in the shorter set (onloadtime) corresponds to what closest column in the long set (time), and to the indicate what time trial that actually is. I hope that makes sense. I'd love any support and would be happy to answer any questions. 

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

You can, of course, write an elegant solution in SQL, submit it, and wait a couple of days before getting the result.

 

An alternative may be a data step solution something like this (probably a lot faster):

data time;
input subj col   time_n;
cards;
2      1    1564623266038
2      2    1564623266044
2      3    1564623266058
2      4    1564623266073
2      5    1564623266090
2      6    1564623266106
;run;

data onloadtime;
  input subj trial onloadtime_n;
cards;
2      1    1564623265779
;run;

Data v_interval/view=v_interval;
  set time;
  by subj;
  if first.subj then do;
    next_time=time_n;
      time_n=-1E38;
      output;
      time_n=next_time;
      end;
  if not last.subj then do;
    _N_=_N_+1;
    set time(keep=time_n rename=(time_n=next_time)) point=_N_;
    end;
  else next_time=1E38;
  output;
run;



data want;
  set v_interval(rename=(time_n=onloadtime_n))  onloadtime(in=load);
  by subj onloadtime_n;
  if load then do;
    if onloadtime_n-_last_time<_next_time-onloadtime_n then
      time_n=_last_time;
    else 
      time_n=_next_time;
´   output;
    end;
  else do;
    _next_time=next_time;
    _last_time=onloadtime_n;
    _col=col;
    end;
  retain _:;
  keep subj time_n onloadtime_n trial _col;
  rename _col=col;
run;

Both tables have til be sorted by SUBJ and their time variable for this to work. The datastep view creates a table of time intervals using SET with POINT= for look-ahead. The values -1E38 and 1E38 are supposed to be "infinitely" lower resp. higher than any actual time values in the data.

 

Then it is just a matter of setting the view and the load-table in order, and assigning the right value to the TIME_N variable.

 

View solution in original post

2 REPLIES 2
s_lassen
Meteorite | Level 14

You can, of course, write an elegant solution in SQL, submit it, and wait a couple of days before getting the result.

 

An alternative may be a data step solution something like this (probably a lot faster):

data time;
input subj col   time_n;
cards;
2      1    1564623266038
2      2    1564623266044
2      3    1564623266058
2      4    1564623266073
2      5    1564623266090
2      6    1564623266106
;run;

data onloadtime;
  input subj trial onloadtime_n;
cards;
2      1    1564623265779
;run;

Data v_interval/view=v_interval;
  set time;
  by subj;
  if first.subj then do;
    next_time=time_n;
      time_n=-1E38;
      output;
      time_n=next_time;
      end;
  if not last.subj then do;
    _N_=_N_+1;
    set time(keep=time_n rename=(time_n=next_time)) point=_N_;
    end;
  else next_time=1E38;
  output;
run;



data want;
  set v_interval(rename=(time_n=onloadtime_n))  onloadtime(in=load);
  by subj onloadtime_n;
  if load then do;
    if onloadtime_n-_last_time<_next_time-onloadtime_n then
      time_n=_last_time;
    else 
      time_n=_next_time;
´   output;
    end;
  else do;
    _next_time=next_time;
    _last_time=onloadtime_n;
    _col=col;
    end;
  retain _:;
  keep subj time_n onloadtime_n trial _col;
  rename _col=col;
run;

Both tables have til be sorted by SUBJ and their time variable for this to work. The datastep view creates a table of time intervals using SET with POINT= for look-ahead. The values -1E38 and 1E38 are supposed to be "infinitely" lower resp. higher than any actual time values in the data.

 

Then it is just a matter of setting the view and the load-table in order, and assigning the right value to the TIME_N variable.

 

davidsmarch
Calcite | Level 5

That's fantastic. Super quick process, thanks for the solution! 

 

I'm trying to understand how it works and my ability there is limited. I follow about half of what's going on.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 596 views
  • 0 likes
  • 2 in conversation