BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
davidsmarch
Calcite | Level 5
I saw some previous solutions that don't quite fit, so I though I'd try to see if I could get some help. 

 

I have two datasets. One dataset (time.dat) is very long (millions of rows) and contains three variables: subject, column, and time_n in this format:

 

subj col   time_n
2      1    1564623266038
2      2    1564623266044
2      3    1564623266058
2      4    1564623266073
2      5    1564623266090
2      6    1564623266106

.

.

.

 

Each subject has 1000's of columns that each have a unique time_n. There are also many subjects.

 

The second set (onloadtime.dat) has all of the same subjects and three variables, subject, trial, and onloadtime_n in this format:

 

subj trial onloadtime_n
2      1    1564623265779
2      2    1564623272356
2      3    1564623275373
2      4    1564623278728

2      5    1564623282137

 

Each subject only has 180 onloadtime's. What I need to do is find the closest time_n to each onloadtime_n and to indicate which trial corresponds to that time. Such that the resultant data looks like:

 

subj col    time_n                 onloadtime_n       trial
2      1   1564623266038    1564623265779     1  
2      2   1564623266044
2      3   1564623266058
2      4   1564623266073
2      5   1564623266090
2      6   1564623266106

.

.

.

2    450  1564623272324    1564623272356   2

 

And to repeat this per subject. Essentially I need to know what trial in the shorter set (onloadtime) corresponds to what closest column in the long set (time), and to the indicate what time trial that actually is. I hope that makes sense. I'd love any support and would be happy to answer any questions. 

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

You can, of course, write an elegant solution in SQL, submit it, and wait a couple of days before getting the result.

 

An alternative may be a data step solution something like this (probably a lot faster):

data time;
input subj col   time_n;
cards;
2      1    1564623266038
2      2    1564623266044
2      3    1564623266058
2      4    1564623266073
2      5    1564623266090
2      6    1564623266106
;run;

data onloadtime;
  input subj trial onloadtime_n;
cards;
2      1    1564623265779
;run;

Data v_interval/view=v_interval;
  set time;
  by subj;
  if first.subj then do;
    next_time=time_n;
      time_n=-1E38;
      output;
      time_n=next_time;
      end;
  if not last.subj then do;
    _N_=_N_+1;
    set time(keep=time_n rename=(time_n=next_time)) point=_N_;
    end;
  else next_time=1E38;
  output;
run;



data want;
  set v_interval(rename=(time_n=onloadtime_n))  onloadtime(in=load);
  by subj onloadtime_n;
  if load then do;
    if onloadtime_n-_last_time<_next_time-onloadtime_n then
      time_n=_last_time;
    else 
      time_n=_next_time;
´   output;
    end;
  else do;
    _next_time=next_time;
    _last_time=onloadtime_n;
    _col=col;
    end;
  retain _:;
  keep subj time_n onloadtime_n trial _col;
  rename _col=col;
run;

Both tables have til be sorted by SUBJ and their time variable for this to work. The datastep view creates a table of time intervals using SET with POINT= for look-ahead. The values -1E38 and 1E38 are supposed to be "infinitely" lower resp. higher than any actual time values in the data.

 

Then it is just a matter of setting the view and the load-table in order, and assigning the right value to the TIME_N variable.

 

View solution in original post

2 REPLIES 2
s_lassen
Meteorite | Level 14

You can, of course, write an elegant solution in SQL, submit it, and wait a couple of days before getting the result.

 

An alternative may be a data step solution something like this (probably a lot faster):

data time;
input subj col   time_n;
cards;
2      1    1564623266038
2      2    1564623266044
2      3    1564623266058
2      4    1564623266073
2      5    1564623266090
2      6    1564623266106
;run;

data onloadtime;
  input subj trial onloadtime_n;
cards;
2      1    1564623265779
;run;

Data v_interval/view=v_interval;
  set time;
  by subj;
  if first.subj then do;
    next_time=time_n;
      time_n=-1E38;
      output;
      time_n=next_time;
      end;
  if not last.subj then do;
    _N_=_N_+1;
    set time(keep=time_n rename=(time_n=next_time)) point=_N_;
    end;
  else next_time=1E38;
  output;
run;



data want;
  set v_interval(rename=(time_n=onloadtime_n))  onloadtime(in=load);
  by subj onloadtime_n;
  if load then do;
    if onloadtime_n-_last_time<_next_time-onloadtime_n then
      time_n=_last_time;
    else 
      time_n=_next_time;
´   output;
    end;
  else do;
    _next_time=next_time;
    _last_time=onloadtime_n;
    _col=col;
    end;
  retain _:;
  keep subj time_n onloadtime_n trial _col;
  rename _col=col;
run;

Both tables have til be sorted by SUBJ and their time variable for this to work. The datastep view creates a table of time intervals using SET with POINT= for look-ahead. The values -1E38 and 1E38 are supposed to be "infinitely" lower resp. higher than any actual time values in the data.

 

Then it is just a matter of setting the view and the load-table in order, and assigning the right value to the TIME_N variable.

 

davidsmarch
Calcite | Level 5

That's fantastic. Super quick process, thanks for the solution! 

 

I'm trying to understand how it works and my ability there is limited. I follow about half of what's going on.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 408 views
  • 0 likes
  • 2 in conversation