Solved: Re: find data point from one table based on date, and pull to another ...

linlin87 · Posted 06-07-2022 12:02 PM

Hi SAS forum,

I have two data tables that look like this:

data file1;
input date $ weight;
datalines;
07FEB20:08:10:08 68.9
07FEB20:09:10:08 68.2
07FEB20:10:10:08 68.4
07FEB20:11:10:08 68.7
07FEB20:12:10:08 68.1
07FEB20:13:10:08 67.4
07FEB20:14:10:08 67.9
07FEB20:15:10:08 80.2
07FEB20:16:10:08 81.2;
run;

data file2; 
input date $; 
datalines; 
07FEB20:10:40:08
10FEB20:16:45:08
14FEB20:12:20:08
20FEB20:11:54:08
02JAN20:12:58:08;
run;

As you can see, file1 has hourly values of weight, whereas file2 just has particular date-times. What I want to do is, for each row (date-time) in file2, look up the first date-time in file1 that precedes this date-time, and pull this date-time and the associated weight into file2. So, for this example I would want file2 to be modified to be file2_new where the first row of data would read as:

data file2_new; 
input date $ date_pulled $ weight_pulled; 
datalines; 
07FEB20:10:40:08 07FEB20:10:10:08 68.4
run;

Any tips on how to do this would be really appreciated!

Thanks
linlin87

Tom · Posted 06-07-2022 01:44 PM

A simple way is to interleave the observations and retain the WEIGHT value in a new variable.

First let's make some more complex test data.

data file1;
  input id date :datetime. weight;
  format date datetime19.;
datalines;
1  07FEB2020:08:10:08 68.9
1  07FEB2020:09:10:08 68.2
1  10FEB2020:10:10:08 68.4
1  13FEB2020:11:10:08 68.7
1  14FEB2020:12:10:08 68.1
1  16FEB2020:13:10:08 67.4
1  18FEB2020:14:10:08 67.9
2  21FEB2020:15:10:08 80.2
2  07MAR2020:16:10:08 81.2
;

data file2; 
  input id date :datetime. ;
  format date datetime19.;
datalines; 
1  07FEB2020:10:40:08
1  10FEB2020:16:45:08
1  14FEB2020:12:20:08
2  20FEB2020:11:54:08
2  02MAR2020:12:58:08
;

So now just use a BY statement with SET to interleave the records. Use the IN= dataset option to figure out which observations are from FILE2.

data want;
  set file1 file2(in=in2);
  by id date;
  if first.id then last_wt=.;
  last_wt = coalesce(weight, last_wt);
  retain last_wt;
  if in2;
  drop weight;
run;

Results

Obs    id                   date    last_wt

 1      1     07FEB2020:10:40:08      68.2
 2      1     10FEB2020:16:45:08      68.4
 3      1     14FEB2020:12:20:08      68.1
 4      2     20FEB2020:11:54:08        .
 5      2     02MAR2020:12:58:08      80.2

View solution in original post

ballardw · Posted 06-07-2022 12:55 PM

First step:

Do not use character values for datetimes. With character values such as you show 14Feb would come before 20Jan.

Second: a warning bad things can happen to folks that habitually use 2-digit years.

Third: define in a bit more detail what "look up the first date-time in file1 that precedes this date-time" means. It would appear to me that except for the Jan date that 07FEB20:08:10:08 is the "first date that precedes" the values in file_2. There is something missing about your definition of "first". You really should supply more than one result example. That we can check whether any suggestion is correct.

Fourth: make sure that the semicolon that ends datalines is on it's own line.

data file1;
input date :datetime. weight;
format date datetime19.;
datalines;
07FEB20:08:10:08 68.9
07FEB20:09:10:08 68.2
07FEB20:10:10:08 68.4
07FEB20:11:10:08 68.7
07FEB20:12:10:08 68.1
07FEB20:13:10:08 67.4
07FEB20:14:10:08 67.9
07FEB20:15:10:08 80.2
07FEB20:16:10:08 81.2
;
run;

data file2; 
input date :datetime. ;
format date datetime19.;
datalines; 
07FEB20:10:40:08
10FEB20:16:45:08
14FEB20:12:20:08
20FEB20:11:54:08
02JAN20:12:58:08
;
run;

If your values are character you will need to convert them to numeric values similar to above so any comparisons resolve correctly.

u60655191 · Posted 06-07-2022 03:24 PM

Apologies, ballardw. You are correct in that I incorrectly wrote down this problem.

The problem should have been: for each date-time in file2, find the LAST date-time and weight that precedes it. Saying "find the first that precedes it" is a bit ambiguous.

I also did have the date-time variable formatted as Tom states below (not as a character variable as I originally posted).

Also, thank you for correcting the semicolon on datalines.

How would you tackle this problem? I would be very grateful for your input!

Kind regards,
linlin87

Tom · Posted 06-07-2022 01:44 PM

A simple way is to interleave the observations and retain the WEIGHT value in a new variable.

First let's make some more complex test data.

data file1;
  input id date :datetime. weight;
  format date datetime19.;
datalines;
1  07FEB2020:08:10:08 68.9
1  07FEB2020:09:10:08 68.2
1  10FEB2020:10:10:08 68.4
1  13FEB2020:11:10:08 68.7
1  14FEB2020:12:10:08 68.1
1  16FEB2020:13:10:08 67.4
1  18FEB2020:14:10:08 67.9
2  21FEB2020:15:10:08 80.2
2  07MAR2020:16:10:08 81.2
;

data file2; 
  input id date :datetime. ;
  format date datetime19.;
datalines; 
1  07FEB2020:10:40:08
1  10FEB2020:16:45:08
1  14FEB2020:12:20:08
2  20FEB2020:11:54:08
2  02MAR2020:12:58:08
;

So now just use a BY statement with SET to interleave the records. Use the IN= dataset option to figure out which observations are from FILE2.

data want;
  set file1 file2(in=in2);
  by id date;
  if first.id then last_wt=.;
  last_wt = coalesce(weight, last_wt);
  retain last_wt;
  if in2;
  drop weight;
run;

Results

Obs    id                   date    last_wt

 1      1     07FEB2020:10:40:08      68.2
 2      1     10FEB2020:16:45:08      68.4
 3      1     14FEB2020:12:20:08      68.1
 4      2     20FEB2020:11:54:08        .
 5      2     02MAR2020:12:58:08      80.2

u60655191 · Posted 06-07-2022 03:21 PM

Thank you Tom, that is brilliant. It certainly looks to be doing what I was hoping. But why have you introduced this id variable? Because that is not how my data is structured. I do not have this ID variable in my dataset, and I am not sure how this variable you have generated relates to the original data structures.

Tom · Posted 06-07-2022 03:26 PM

The extra ID variable was to demonstrate what to do when you have data from multiple people/places/things/sample. In those cases you do not want the last WEIGHT value to be taken from a different person.

If you don't have multiple sampling units it will be very difficult to do any statistical analysis.

u60655191 · Posted 06-07-2022 03:29 PM

I see, thanks Tom. That is indeed very useful and needed.

How would I pull over the datetime from file1, as well as the weight?

Tom · Posted 06-07-2022 03:33 PM

Just extend the process to include that variable.

You probably want to only consider the dates where there is actually a WEIGHT value.

data want;
  set file1 file2(in=in2);
  by id date;
  if first.id then call missing(last_wt,last_dt);
  if not missing(weight) then do;
     last_wt = coalesce(weight, last_wt);
     last_dt = date;
  end;
  retain last_wt last_dt;
  format last_dt datetime19. ;
  if in2;
  drop weight;
run;

u60655191 · Posted 06-07-2022 03:43 PM

Thank you Tom I really appreciate your help. I want to accept this, but I still don't understand how it works. What does in=in2 do? What is the argument in2? What is the coalesce command doing?

Tom · Posted 06-07-2022 04:00 PM

Read about dataset options. IN= allows you to provide the NAME of a variable that will be TRUE when that dataset contributed to the current observation. So because of the subsetting IF statement only the observations read from TABLE2 will be make to the output dataset.

Read about functions. COALESCE() just means take the first non missing value in the argument list. So if WEIGTH is not empty it replaces LAST_WT. With the extra IF NOT MISSING(weight) test in the last code I posted it is no longer needed.

linlin87 · Posted 06-08-2022 04:04 AM

Hi Tom,

Thanks for this. It is working but not quite how I expect, because of one difference in the datafile to how I described in my original post. The files are as follows:

data file1;
  input id date :datetime. weight;
  format date datetime19.;
datalines;
1  07FEB2020:08:10:08 68.9
1  07FEB2020:09:10:08 68.2
1  10FEB2020:10:10:08 68.4
1  13FEB2020:11:10:08 68.7
1  14FEB2020:12:10:08 68.1
1  16FEB2020:13:10:08 67.4
1  18FEB2020:14:10:08 67.9
2  21FEB2020:15:10:08 80.2
2  08MAR2020:16:10:08 81.2
2  07FEB2020:08:10:08 68.9
2  07FEB2020:09:10:08 68.2
2  10FEB2020:10:10:08 68.4
2  13FEB2020:09:10:08 68.7
3  14FEB2020:12:10:08 68.1
3  16FEB2020:10:10:08 67.4
3  18FEB2020:14:10:08 67.9
3  21FEB2020:11:10:08 80.2
3  10FEB2020:23:10:08 82.9
;
run;

data file2; 
  input id refdate :datetime. ;
  format date datetime19.;
datalines; 
1  07FEB2020:10:40:08
1  10FEB2020:16:45:08
1  14FEB2020:12:20:08
1  20FEB2020:11:54:08
1  02MAR2020:12:58:08
1  07FEB2020:10:40:08
1  10FEB2020:16:45:08
1  14FEB2020:12:20:08
2  20FEB2020:11:54:08
2  02MAR2020:12:58:08
2  07FEB2020:13:40:08
2  10FEB2020:17:40:08
2  14FEB2020:12:20:08
2  20FEB2020:09:54:08
2  02MAR2020:12:08:08
3  07FEB2020:10:40:08
3  10FEB2020:14:40:08
3  14FEB2020:12:20:08
3  20FEB2020:19:54:08
3  02MAR2020:12:58:08
3  07FEB2020:11:00:08
3  10FEB2020:16:45:08
;
run;

Now obviously the code doesn't work because the variable name in file2 is refdtm not dtm. How would I make your code work for this? I could just change the name of refdtm to dtm, but wondering if you have a better fix.

Kind regards,
linlin87

Tom · Posted 06-08-2022 08:32 AM

Did you read the documentation on dataset options I sent the link for? For this you need the RENAME= dataset option.

merge table1(rename=(date=refdate)) .....

Make sure to modify the references to DATE to use new REFDATE name instead in the rest of the data step.

PS Using DATE for a variable that does not actually contain DATE (number of days) values will cause you confusion later on. Your variables contain DATETIME (number of seconds) values instead of DATE values.

find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Re: find data point from one table based on date, and pull to another table

Registration is open