BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
linlin87
Quartz | Level 8

Hi SAS forum,

 

I have two data tables that look like this:

 

 

data file1;
input date $ weight;
datalines;
07FEB20:08:10:08 68.9
07FEB20:09:10:08 68.2
07FEB20:10:10:08 68.4
07FEB20:11:10:08 68.7
07FEB20:12:10:08 68.1
07FEB20:13:10:08 67.4
07FEB20:14:10:08 67.9
07FEB20:15:10:08 80.2
07FEB20:16:10:08 81.2;
run;

data file2;
input date $;
datalines;
07FEB20:10:40:08
10FEB20:16:45:08
14FEB20:12:20:08
20FEB20:11:54:08
02JAN20:12:58:08;
run;

As you can see, file1 has hourly values of weight, whereas file2 just has particular date-times. What I want to do is, for each row (date-time) in file2, look up the first date-time in file1 that precedes this date-time, and pull this date-time and the associated weight into file2. So, for this example I would want file2 to be modified to be file2_new where the first row of data would read as:

data file2_new; 
input date $ date_pulled $ weight_pulled; 
datalines; 
07FEB20:10:40:08 07FEB20:10:10:08 68.4
run; 

Any tips on how to do this would be really appreciated!

Thanks
linlin87

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

A simple way is to interleave the observations and retain the WEIGHT value in a new variable.

First let's make some more complex test data.

data file1;
  input id date :datetime. weight;
  format date datetime19.;
datalines;
1  07FEB2020:08:10:08 68.9
1  07FEB2020:09:10:08 68.2
1  10FEB2020:10:10:08 68.4
1  13FEB2020:11:10:08 68.7
1  14FEB2020:12:10:08 68.1
1  16FEB2020:13:10:08 67.4
1  18FEB2020:14:10:08 67.9
2  21FEB2020:15:10:08 80.2
2  07MAR2020:16:10:08 81.2
;

data file2; 
  input id date :datetime. ;
  format date datetime19.;
datalines; 
1  07FEB2020:10:40:08
1  10FEB2020:16:45:08
1  14FEB2020:12:20:08
2  20FEB2020:11:54:08
2  02MAR2020:12:58:08
;

So now just use a BY statement with SET to interleave the records.  Use the IN= dataset option to figure out which observations are from FILE2.

data want;
  set file1 file2(in=in2);
  by id date;
  if first.id then last_wt=.;
  last_wt = coalesce(weight, last_wt);
  retain last_wt;
  if in2;
  drop weight;
run;

Results

Obs    id                   date    last_wt

 1      1     07FEB2020:10:40:08      68.2
 2      1     10FEB2020:16:45:08      68.4
 3      1     14FEB2020:12:20:08      68.1
 4      2     20FEB2020:11:54:08        .
 5      2     02MAR2020:12:58:08      80.2

View solution in original post

11 REPLIES 11
ballardw
Super User

First step:

Do not use character values for datetimes. With character values such as you show 14Feb would come before 20Jan.

 

Second: a warning bad things can happen to folks that habitually use 2-digit years.

 

Third: define in a bit more detail what "look up the first date-time in file1 that precedes this date-time" means. It would appear to me that except for the Jan date that 07FEB20:08:10:08 is the "first date that precedes" the values in file_2. There is something missing about your definition of "first". You really should supply more than one result example. That we can check whether any suggestion is correct.

 

Fourth: make sure that the semicolon that ends datalines is on it's own line.

data file1;
input date :datetime. weight;
format date datetime19.;
datalines;
07FEB20:08:10:08 68.9
07FEB20:09:10:08 68.2
07FEB20:10:10:08 68.4
07FEB20:11:10:08 68.7
07FEB20:12:10:08 68.1
07FEB20:13:10:08 67.4
07FEB20:14:10:08 67.9
07FEB20:15:10:08 80.2
07FEB20:16:10:08 81.2
;
run;

data file2; 
input date :datetime. ;
format date datetime19.;
datalines; 
07FEB20:10:40:08
10FEB20:16:45:08
14FEB20:12:20:08
20FEB20:11:54:08
02JAN20:12:58:08
;
run;  

If your values are character you will need to convert them to numeric values similar to above so any comparisons resolve correctly.

 

u60655191
Fluorite | Level 6

Apologies, ballardw. You are correct in that I incorrectly wrote down this problem.

The problem should have been: for each date-time in file2, find the LAST date-time and weight that precedes it. Saying "find the first that precedes it" is a bit ambiguous.

 

I also did have the date-time variable formatted as Tom states below (not as a character variable as I originally posted).

 

Also, thank you for correcting the semicolon on datalines.

 

How would you tackle this problem? I would be very grateful for your input! 

Kind regards,
linlin87

Tom
Super User Tom
Super User

A simple way is to interleave the observations and retain the WEIGHT value in a new variable.

First let's make some more complex test data.

data file1;
  input id date :datetime. weight;
  format date datetime19.;
datalines;
1  07FEB2020:08:10:08 68.9
1  07FEB2020:09:10:08 68.2
1  10FEB2020:10:10:08 68.4
1  13FEB2020:11:10:08 68.7
1  14FEB2020:12:10:08 68.1
1  16FEB2020:13:10:08 67.4
1  18FEB2020:14:10:08 67.9
2  21FEB2020:15:10:08 80.2
2  07MAR2020:16:10:08 81.2
;

data file2; 
  input id date :datetime. ;
  format date datetime19.;
datalines; 
1  07FEB2020:10:40:08
1  10FEB2020:16:45:08
1  14FEB2020:12:20:08
2  20FEB2020:11:54:08
2  02MAR2020:12:58:08
;

So now just use a BY statement with SET to interleave the records.  Use the IN= dataset option to figure out which observations are from FILE2.

data want;
  set file1 file2(in=in2);
  by id date;
  if first.id then last_wt=.;
  last_wt = coalesce(weight, last_wt);
  retain last_wt;
  if in2;
  drop weight;
run;

Results

Obs    id                   date    last_wt

 1      1     07FEB2020:10:40:08      68.2
 2      1     10FEB2020:16:45:08      68.4
 3      1     14FEB2020:12:20:08      68.1
 4      2     20FEB2020:11:54:08        .
 5      2     02MAR2020:12:58:08      80.2

u60655191
Fluorite | Level 6

Thank you Tom, that is brilliant. It certainly looks to be doing what I was hoping. But why have you introduced this id variable? Because that is not how my data is structured. I do not have this ID variable in my dataset, and I am not sure how this variable you have generated relates to the original data structures.

Tom
Super User Tom
Super User

The extra ID variable was to demonstrate what to do when you have data from multiple people/places/things/sample.  In those cases you do not want the last WEIGHT value to be taken from a different person.

 

If you don't have multiple sampling units it will be very difficult to do any statistical analysis.

u60655191
Fluorite | Level 6

I see, thanks Tom. That is indeed very useful and needed.

 

How would I pull over the datetime from file1, as well as the weight?

Tom
Super User Tom
Super User

Just extend the process to include that variable.

You probably want to only consider the dates where there is actually a WEIGHT value.

data want;
  set file1 file2(in=in2);
  by id date;
  if first.id then call missing(last_wt,last_dt);
  if not missing(weight) then do;
     last_wt = coalesce(weight, last_wt);
     last_dt = date;
  end;
  retain last_wt last_dt;
  format last_dt datetime19. ;
  if in2;
  drop weight;
run;
u60655191
Fluorite | Level 6

Thank you Tom I really appreciate your help. I want to accept this, but I still don't understand how it works. What does in=in2 do? What is the argument in2? What is the coalesce command doing?

Tom
Super User Tom
Super User

Read about dataset options. IN= allows you to provide the NAME of a variable that will be TRUE when that dataset contributed to the current observation.  So because of the subsetting IF statement only the observations read from TABLE2 will be make to the output dataset.

Read about functions. COALESCE() just means take the first non missing value in the argument list.  So if WEIGTH is not empty it replaces LAST_WT.  With the extra IF NOT MISSING(weight) test in the last code I posted it is no longer needed.

linlin87
Quartz | Level 8

Hi Tom,

 

Thanks for this. It is working but not quite how I expect, because of one difference in the datafile to how I described in my original post. The files are as follows:

 

 

data file1;
  input id date :datetime. weight;
  format date datetime19.;
datalines;
1  07FEB2020:08:10:08 68.9
1  07FEB2020:09:10:08 68.2
1  10FEB2020:10:10:08 68.4
1  13FEB2020:11:10:08 68.7
1  14FEB2020:12:10:08 68.1
1  16FEB2020:13:10:08 67.4
1  18FEB2020:14:10:08 67.9
2  21FEB2020:15:10:08 80.2
2  08MAR2020:16:10:08 81.2
2  07FEB2020:08:10:08 68.9
2  07FEB2020:09:10:08 68.2
2  10FEB2020:10:10:08 68.4
2  13FEB2020:09:10:08 68.7
3  14FEB2020:12:10:08 68.1
3  16FEB2020:10:10:08 67.4
3  18FEB2020:14:10:08 67.9
3  21FEB2020:11:10:08 80.2
3  10FEB2020:23:10:08 82.9
;
run;

data file2; 
  input id refdate :datetime. ;
  format date datetime19.;
datalines; 
1  07FEB2020:10:40:08
1  10FEB2020:16:45:08
1  14FEB2020:12:20:08
1  20FEB2020:11:54:08
1  02MAR2020:12:58:08
1  07FEB2020:10:40:08
1  10FEB2020:16:45:08
1  14FEB2020:12:20:08
2  20FEB2020:11:54:08
2  02MAR2020:12:58:08
2  07FEB2020:13:40:08
2  10FEB2020:17:40:08
2  14FEB2020:12:20:08
2  20FEB2020:09:54:08
2  02MAR2020:12:08:08
3  07FEB2020:10:40:08
3  10FEB2020:14:40:08
3  14FEB2020:12:20:08
3  20FEB2020:19:54:08
3  02MAR2020:12:58:08
3  07FEB2020:11:00:08
3  10FEB2020:16:45:08
;
run;

Now obviously the code doesn't work because the variable name in file2 is refdtm not dtm. How would I make your code work for this? I could just change the name of refdtm to dtm, but wondering if you have a better fix.

 

Kind regards,
linlin87

 

Tom
Super User Tom
Super User

Did you read the documentation on dataset options I sent the link for?  For this you need the RENAME= dataset option.

merge table1(rename=(date=refdate)) .....

Make sure to modify the references to DATE to use new REFDATE name instead in the rest of the data step.

 

PS Using DATE for a variable that does not actually contain DATE (number of days) values will cause you confusion later on.  Your variables contain DATETIME (number of seconds) values instead of DATE values.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 1553 views
  • 5 likes
  • 4 in conversation