Good afternoon.
I need a data step to determine the number of days between succeeding rows based on a group (see the example). I've tried to use RETAIN to achieve this, but I haven't been able to make any progress at all.
Could someone point me in the correct direction to perform this? I don't have an example since I haven't gotten anything to work.
Thank you very much to anyone willing to help.
| KEY | REVIEW_DATE | REVIEW_TYPE | Days | Comment |
| 380 | 3/31/2023 | INIT | 0 | All INIT with the same KEY begin with zero |
| 380 | 9/6/2023 | 6MONTH | 159 | Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE. |
| 380 | 3/6/2024 | 6MONTH | 182 | Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE. |
| 380 | 9/18/2024 | 6MONTH | 196 | Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE. |
| 380 | 4/4/2025 | 6MONTH | 198 | Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE. |
| 381 | 2/22/2024 | INIT | 0 | All INIT with the same KEY begin with zero |
| 381 | 6/30/2024 | 6MONTH | 129 | Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE. |
| 381 | 7/1/2025 | 6MONTH | 366 | Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE. |
Here's one way of doing this - using RETAIN requires use of a variable not being read in:
data Reviews;
input @1 key $3. @6 Review_Date mmddyy10.;
format Review_Date date9.;
datalines;
380 3/31/2023
380 9/6/2023
380 3/6/2024
380 9/18/2024
380 4/4/2025
381 2/22/2024
381 6/30/2024
381 7/1/2025
;
run;
data Reviews_Want;
keep key Review_Date Days;
set Reviews;
by key;
if first.key then Days = 0;
else Days = Review_Date - Last_Review_Date;
output;
retain Last_Review_Date;
Last_Review_Date = Review_Date;
run;
First let's convert your listing into an actual SAS dataset so we have something to program with.
data have;
infile cards dsd dlm='|' truncover ;
input KEY REVIEW_DATE :mmddyy. REVIEW_TYPE $ Days Comment :$100. ;
format review_date yymmdd10.;
cards;
380|3/31/2023|INIT|0|All INIT with the same KEY begin with zero
380|9/6/2023|6MONTH|159|Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE.
380|3/6/2024|6MONTH|182|Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE.
380|9/18/2024|6MONTH|196|Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE.
380|4/4/2025|6MONTH|198|Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE.
381|2/22/2024|INIT|0|All INIT with the same KEY begin with zero
381|6/30/2024|6MONTH|129|Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE.
381|7/1/2025|6MONTH|366|Subsequent rows with the same key determine the number of days from the preceding row REVIEW_DATE.
;
So it looks like from they values you show of DAYS that you don't actually want to RETAIN anything. Instead you just want to use the PREVIOUS value, which you can so with the LAG() function. Make not to execute LAG() conditionally, it can only return values that you previously passed it. Also make sure to remember to ignore/replace the values calculated for the FIRST observation in a group since the previous date value is from the previous group.
data want;
set have;
by key review_date;
new_days = review_date - lag(review_date);
if first.key then new_days=0;
run;
Results
You would want to use the RETAIN statement if your goal was instead to calculate the difference since the INIT record's date (usually called a BASELINE date).
data want ;
set have;
by key review_date ;
retain baseline ;
format baseline yymmdd10.;
if first.key then call missing(baseline);
if review_type='INIT' then baseline=review_date ;
new_days = review_date - baseline;
run;
Don't forget the DIF function:
new_days = review_date - lag(review_date);
is the same as
new_days = dif(review_date);
Same concern as LAG for conditional operation.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.