Hello - I have a dataset of test dates for each record in sequential order, earliest to latest. Some records have data in up to 300 columns (i.e. test_date1...test_date300), some have only one. I need to add a new variable 'new_var' that indicates if there are two test_dates within a 6 month timespan.
Does anyone have suggestions of an array that would calculate the span of time between the first test_date and the last test_date and then count if there are two dates, if there are new_var='Y'.
Some sample data if useful.
data labs;
input ID $ test_date1 test_date2 test_date3 test_date4 test_date5;
datalines;
101 08192010 . . . . .
102 01192010 01192010 11272014 07262016 09272016
103 02192011 02192013 12272014 . .
;
run;
Thank you!
in you example you have ID 101 with only 1 date, but @lstockman wrote I could have 300 dates.
Will there be missing dates in the possible arrayed date-columns for an ID where date1 is not null but maybe date2-299 are null and then date300 has a date value?
is the first found date1-300 in the array the start-date and the last date found the records to be evaluated?
Can you please provide a few more datalines representing what you may have coming in.
Please post a sample of your wants related to the datalines you post.
Thank you.
this link may assist you with your wants
you might want to place a do loop in the code to advance the lag date if you want to move the starting date value from date1 to date2... date299.
replace the 5 with 300 if you have 300 columns and the 4 with your number of columns -1. When new_var{I} = 1 then the dates being compared are 6 months apart.
data labs;
input ID $ test_date1: mmddyy8. test_date2: mmddyy8. test_date3: mmddyy8. test_date4: mmddyy8. test_date5: mmddyy8.;
datalines;
101 08192010 . . . . .
102 01192010 01192010 11272014 07262016 09272016
103 02192011 02192013 12272014 . .
104 01012010 04012010 07012010 01052013 .
105 01012010 07102010 07012010 01052011 .
;
run;
data want;
array test_date(5);
array tmonth(5);
array new_var(5);
set labs;
do j = 1 to 4;
do i = j+1 to 5;
tmonth{i} = (test_date{i}-test_date{j})/182.5;
if tmonth{i} >= 1 then
do;
new_var{i} = 1;
end;
if i = 5 and test_date{j} ne . then output;
end;
end;
run;
Thank you. Unfortunately, the tmonth calculation is not working quite right. I would like the tmonth calculation to be the difference in months between each test_date and the previous test_date. Then if tmonth <=6 months then new_var=1. Does anyone know a modification? Also the code provided is outputting multiple rows for each ID.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.