Hello - I have a dataset of test dates for each record in sequential order, earliest to latest. Some records have data in up to 300 columns (i.e. test_date1...test_date300), some have only one. I need to add a new variable 'new_var' that indicates if there are two test_dates within a 6 month timespan.
Does anyone have suggestions of an array that would calculate the span of time between the first test_date and the last test_date and then count if there are two dates, if there are new_var='Y'.
Some sample data if useful.
data labs;
input ID $ test_date1 test_date2 test_date3 test_date4 test_date5;
datalines;
101 08192010 . . . . .
102 01192010 01192010 11272014 07262016 09272016
103 02192011 02192013 12272014 . .
;
run;
Thank you!
in you example you have ID 101 with only 1 date, but @lstockman wrote I could have 300 dates.
Will there be missing dates in the possible arrayed date-columns for an ID where date1 is not null but maybe date2-299 are null and then date300 has a date value?
is the first found date1-300 in the array the start-date and the last date found the records to be evaluated?
Can you please provide a few more datalines representing what you may have coming in.
Please post a sample of your wants related to the datalines you post.
Thank you.
this link may assist you with your wants
you might want to place a do loop in the code to advance the lag date if you want to move the starting date value from date1 to date2... date299.
replace the 5 with 300 if you have 300 columns and the 4 with your number of columns -1. When new_var{I} = 1 then the dates being compared are 6 months apart.
data labs;
input ID $ test_date1: mmddyy8. test_date2: mmddyy8. test_date3: mmddyy8. test_date4: mmddyy8. test_date5: mmddyy8.;
datalines;
101 08192010 . . . . .
102 01192010 01192010 11272014 07262016 09272016
103 02192011 02192013 12272014 . .
104 01012010 04012010 07012010 01052013 .
105 01012010 07102010 07012010 01052011 .
;
run;
data want;
array test_date(5);
array tmonth(5);
array new_var(5);
set labs;
do j = 1 to 4;
do i = j+1 to 5;
tmonth{i} = (test_date{i}-test_date{j})/182.5;
if tmonth{i} >= 1 then
do;
new_var{i} = 1;
end;
if i = 5 and test_date{j} ne . then output;
end;
end;
run;
Thank you. Unfortunately, the tmonth calculation is not working quite right. I would like the tmonth calculation to be the difference in months between each test_date and the previous test_date. Then if tmonth <=6 months then new_var=1. Does anyone know a modification? Also the code provided is outputting multiple rows for each ID.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.