Apologies because this question has likely been answered, I'm just struggling to search for it in the correct way. I have n = approx 3,000 subjects with around 40 clinic visit dates in wide format, and then two new disease diagnosis dates (two different diseases) from a newly merged dataset.
ID date1 date2 date3 ... date40 diag_date1 diag_date2
1
2
.
.
What I need is to find a way to determine when diag_date1 and diag_date2 fit into the clinic visit date order, or to figure out between which two clinic visit dates the diagnosis dates occurred. Another problem is that date1 to date40 may not be chronological order. One idea I had is to transpose to long form and sort, but I don't know how to write code beyond that. Or is there a better way? I have SAS 9.4
Thank you so much in advance for your help!
Hello @pstuchli1 and welcome to the SAS Support Communities!
Try this:
/* Create test data */
data have;
call streaminit(27182818);
length subjid 8;
array date[40];
do subjid=1 to 3000;
do _n_=1 to dim(date);
date[_n_]=rand('integer',11000, 21000);
end;
diag_date1=rand('integer',10000, 22000);
diag_date2=rand('integer',10000, 22000);
output;
end;
format d: yymmdd10.;
run;
/* Determine between which visit dates diagnosis dates fall:
diag_date1 falls between dl1=date[il1] and du1=date[iu1].
diag_date2 falls between dl2=date[il2] and du2=date[iu2].
The lower (upper) limiting date is missing if the diagnosis date
is smaller (greater) than the minimum (maximum) visit date.
*/
data want;
set have;
array date[40];
array diag_date[2];
array il[2];
array iu[2];
array dl[2];
array du[2];
do _k=1 to dim(diag_date);
do _j=1 to n(of date [*]);
_d_s=smallest(_j, of date[*]);
_s=whichn(_d_s, of date[*]);
if _d_s<=diag_date[_k] then il[_k]=_s;
if _d_s>=diag_date[_k] then do;
iu[_k]=_s;
leave;
end;
end;
if il[_k] then dl[_k]=date[il[_k]];
if iu[_k] then du[_k]=date[iu[_k]];
end;
format d: yymmdd10.;
drop _:;
run;
Edit: Simplified the code. Checking for the extreme cases was redundant.
Edit 2: Just in case that you were using an older SAS 9.4 release (or even a SAS version <9.4) which doesn't accept rand('integer', a, b), you could replace these expressions by
floor((b-a+1)*rand('uniform')+a).
please post sample data in the form of a datastep with datalines that represents what you have and want you want.
include as many examples that show differences in the date problem you are trying to solve.
Thank you.
You did not show how should output look.
According to requirement: "...or to figure out between which two clinic visit dates the diagnosis dates occurred"
I suggest:
1) Define array of dates as: array dx{40} date1-date40;
2) Use function: call sortn(off dx(*)); to sort the dates in the array
3) Make a loop to locate dates: dx(i) le diag_date le dx(i+1) where i le 39.
Hello @pstuchli1 and welcome to the SAS Support Communities!
Try this:
/* Create test data */
data have;
call streaminit(27182818);
length subjid 8;
array date[40];
do subjid=1 to 3000;
do _n_=1 to dim(date);
date[_n_]=rand('integer',11000, 21000);
end;
diag_date1=rand('integer',10000, 22000);
diag_date2=rand('integer',10000, 22000);
output;
end;
format d: yymmdd10.;
run;
/* Determine between which visit dates diagnosis dates fall:
diag_date1 falls between dl1=date[il1] and du1=date[iu1].
diag_date2 falls between dl2=date[il2] and du2=date[iu2].
The lower (upper) limiting date is missing if the diagnosis date
is smaller (greater) than the minimum (maximum) visit date.
*/
data want;
set have;
array date[40];
array diag_date[2];
array il[2];
array iu[2];
array dl[2];
array du[2];
do _k=1 to dim(diag_date);
do _j=1 to n(of date [*]);
_d_s=smallest(_j, of date[*]);
_s=whichn(_d_s, of date[*]);
if _d_s<=diag_date[_k] then il[_k]=_s;
if _d_s>=diag_date[_k] then do;
iu[_k]=_s;
leave;
end;
end;
if il[_k] then dl[_k]=date[il[_k]];
if iu[_k] then du[_k]=date[iu[_k]];
end;
format d: yymmdd10.;
drop _:;
run;
Edit: Simplified the code. Checking for the extreme cases was redundant.
Edit 2: Just in case that you were using an older SAS 9.4 release (or even a SAS version <9.4) which doesn't accept rand('integer', a, b), you could replace these expressions by
floor((b-a+1)*rand('uniform')+a).
Thank you so much to @Shmuel and @FreelanceReinh, brilliant work, these are fabulous and solved my problem! Thank you so so much!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.