Hi guys,
suppose to have the following dataset:
data DB;
input ID :$20. Date :$20. ReferenceDate :$20. Outcome :$20.;
cards;
0001 01/09/2024 01/06/2024 Response
0001 03/09/2024 01/06/2024 Response
0001 09/13/2024 01/06/2024 Response
0002 11/09/2016 06/30/2016 Complete Response
0002 09/20/2016 06/30/2016 Relapse
0003 06/30/2025 04/15/2025 Stable Disease
0003 07/31/2025 04/15/2025 Relapse
0004 03/12/2024 12/13/2023 Complete Response
run;
/*Assign the correct format to dates */
data DB1;
set DB;
numeric_date1 = input(Date, mmddyy10.);
numeric_date2 = input(ReferenceDate, mmddyy10.);
drop Date ReferenceDate;
rename numeric_date1 = Date numeric_date2 = ReferenceDate;
format numeric_date1 numeric_date2 date9.; run;
Rule 1:
I have to flag the date ("Date" variable) in the interval between 2 and 4 months after the ReferenceDate. This is okay for me. I know how to program this.
Rule 2:
Then, if more than one date falls into Rule 1, I have to flag (with another flag) the one that is closest to the Rule 1, i.e., the time point of 3 months.
The ReferenceDate is always the same for each subject.
Can anyone help me to do this?
Thank you in advance.
Best
Thank you. Here is my solution, there are probably other ways to do this, I wouldn't be surprised if someone else comes up with a shorter solution.
/* Create first flag */
data db2;
set db1;
flag1 = (intnx('month',referencedate,2,'s') <= date <= intnx('month',referencedate,4,'s'));
run;
/* Find IDs with more than one flag */
proc summary data=db2 nway;
class id;
var flag1;
output out=_nflags_ sum=nflags;
run;
/* If more than one flag, determine distance to refdate+3 months */
data db3;
merge db2 _nflags_(drop=_:);
by id;
if nflags>=2 then dist_betw_refdate_plus3months=abs(date-intnx('month',referencedate,3,'s'));
run;
/* Find minimum distance to refdate+3 months and select that one */
proc sort data=db3;
by id dist_betw_refdate_plus3months;
run;
data want;
set db3;
by id dist_betw_refdate_plus3months;
if (flag1=1 and nflags=1) or (flag1=1 and nflags>=2 and first.id);
run;
PS: In the future, don't create date values as character. Create them as numeric and save yourself and all of us here a step.
Please do us a favor. First, its great that you provided data with SAS data step code. But you have provided us with data where the dates are character strings and not valid SAS numeric variable. We can't work with character string dates. The code also returns errors. I request that you fix the code so that there are no errors, and so that we have numeric variables that contain dates. I'm sure you know how to do that, so please do that for us.
Your code to create data set DB still produces errors.
Thank you. Here is my solution, there are probably other ways to do this, I wouldn't be surprised if someone else comes up with a shorter solution.
/* Create first flag */
data db2;
set db1;
flag1 = (intnx('month',referencedate,2,'s') <= date <= intnx('month',referencedate,4,'s'));
run;
/* Find IDs with more than one flag */
proc summary data=db2 nway;
class id;
var flag1;
output out=_nflags_ sum=nflags;
run;
/* If more than one flag, determine distance to refdate+3 months */
data db3;
merge db2 _nflags_(drop=_:);
by id;
if nflags>=2 then dist_betw_refdate_plus3months=abs(date-intnx('month',referencedate,3,'s'));
run;
/* Find minimum distance to refdate+3 months and select that one */
proc sort data=db3;
by id dist_betw_refdate_plus3months;
run;
data want;
set db3;
by id dist_betw_refdate_plus3months;
if (flag1=1 and nflags=1) or (flag1=1 and nflags>=2 and first.id);
run;
PS: In the future, don't create date values as character. Create them as numeric and save yourself and all of us here a step.
What do you want to do when there are two dates that are the same distance, but on different sides? Would you like to take the one that is BEFORE the 3 months post reference date or AFTER?
You need to decide how you want to count "months". Is it important to use actual calendar months? If so then the number of days will vary because the length of calendar months varies. It is usually only required to use calendar months when doing legal applications. This looks like a Medical application so it is better to just use a constant number of days so you don't need to worry about the varying length of calendar months. For a small number of months like 2 to 4 it is probably simplest to just use 30 day intervals as your proxy for "month".
First let's fix your example data step so it makes the DB dataset directly.
data DB;
input ID :$20. (Date ReferenceDate) (:mmddyy.) Outcome :$20.;
format Date ReferenceDate date9.;
cards;
0001 01/09/2024 01/06/2024 Response
0001 03/09/2024 01/06/2024 Response
0001 09/13/2024 01/06/2024 Response
0002 11/09/2016 06/30/2016 Complete Response
0002 09/20/2016 06/30/2016 Relapse
0003 06/30/2025 04/15/2025 Stable Disease
0003 07/31/2025 04/15/2025 Relapse
0004 03/12/2024 12/13/2023 Complete Response
;
So you can create FLAG1 directly reading the data as it is. But to make FLAG2 you will first need to find the difference between DATE and the 3 month target date so you can discover which one is closest.
You could do the calculation and ordering in one step by using PROC SQL.
proc sql;
create table step1 as
select *
, (date between referencedate+60 and referencedate+120) as Flag1
, case when calculated Flag1 then abs(referencedate+90-date)
else .
end as diff3month
from db
order by id, diff3month desc
;
quit;
Then you can use another data step to flag the closest date.
data want;
set step1;
by id ;
Flag2 = first.id and Flag1;
run;
Results
Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.
Explore Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.