Hi there,
I have to two data sets look like below, and the patient ID is the unique key linking those two sets. I want a third dataset, which give me the patient ID, the minimun of treatment date which is 6 month prior to the index_date, the minimun of the treatment date which is within 6-month before and 6-month after the index_date, and the minumun of the treatment date which is 6-month after the index_date. If any type of minimun of treatment date is not existing, for instance, there is no treatment date 6-month prior to the index_date, then this value will be misisng. How can I write the query? Thank you very much!
patient ID | index_date |
1 | 3/4/2009 |
2 | 8/16/2010 |
3 | 6/6/2011 |
4 | 7/4/2009 |
… | … |
patient ID | treatment_data |
1 | 9/30/2008 |
1 | 10/11/2008 |
1 | 4/1/2009 |
2 | 1/1/2010 |
2 | 9/30/2010 |
3 | 12/30/2009 |
3 | 5/1/2010 |
3 | 8/1/2011 |
3 | 10/13/2011 |
3 | 12/13/2011 |
4 | 6/8/2009 |
4 | 3/1/2010 |
… | … |
I think given the two desired outcomes, I'd change the approach to the following.
data indexes;
format index_date date9.;
input patient index_date :mmddyy10.;
index_low = intnx('month', index_date, -6);
index_high = intnx('month', index_date, 6);
datalines;
1 3/4/2009
2 8/16/2010
3 6/6/2011
4 7/4/2009
;
data patients;
format treatment_date date9.;
input patient treatment_date :mmddyy10.;
datalines;
1 9/30/2008
1 10/11/2008
1 4/1/2009
2 1/1/2010
2 9/30/2010
3 12/30/2009
3 5/1/2010
3 8/1/2011
3 10/13/2011
3 12/13/2011
4 6/8/2009
4 3/1/2010
;
proc sql;
CREATE TABLE allData AS
SELECT a.*,
(a.treatment_date < b.index_low) AS low_flag,
(a.treatment_date BETWEEN b.index_low AND b.index_high) AS mid_flag,
(a.treatment_date > b.index_high) AS high_flag,
CASE CALCULATED low_flag WHEN 1 THEN treatment_date ELSE . END AS low_date format date9.,
CASE CALCULATED mid_flag WHEN 1 THEN treatment_date ELSE . END AS mid_date format date9.,
CASE CALCULATED high_flag WHEN 1 THEN treatment_date ELSE . END AS high_date format date9.
FROM patients AS a
LEFT JOIN indexes AS b
ON a.patient = b.patient;
quit;
proc summary data = allData nway;
class patient;
var low_flag mid_flag high_flag;
output out = want_flags max = ;
run;
proc summary data = allData nway;
class patient;
var low_date mid_date high_date;
output out = want_dates min(low_date mid_date)= max(high_date) = ;
run;
I have not fully QCd the output since you don't show your desired results, but I think this is doing what you want based on my interpretation of your description.
data indexes;
format index_date date9.;
input patient index_date :mmddyy10.;
datalines;
1 3/4/2009
2 8/16/2010
3 6/6/2011
4 7/4/2009
;
data patients;
format treatment_date date9.;
input patient treatment_date :mmddyy10.;
datalines;
1 9/30/2008
1 10/11/2008
1 4/1/2009
2 1/1/2010
2 9/30/2010
3 12/30/2009
3 5/1/2010
3 8/1/2011
3 10/13/2011
3 12/13/2011
4 6/8/2009
4 3/1/2010
;
proc sql;
CREATE TABLE want AS
SELECT DISTINCT a.patient,
CASE WHEN min(a.treatment_date) >= intnx('month', b.index_date, -6) THEN min(a.treatment_date)
ELSE .
END AS min_treat format date9.,
CASE WHEN max(a.treatment_date) <= intnx('month', b.index_date, 6) THEN max(a.treatment_date)
ELSE .
END AS max_treat format date9.
FROM patients AS a
LEFT JOIN indexes AS b
ON a.patient = b.patient
GROUP BY a.patient;
quit;
Hi, thanks for your solution.
But I think that's not what I am looking for.
Sorry that I might not describe clearly. Actually, for each patient, I need three flag variables, one is whether or not having treatment date 6 month prior to the index date, the seconde one is whether or not having treatment date within pre 6 month and post 6 month of index date, the third flag is whether or not having treatment date 6 month after the index date. Do you understand what I mean now? Thank you very much!
I'm still not sure I understand, but what about the following to create the three flags you want? You can see the attached image where I checked the flags. If this isn't correct, it might help if you provide an example of the desired results.
proc sql;
CREATE TABLE want AS
SELECT DISTINCT a.patient,
/* Flag 1: having treatment date 6 month prior to the index date */
min(a.treatment_date) < intnx('month', b.index_date, -6) AS flag1 'Has Treatment Earlier Than Six Mos Before Index',
/* Flag 2: having treatment date within pre 6 month and post 6 month of index date */
max(a.treatment_date BETWEEN intnx('month', b.index_date, -6) AND intnx('month', b.index_date, 6)) AS flag2 'Has Treatment Between Index +/- Six Mos',
/* Flag 3: not having treatment date 6 month after the index date */
max(a.treatment_date) > intnx('month', b.index_date, 6) AS flag3 'Has Treatment Later Than Six Mos After Index'
FROM patients AS a
LEFT JOIN indexes AS b
ON a.patient = b.patient
GROUP BY a.patient;
quit;
proc print label;
run;
Hi Collinelliot,
Either one of the below data sets is what I desire to get. It can be listing the date or replacing the date by YES flag variables. The first table below is telling, for instance, the patientID=1 does not have any treatment history happenning 6 month before the index date, so the mindate_pre6month is missing or the flag is missing. The patientID=1 has treatment records happened between pre 6 month and post 6 month of index date, and among those records, the miniumen treatment date is 9/30/2008, so the second flag for patient=1 is YES.
And I tried your code, it works. It gives me the second table. GREAT!! Thank you. By the way, if I want the first table below, how can I modifiy your code? Thank you so much!
patient ID | mindate_pre6month | mindate_pre6month_post6month | mindate_post6month |
1 | . | 9/30/2008 | . |
2 | 1/1/2010 | 9/30/2010 | . |
3 | 12/30/2009 | 8/1/2011 | 12/13/2011 |
4 | . | 6/8/2009 | 3/1/2010 |
… | … | … | … |
patient ID | flag1_pre6month | flag2_pre6month_post6month | flag3_post6month |
1 | . | yes | . |
2 | yes | yes | . |
3 | yes | yes | yes |
4 | . | yes | yes |
… | … | … | … |
I think given the two desired outcomes, I'd change the approach to the following.
data indexes;
format index_date date9.;
input patient index_date :mmddyy10.;
index_low = intnx('month', index_date, -6);
index_high = intnx('month', index_date, 6);
datalines;
1 3/4/2009
2 8/16/2010
3 6/6/2011
4 7/4/2009
;
data patients;
format treatment_date date9.;
input patient treatment_date :mmddyy10.;
datalines;
1 9/30/2008
1 10/11/2008
1 4/1/2009
2 1/1/2010
2 9/30/2010
3 12/30/2009
3 5/1/2010
3 8/1/2011
3 10/13/2011
3 12/13/2011
4 6/8/2009
4 3/1/2010
;
proc sql;
CREATE TABLE allData AS
SELECT a.*,
(a.treatment_date < b.index_low) AS low_flag,
(a.treatment_date BETWEEN b.index_low AND b.index_high) AS mid_flag,
(a.treatment_date > b.index_high) AS high_flag,
CASE CALCULATED low_flag WHEN 1 THEN treatment_date ELSE . END AS low_date format date9.,
CASE CALCULATED mid_flag WHEN 1 THEN treatment_date ELSE . END AS mid_date format date9.,
CASE CALCULATED high_flag WHEN 1 THEN treatment_date ELSE . END AS high_date format date9.
FROM patients AS a
LEFT JOIN indexes AS b
ON a.patient = b.patient;
quit;
proc summary data = allData nway;
class patient;
var low_flag mid_flag high_flag;
output out = want_flags max = ;
run;
proc summary data = allData nway;
class patient;
var low_date mid_date high_date;
output out = want_dates min(low_date mid_date)= max(high_date) = ;
run;
This is Amazing! Thank you so much!
One more last question, if I want to use calendar month instead of absolute 30/31 days interval, how should I modify the intnx function?
To make the question more clear, here is the case I am struglling with. If a patient's index date is 9/15/2009, and the treatment date is 3/10/2009. Although the treatment happened a little bit more than 6 month before the index date, 3/10/2009-9/15/2009=-6.2(month). But I still want it to be within pre 6 month and post 6 month (the second flag, not the first flag). In other words, all treatments happended between 3/1/2009-3/31/2009 should be flagged as -6~+6 month. While if another treatment happened on 2/28/2009, it should be assigned to -7 month, although the actual interval is only -6.5...Do you understand what I mean? Thank you!
You need to define precisely what rules need to apply for the time periods and then fine tune how you use the intnx function to determine the low and high dates. If you'd rather make it be 180 days +/-, then change the first argument to 'day' and the third to +/- 180. There are probably a number of different date intervals and options ('B' or 'E', for example) to get what you want, but start with a clear and non-arbitrary rule for how you want to set the date ranges. I did read your examples, but I'm knocking this out with a few spare minutes I have and I wasn't able to easily interpret it into a clear rule. If you can do that, I or someone else here can surely help you get what you want.
This will count "months", you can assign other value based on the result.
proc sql; create table period as select a.*, b.Index_date, intck('month',a.treatment_date,b.index_date) as months from patients as a left join indexes as b on a.patient=b.patient; quit;
However this does beg the question of the meaning of "months". Do you mean calendar measurement, where 31 March is one month from 1 April or do you mean "30 (or possibly 31) day intervals"?
Actually, I mean calendar month. Thanks.
If I want to calculate calendar month, how to apply this function?
Thanks!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.