Hi all,
I have this output from the code below.
I want to only see the first diagnosis and first treatment date for each patient for each drug.
For example for patient_id 2038 i want to see diagnosis_date = 01/21/2010, treatment_date= 01/24/2010 for drug_code = A AND
diagnosis_date = 01/21/2010, treatment_date=01/24/2010 for drug_code = B
For patient_id 2120 i want to see diagnosis_date=01/09/2010 , treatment_date= 01/23/2010 for drug_code = B and
diagnosis_date = 01/09/2010 , treatment_date = 01/24/2010 for drug_code = A
proc sql;
create table _00_diag_trt as
select distinct a.*, b.*
from pt_diag as a
full join pt_trt as b
on a.patient_id = b.patient_id
order by a.patient_id , a.diagnosis_date ,b.treatment_date,b.drug_code;
quit;
proc print ;
run ;
Obs patient_id diagnosis_date diagnosis_code diagnosis treatment_date drug_code
1 2038 01/21/2010 174.9 Breast Cancer 01/24/2010 A
2 2038 01/21/2010 174.9 Breast Cancer 01/24/2010 B
3 2038 01/21/2010 174.9 Breast Cancer 01/27/2010 A
4 2038 01/21/2010 174.9 Breast Cancer 01/27/2010 B
5 2038 01/21/2010 174.9 Breast Cancer 01/30/2010 A
6 2038 01/21/2010 174.9 Breast Cancer 01/30/2010 B
7 2038 01/21/2010 174.9 Breast Cancer 02/02/2010 A
8 2038 01/21/2010 174.9 Breast Cancer 02/06/2010 A
9 2038 01/21/2010 174.9 Breast Cancer 02/11/2010 A
10 2038 01/21/2010 174.9 Breast Cancer 02/18/2010 A
11 2038 01/21/2010 174.9 Breast Cancer 02/20/2010 A
12 2038 01/21/2010 174.9 Breast Cancer 01/24/2017 A
13 2038 01/21/2010 174.9 Breast Cancer 01/27/2017 A
14 2038 01/21/2010 174.9 Breast Cancer 01/30/2017 A
15 2038 01/21/2010 174.9 Breast Cancer 02/02/2017 A
16 2038 01/21/2010 174.9 Breast Cancer 02/06/2017 A
17 2038 01/21/2010 174.9 Breast Cancer 02/11/2017 A
18 2038 01/21/2010 174.9 Breast Cancer 02/18/2017 A
19 2038 01/21/2010 174.9 Breast Cancer 02/20/2017 A
20 2120 01/09/2010 174.1 Breast Cancer 01/23/2010 B
21 2120 01/09/2010 174.1 Breast Cancer 01/24/2010 A
22 2120 01/09/2010 174.1 Breast Cancer 01/26/2010 A
23 2120 01/09/2010 174.1 Breast Cancer 01/26/2010 B
24 2120 01/09/2010 174.1 Breast Cancer 01/27/2010 A
25 2120 01/09/2010 174.1 Breast Cancer 01/27/2010 B
26 2120 01/09/2010 174.1 Breast Cancer 01/29/2010 A
27 2120 01/09/2010 174.1 Breast Cancer 01/29/2010 B
28 2120 01/09/2010 174.1 Breast Cancer 02/01/2010 A
29 2120 01/09/2010 174.1 Breast Cancer 02/01/2010 B
30 2120 01/09/2010 174.1 Breast Cancer 02/04/2010 A
31 2120 01/09/2010 174.1 Breast Cancer 02/04/2010 B
32 2120 01/09/2010 174.1 Breast Cancer 02/07/2010 A
33 2120 01/09/2010 174.1 Breast Cancer 02/07/2010 B
34 2120 01/09/2010 174.1 Breast Cancer 02/10/2010 A
35 2120 01/09/2010 174.1 Breast Cancer 02/10/2010 B
36 2120 01/09/2010 174.1 Breast Cancer 02/14/2010 A
37 2120 01/09/2010 174.1 Breast Cancer 02/14/2010 B
38 2120 01/09/2010 174.1 Breast Cancer 02/18/2010 A
39 2120 01/09/2010 174.1 Breast Cancer 02/18/2010 B
40 2120 01/09/2010 174.1 Breast Cancer 02/22/2010 A
41 2120 01/09/2010 174.1 Breast Cancer 02/22/2010 B
42 2120 01/09/2010 174.1 Breast Cancer 02/27/2010 A
43 2120 01/09/2010 174.1 Breast Cancer 02/27/2010 B
44 2120 01/09/2010 174.1 Breast Cancer 03/02/2010 A
45 2120 01/09/2010 174.1 Breast Cancer 03/02/2010 B
2 solutions:
1. Sort by drug code
order by a.patient_id, b.drug_code, a.diagnosis_date, b.treatment_date;
then use a data step keeping only first.drug_code
by PATIENT_ID DRUG_CODE;
if first.DRUG_CODE;
2. Add a group by clause in SQL to group by PATIENT_ID, DRUG_CODE and add a having clause to keep the record with the lowest (minimum) date in each group.
can you please write the code for sql option 2 ?
This is what i have but i am getting this error
147 proc sql;
148 create table _01_ttr as
149 select distinct min(treatment_date) as first_trt, patient_id, diagnosis_date, drug_code
150 from _01_diag_trt
151 group by patient_id,drug_code
152 having min(treatment_date);
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK._01_TTR created, with 90 rows and 4 columns.
I don't see any error.
You have a note informing you of the type of processing you requested when asking for the table data to be matched to its summarized data.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.