Hi all,
I have this output from the code below.
I want to only see the first diagnosis and first treatment date for each patient for each drug.
For example for patient_id 2038 i want to see diagnosis_date = 01/21/2010, treatment_date= 01/24/2010 for drug_code = A AND
diagnosis_date = 01/21/2010, treatment_date=01/24/2010 for drug_code = B
For patient_id 2120 i want to see diagnosis_date=01/09/2010 , treatment_date= 01/23/2010 for drug_code = B and
diagnosis_date = 01/09/2010 , treatment_date = 01/24/2010 for drug_code = A
proc sql;
create table _00_diag_trt as
select distinct a.*, b.*
from pt_diag as a
full join pt_trt as b
on a.patient_id = b.patient_id
order by a.patient_id , a.diagnosis_date ,b.treatment_date,b.drug_code;
quit;
proc print ;
run ;
Obs patient_id diagnosis_date diagnosis_code diagnosis treatment_date drug_code
1 2038 01/21/2010 174.9 Breast Cancer 01/24/2010 A
2 2038 01/21/2010 174.9 Breast Cancer 01/24/2010 B
3 2038 01/21/2010 174.9 Breast Cancer 01/27/2010 A
4 2038 01/21/2010 174.9 Breast Cancer 01/27/2010 B
5 2038 01/21/2010 174.9 Breast Cancer 01/30/2010 A
6 2038 01/21/2010 174.9 Breast Cancer 01/30/2010 B
7 2038 01/21/2010 174.9 Breast Cancer 02/02/2010 A
8 2038 01/21/2010 174.9 Breast Cancer 02/06/2010 A
9 2038 01/21/2010 174.9 Breast Cancer 02/11/2010 A
10 2038 01/21/2010 174.9 Breast Cancer 02/18/2010 A
11 2038 01/21/2010 174.9 Breast Cancer 02/20/2010 A
12 2038 01/21/2010 174.9 Breast Cancer 01/24/2017 A
13 2038 01/21/2010 174.9 Breast Cancer 01/27/2017 A
14 2038 01/21/2010 174.9 Breast Cancer 01/30/2017 A
15 2038 01/21/2010 174.9 Breast Cancer 02/02/2017 A
16 2038 01/21/2010 174.9 Breast Cancer 02/06/2017 A
17 2038 01/21/2010 174.9 Breast Cancer 02/11/2017 A
18 2038 01/21/2010 174.9 Breast Cancer 02/18/2017 A
19 2038 01/21/2010 174.9 Breast Cancer 02/20/2017 A
20 2120 01/09/2010 174.1 Breast Cancer 01/23/2010 B
21 2120 01/09/2010 174.1 Breast Cancer 01/24/2010 A
22 2120 01/09/2010 174.1 Breast Cancer 01/26/2010 A
23 2120 01/09/2010 174.1 Breast Cancer 01/26/2010 B
24 2120 01/09/2010 174.1 Breast Cancer 01/27/2010 A
25 2120 01/09/2010 174.1 Breast Cancer 01/27/2010 B
26 2120 01/09/2010 174.1 Breast Cancer 01/29/2010 A
27 2120 01/09/2010 174.1 Breast Cancer 01/29/2010 B
28 2120 01/09/2010 174.1 Breast Cancer 02/01/2010 A
29 2120 01/09/2010 174.1 Breast Cancer 02/01/2010 B
30 2120 01/09/2010 174.1 Breast Cancer 02/04/2010 A
31 2120 01/09/2010 174.1 Breast Cancer 02/04/2010 B
32 2120 01/09/2010 174.1 Breast Cancer 02/07/2010 A
33 2120 01/09/2010 174.1 Breast Cancer 02/07/2010 B
34 2120 01/09/2010 174.1 Breast Cancer 02/10/2010 A
35 2120 01/09/2010 174.1 Breast Cancer 02/10/2010 B
36 2120 01/09/2010 174.1 Breast Cancer 02/14/2010 A
37 2120 01/09/2010 174.1 Breast Cancer 02/14/2010 B
38 2120 01/09/2010 174.1 Breast Cancer 02/18/2010 A
39 2120 01/09/2010 174.1 Breast Cancer 02/18/2010 B
40 2120 01/09/2010 174.1 Breast Cancer 02/22/2010 A
41 2120 01/09/2010 174.1 Breast Cancer 02/22/2010 B
42 2120 01/09/2010 174.1 Breast Cancer 02/27/2010 A
43 2120 01/09/2010 174.1 Breast Cancer 02/27/2010 B
44 2120 01/09/2010 174.1 Breast Cancer 03/02/2010 A
45 2120 01/09/2010 174.1 Breast Cancer 03/02/2010 B
2 solutions:
1. Sort by drug code
order by a.patient_id, b.drug_code, a.diagnosis_date, b.treatment_date;
then use a data step keeping only first.drug_code
by PATIENT_ID DRUG_CODE;
if first.DRUG_CODE;
2. Add a group by clause in SQL to group by PATIENT_ID, DRUG_CODE and add a having clause to keep the record with the lowest (minimum) date in each group.
can you please write the code for sql option 2 ?
This is what i have but i am getting this error
147 proc sql;
148 create table _01_ttr as
149 select distinct min(treatment_date) as first_trt, patient_id, diagnosis_date, drug_code
150 from _01_diag_trt
151 group by patient_id,drug_code
152 having min(treatment_date);
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK._01_TTR created, with 90 rows and 4 columns.
I don't see any error.
You have a note informing you of the type of processing you requested when asking for the table data to be matched to its summarized data.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.