I have a large dataset with multiple (but a variable number of entries (identified by unique clin_date)) for each patient.
have:
ID clin_date surg_date clin_wt
001 01APR2017 03APR2017 85
001 02APR2017 03APR2017 84
001 03APR2017 03APR2017 85
002 10APR2017 01MAY2017 92
003 10APR2017 05MAY2017 54
003 12APR2017 05MAY2017 52
004 12APR2017 05MAY2017 62
004 13APR2017 05MAY .
005 10MAY2017 10MAY2017 .
I want to create a unique dataset that has only one observation per patient, in order of preference:
1) CLIN_DATE = SURG_DATE and CLIN_wt ne .
2) min(SURG_DATE - CLIN_DATE) and CLIN_wt ne .
3) CLIN_WT = .
...but no patients are deleted if (even patients with missing clin_wt have at least one entry)?
Want:
001 03APR2017 03APR2017 85
002 002 10APR2017 01MAY2017 92
03 12APR2017 05MAY2017 52
004 12APR2017 05MAY2017 62
005 14APR2017 10MAY2017 .
Thank you.
Given
data visits;
input ID clin_date:date10. surg_date:date9. clin_wt;
format clin_date date10. surg_date date10.;
datalines;
001 01APR2017 03APR2017 85
001 02APR2017 03APR2017 84
001 03APR2017 03APR2017 85
002 10APR2017 01MAY2017 92
003 10APR2017 05MAY2017 54
003 12APR2017 05MAY2017 52
004 12APR2017 05MAY2017 62
004 13APR2017 05MAY2017 .
005 10MAY2017 10MAY2017 .
;
run;
Maybe something like:
proc sql;
select *
from visits
where clin_wt is not null
group by id
having surg_date = max(surg_date) and clin_date = max(clin_date)
union
select *
from visits
where clin_wt is null and ID not in
(
select id
from visits
where clin_wt is not null
group by id
having surg_date = max(surg_date) and clin_date = max(clin_date)
)
group by id
having surg_date = max(surg_date) and clin_date = max(clin_date);
quit;
which gives me
ID | clin_date | surg_date | clin_wt |
1 | 3-Apr-17 | 3-Apr-17 | 85 |
2 | 10-Apr-17 | 1-May-17 | 92 |
3 | 12-Apr-17 | 5-May-17 | 52 |
4 | 12-Apr-17 | 5-May-17 | 62 |
5 | 10-May-17 | 10-May-17 |
Your want is a little messed up. There is no 14Apr in the have. And i don't really understand what you want to do with clin_wt.
My apologies.
I've repasted my question with the corrected want file here:
have:
ID clin_date surg_date clin_wt
001 01APR2017 03APR2017 85
001 02APR2017 03APR2017 84
001 03APR2017 03APR2017 85
002 10APR2017 01MAY2017 92
003 10APR2017 05MAY2017 54
003 12APR2017 05MAY2017 52
004 12APR2017 05MAY2017 62
004 13APR2017 05MAY .
005 10MAY2017 10MAY2017 .
I want to create a unique dataset that has only one observation per patient, in order of preference:
1) CLIN_DATE = SURG_DATE and CLIN_wt ne .
2) min(SURG_DATE - CLIN_DATE) and CLIN_wt ne .
3) CLIN_WT = .
...but no patients are deleted if (even patients with missing clin_wt have at least one entry)?
Want:
001 03APR2017 03APR2017 85
002 002 10APR2017 01MAY2017 92
03 12APR2017 05MAY2017 52
004 12APR2017 05MAY2017 62
005 10MAY2017 10MAY2017 .
CLIN_WT is the patient weight measured on the CLIN_DATE.
If there is any non-missing value of CLIN_WT available for a patient, I'd prefer to select that set of observations over a later missing CLIN_WT value.
However, if all CLIN_WT values are missing for a patient, I want to retain at least one observation for that patient: the one with the latest CLIN_DATE entry.
Given
data visits;
input ID clin_date:date10. surg_date:date9. clin_wt;
format clin_date date10. surg_date date10.;
datalines;
001 01APR2017 03APR2017 85
001 02APR2017 03APR2017 84
001 03APR2017 03APR2017 85
002 10APR2017 01MAY2017 92
003 10APR2017 05MAY2017 54
003 12APR2017 05MAY2017 52
004 12APR2017 05MAY2017 62
004 13APR2017 05MAY2017 .
005 10MAY2017 10MAY2017 .
;
run;
Maybe something like:
proc sql;
select *
from visits
where clin_wt is not null
group by id
having surg_date = max(surg_date) and clin_date = max(clin_date)
union
select *
from visits
where clin_wt is null and ID not in
(
select id
from visits
where clin_wt is not null
group by id
having surg_date = max(surg_date) and clin_date = max(clin_date)
)
group by id
having surg_date = max(surg_date) and clin_date = max(clin_date);
quit;
which gives me
ID | clin_date | surg_date | clin_wt |
1 | 3-Apr-17 | 3-Apr-17 | 85 |
2 | 10-Apr-17 | 1-May-17 | 92 |
3 | 12-Apr-17 | 5-May-17 | 52 |
4 | 12-Apr-17 | 5-May-17 | 62 |
5 | 10-May-17 | 10-May-17 |
Works great, thank you! I have never used the union function before.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.