Solved: Re: Select single observation based on two unique dates

EmilyAV · Posted 09-11-2023 05:08 PM

I have a large dataset with multiple (but a variable number of entries (identified by unique clin_date)) for each patient.

have:

ID clin_date surg_date clin_wt

001 01APR2017 03APR2017 85

001 02APR2017 03APR2017 84

001 03APR2017 03APR2017 85

002 10APR2017 01MAY2017 92

003 10APR2017 05MAY2017 54

003 12APR2017 05MAY2017 52

004 12APR2017 05MAY2017 62

004 13APR2017 05MAY .

005 10MAY2017 10MAY2017 .

I want to create a unique dataset that has only one observation per patient, in order of preference:

1) CLIN_DATE = SURG_DATE and CLIN_wt ne .

2) min(SURG_DATE - CLIN_DATE) and CLIN_wt ne .

3) CLIN_WT = .

...but no patients are deleted if (even patients with missing clin_wt have at least one entry)?

Want:

001 03APR2017 03APR2017 85

002 002 10APR2017 01MAY2017 92

03 12APR2017 05MAY2017 52

004 12APR2017 05MAY2017 62

005 14APR2017 10MAY2017 .

Thank you.

HB · Posted 09-11-2023 06:52 PM

Given

data visits;
   input ID clin_date:date10. surg_date:date9. clin_wt;
   format clin_date date10. surg_date date10.;
datalines;
001 01APR2017 03APR2017 85
001 02APR2017 03APR2017 84
001 03APR2017 03APR2017 85
002 10APR2017 01MAY2017 92
003 10APR2017 05MAY2017 54
003 12APR2017 05MAY2017 52
004 12APR2017 05MAY2017 62
004 13APR2017 05MAY2017 .
005 10MAY2017 10MAY2017 .
;
run;

Maybe something like:


proc sql;
	select	* 
	from visits
	where clin_wt is not null
	group by id
	having surg_date = max(surg_date) and clin_date = max(clin_date)

	union 

	select * 
	from visits
	where clin_wt is null and ID not in 
		(
		select id 
		from visits
		where clin_wt is not null
		group by id
		having surg_date = max(surg_date) and clin_date = max(clin_date)
		)
	group by id
	having surg_date = max(surg_date) and clin_date = max(clin_date);
quit;

which gives me

ID	clin_date	surg_date	clin_wt
1	3-Apr-17	3-Apr-17	85
2	10-Apr-17	1-May-17	92
3	12-Apr-17	5-May-17	52
4	12-Apr-17	5-May-17	62
5	10-May-17	10-May-17

View solution in original post

HB · Posted 09-11-2023 06:19 PM

Your want is a little messed up. There is no 14Apr in the have. And i don't really understand what you want to do with clin_wt.

EmilyAV · Posted 09-11-2023 06:24 PM

My apologies.

I've repasted my question with the corrected want file here:

have:

ID clin_date surg_date clin_wt

001 01APR2017 03APR2017 85

001 02APR2017 03APR2017 84

001 03APR2017 03APR2017 85

002 10APR2017 01MAY2017 92

003 10APR2017 05MAY2017 54

003 12APR2017 05MAY2017 52

004 12APR2017 05MAY2017 62

004 13APR2017 05MAY .

005 10MAY2017 10MAY2017 .

I want to create a unique dataset that has only one observation per patient, in order of preference:

1) CLIN_DATE = SURG_DATE and CLIN_wt ne .

2) min(SURG_DATE - CLIN_DATE) and CLIN_wt ne .

3) CLIN_WT = .

...but no patients are deleted if (even patients with missing clin_wt have at least one entry)?

Want:

001 03APR2017 03APR2017 85

002 002 10APR2017 01MAY2017 92

03 12APR2017 05MAY2017 52

004 12APR2017 05MAY2017 62

005 10MAY2017 10MAY2017 .

CLIN_WT is the patient weight measured on the CLIN_DATE.

If there is any non-missing value of CLIN_WT available for a patient, I'd prefer to select that set of observations over a later missing CLIN_WT value.

However, if all CLIN_WT values are missing for a patient, I want to retain at least one observation for that patient: the one with the latest CLIN_DATE entry.

HB · Posted 09-11-2023 06:52 PM

Given

data visits;
   input ID clin_date:date10. surg_date:date9. clin_wt;
   format clin_date date10. surg_date date10.;
datalines;
001 01APR2017 03APR2017 85
001 02APR2017 03APR2017 84
001 03APR2017 03APR2017 85
002 10APR2017 01MAY2017 92
003 10APR2017 05MAY2017 54
003 12APR2017 05MAY2017 52
004 12APR2017 05MAY2017 62
004 13APR2017 05MAY2017 .
005 10MAY2017 10MAY2017 .
;
run;

Maybe something like:


proc sql;
	select	* 
	from visits
	where clin_wt is not null
	group by id
	having surg_date = max(surg_date) and clin_date = max(clin_date)

	union 

	select * 
	from visits
	where clin_wt is null and ID not in 
		(
		select id 
		from visits
		where clin_wt is not null
		group by id
		having surg_date = max(surg_date) and clin_date = max(clin_date)
		)
	group by id
	having surg_date = max(surg_date) and clin_date = max(clin_date);
quit;

which gives me

ID	clin_date	surg_date	clin_wt
1	3-Apr-17	3-Apr-17	85
2	10-Apr-17	1-May-17	92
3	12-Apr-17	5-May-17	52
4	12-Apr-17	5-May-17	62
5	10-May-17	10-May-17

EmilyAV · Posted 09-11-2023 07:08 PM

Works great, thank you! I have never used the union function before.