BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
EmilyAV
Calcite | Level 5

I have a large dataset with multiple (but a variable number of entries (identified by unique clin_date)) for each patient.

 

have:

ID clin_date surg_date clin_wt

001 01APR2017 03APR2017 85

001 02APR2017 03APR2017 84

001 03APR2017 03APR2017 85

002 10APR2017 01MAY2017 92

003 10APR2017 05MAY2017 54

003 12APR2017 05MAY2017 52

004 12APR2017 05MAY2017 62

004 13APR2017 05MAY .

005 10MAY2017 10MAY2017 .

 

I want to create a unique dataset that has only one observation per patient, in order of preference:

1) CLIN_DATE = SURG_DATE and CLIN_wt ne .

2) min(SURG_DATE - CLIN_DATE) and CLIN_wt ne .

3) CLIN_WT = .

 

...but no patients are deleted if (even patients with missing clin_wt have at least one entry)?

 

Want:

001 03APR2017 03APR2017 85

002 002 10APR2017 01MAY2017 92

03 12APR2017 05MAY2017 52

004 12APR2017 05MAY2017 62

005 14APR2017 10MAY2017 .

 

Thank you.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
HB
Barite | Level 11 HB
Barite | Level 11

Given

data visits;
   input ID clin_date:date10. surg_date:date9. clin_wt;
   format clin_date date10. surg_date date10.;
datalines;
001 01APR2017 03APR2017 85
001 02APR2017 03APR2017 84
001 03APR2017 03APR2017 85
002 10APR2017 01MAY2017 92
003 10APR2017 05MAY2017 54
003 12APR2017 05MAY2017 52
004 12APR2017 05MAY2017 62
004 13APR2017 05MAY2017 .
005 10MAY2017 10MAY2017 .
;
run;

 

Maybe something like:


proc sql;
	select	* 
	from visits
	where clin_wt is not null
	group by id
	having surg_date = max(surg_date) and clin_date = max(clin_date)

	union 

	select * 
	from visits
	where clin_wt is null and ID not in 
		(
		select id 
		from visits
		where clin_wt is not null
		group by id
		having surg_date = max(surg_date) and clin_date = max(clin_date)
		)
	group by id
	having surg_date = max(surg_date) and clin_date = max(clin_date);
quit;

which gives me 

ID clin_date surg_date clin_wt
1 3-Apr-17 3-Apr-17 85
2 10-Apr-17 1-May-17 92
3 12-Apr-17 5-May-17 52
4 12-Apr-17 5-May-17 62
5 10-May-17 10-May-17  

 

View solution in original post

4 REPLIES 4
HB
Barite | Level 11 HB
Barite | Level 11

Your want is a little messed up.  There is no 14Apr in the have.  And i don't really understand what you want to do with clin_wt. 

 

 

 

EmilyAV
Calcite | Level 5

My apologies.

I've repasted my question with the corrected want file here:

 

have:

ID clin_date surg_date clin_wt

001 01APR2017 03APR2017 85

001 02APR2017 03APR2017 84

001 03APR2017 03APR2017 85

002 10APR2017 01MAY2017 92

003 10APR2017 05MAY2017 54

003 12APR2017 05MAY2017 52

004 12APR2017 05MAY2017 62

004 13APR2017 05MAY .

005 10MAY2017 10MAY2017 .

 

I want to create a unique dataset that has only one observation per patient, in order of preference:

1) CLIN_DATE = SURG_DATE and CLIN_wt ne .

2) min(SURG_DATE - CLIN_DATE) and CLIN_wt ne .

3) CLIN_WT = .

 

...but no patients are deleted if (even patients with missing clin_wt have at least one entry)?

 

Want:

001 03APR2017 03APR2017 85

002 002 10APR2017 01MAY2017 92

03 12APR2017 05MAY2017 52

004 12APR2017 05MAY2017 62

005 10MAY2017 10MAY2017 .

 

CLIN_WT is the patient weight measured on the CLIN_DATE.

If there is any non-missing value of CLIN_WT available for a patient, I'd prefer to select that set of observations over a later missing CLIN_WT value.

However, if all CLIN_WT values are missing for a patient, I want to retain at least one observation for that patient: the one with the latest CLIN_DATE entry.

HB
Barite | Level 11 HB
Barite | Level 11

Given

data visits;
   input ID clin_date:date10. surg_date:date9. clin_wt;
   format clin_date date10. surg_date date10.;
datalines;
001 01APR2017 03APR2017 85
001 02APR2017 03APR2017 84
001 03APR2017 03APR2017 85
002 10APR2017 01MAY2017 92
003 10APR2017 05MAY2017 54
003 12APR2017 05MAY2017 52
004 12APR2017 05MAY2017 62
004 13APR2017 05MAY2017 .
005 10MAY2017 10MAY2017 .
;
run;

 

Maybe something like:


proc sql;
	select	* 
	from visits
	where clin_wt is not null
	group by id
	having surg_date = max(surg_date) and clin_date = max(clin_date)

	union 

	select * 
	from visits
	where clin_wt is null and ID not in 
		(
		select id 
		from visits
		where clin_wt is not null
		group by id
		having surg_date = max(surg_date) and clin_date = max(clin_date)
		)
	group by id
	having surg_date = max(surg_date) and clin_date = max(clin_date);
quit;

which gives me 

ID clin_date surg_date clin_wt
1 3-Apr-17 3-Apr-17 85
2 10-Apr-17 1-May-17 92
3 12-Apr-17 5-May-17 52
4 12-Apr-17 5-May-17 62
5 10-May-17 10-May-17  

 

EmilyAV
Calcite | Level 5

Works great, thank you! I have never used the union function before.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 369 views
  • 0 likes
  • 2 in conversation