BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi SAS experts,

 

Could you please help to accomplish following tasks in SAS?

 

I have an ambulance data which has the admission date and insurance id. Inpatient data has the date of hospitalization and insurance id. I also have patient_id file which uniquely identifies patients. 

 

I’m trying to achieve following:

- find the records in the ambulance data where admission date (ambulance_date) equals or in the same month and year with (date_diagnosis).

- find the records in the inpatient data where date of hospitalization (inp_date) equals or in the same month and year with (date_diagnosis).

- select the nearest of the ambulance_date and inp_date to the date_diagnosis, that is to say, if ambulance_date=Dec 15, 2010 and inp_date=Oct 10, 2010 while date_diagnosis Dec 12, 2010 then select ambulance_date=Dec 15, 2010.

- I have to consider only if ambulance and inpatient dates happen after date-diagnosis. But I couldn’t do it because when I delete ambulance_date before date_diagnosis some eligible records where inpatient_date after date_diagnosis were deleted.

 

data ambulance_date; 
input insurance_id amb_date;
cards;  
1 18168 
1 16997 
1 18024 
1 16955 
1 18280 
1 17542 
1 18365 
1 18299 
1 18884 
1 18954 
2 16939 
2 16941 
2 16966
;

data inpatient_date; 
input insurance_id inp_date; 
cards;
1 16692 
1 17519 
1 17209 
1 18024 
1 18299 
1 18309 
1 18891 
1 18823 
1 18609 
1 17051 
2 16960 
2 16979 
2 17005 
2 16974 
2 16979
; 

data patient_id; 
input pat_id $ insurance_id date_death date_diagnosis;
cards;
PAT009 1 18983 18884 
PAT092 2 17061 16940 
;

/*MERGE AMBULANCE DATA TO PATIENT DATA*/

proc sort data=ambulance_date; /*n=13*/
by INSURANCE_ID;
proc sort data=patient_id; /*n=2*/
by INSURANCE_ID;
data pat_amb; /*n=13*/ 
merge ambulance_date   (in=b)  
      patient_id       (in=c); 
by INSURANCE_ID;
if b; 
run; 

data pat_amb1; /*N=5*/ set pat_amb; /*N=13*/  
amb_year=year(amb_date);
amb_month=month(amb_date);
amb_yymm = put(amb_date,yymmn6.);
date_diagnosis_yymm = put(date_diagnosis,yymmn6.);
if date_diagnosis_yymm<=amb_yymm then output; 
run;


/*MERGE RESULTING DATA TO INPATIENT DATA*/

proc sort data=inpatient_date;/*n=15*/
by INSURANCE_ID;
proc sort data=pat_amb1;/*n=5*/
by INSURANCE_ID;
data pat_amb_inp;/*n=15*/
merge inpatient_date (in=b)   
      pat_amb1       (in=c); 
by INSURANCE_ID;
if b or c; 
run; 


data pat_amb1; /*N=5*/ set pat_amb_inp; /*N=15*/  
inp_year=year(inp_date);
inp_month=month(inp_date);
inp_yymm = put(inp_date,yymmn6.);

/* I CAN'T DO IT BECAUSE IT WILL DELETE ELIGIBLE
OBSERVATIONS WHERE AMBULANCE DATE IS AFTER DIAGNOSIS DATE*/
if date_diagnosis_yymm<=inp_yymm then output; 
run;








 

1 ACCEPTED SOLUTION

Accepted Solutions
FrankieNg
Fluorite | Level 6

HI Cruise,

 

I hope this help.

 

 

PROC SQL;
CREATE TABLE PAT_AMB_INP AS SELECT
A.pat_id,
A.insurance_id,
A.date_death AS date_death FORMAT=DATE9.,
A.date_diagnosis AS date_diagnosis FORMAT=DATE9.,
B.amb_date AS amb_date FORMAT=DATE9.,
C.inp_date AS inp_date FORMAT=DATE9.,
MIN(amb_date,inp_date) AS NEAREST_DATE FORMAT=DATE9.	/* select the nearest of the ambulance_date and inp_date to the date_diagnosis */
FROM patient_id AS A
LEFT JOIN ambulance_date AS B ON A.insurance_id = B.insurance_id 
							AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(B.amb_date,DATE9.),3,9)	/* find records in the ambulance data where admission date (ambulance_date) equals or in the same month and year with (date_diagnosis) */
							AND A.date_diagnosis <= B.amb_date 													/* consider only if ambulance and inpatient dates happen after date-diagnosis */
LEFT JOIN inpatient_date AS C ON A.insurance_id = C.insurance_id 
							AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(C.inp_date,DATE9.),3,9)	/* find records in the inpatient data where date of hospitalization to inpatient dept (inp_date) equals or in the same month and year with (date_diagnosis) */
							AND A.date_diagnosis <= C.inp_date													/* consider only if ambulance and inpatient dates happen after date-diagnosis */
;QUIT;

View solution in original post

10 REPLIES 10
Oligolas
Barite | Level 11

Hi,

 

SAS merge is not able to merge many to many, you need to use PROC SQL.

PROC SQL;
   /*CREATE TABLE want AS*/
      SELECT a.pat_id
           , a.insurance_id
           , a.date_death format is8601da.
           , a.date_diagnosis format is8601da.
           , b.amb_date format is8601da.
           , c.inp_date format is8601da.
           , ifc(amb_date<=inp_date,ifc(amb_date=inp_date,'amb_date = inp_date','amb_date < inp_date'),'amb_date > inp_date') as description format=$20.
           , min(amb_date,inp_date) as NearestDate format is8601da.
      FROM patient_id a
      LEFT JOIN ambulance_date b
      ON a.insurance_id eq b.insurance_id
      AND b.amb_date GE a.date_diagnosis /*amb_date after or on date_diagnosis*/
      AND substr(put(b.amb_date,is8601da.),1,7) EQ substr(put(a.date_diagnosis,is8601da.),1,7) /*amb_date in same month and year as date_diagnosis*/
      LEFT JOIN inpatient_date c
      ON a.insurance_id eq c.insurance_id
      AND c.inp_date GE a.date_diagnosis /*inp_date after or on date_diagnosis*/
      AND substr(put(c.inp_date,is8601da.),1,7) EQ substr(put(a.date_diagnosis,is8601da.),1,7) /*inp_date in same month and year as date_diagnosis*/
   ;
QUIT;
________________________

- Cheers -

FrankieNg
Fluorite | Level 6

HI Cruise,

 

I hope this help.

 

 

PROC SQL;
CREATE TABLE PAT_AMB_INP AS SELECT
A.pat_id,
A.insurance_id,
A.date_death AS date_death FORMAT=DATE9.,
A.date_diagnosis AS date_diagnosis FORMAT=DATE9.,
B.amb_date AS amb_date FORMAT=DATE9.,
C.inp_date AS inp_date FORMAT=DATE9.,
MIN(amb_date,inp_date) AS NEAREST_DATE FORMAT=DATE9.	/* select the nearest of the ambulance_date and inp_date to the date_diagnosis */
FROM patient_id AS A
LEFT JOIN ambulance_date AS B ON A.insurance_id = B.insurance_id 
							AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(B.amb_date,DATE9.),3,9)	/* find records in the ambulance data where admission date (ambulance_date) equals or in the same month and year with (date_diagnosis) */
							AND A.date_diagnosis <= B.amb_date 													/* consider only if ambulance and inpatient dates happen after date-diagnosis */
LEFT JOIN inpatient_date AS C ON A.insurance_id = C.insurance_id 
							AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(C.inp_date,DATE9.),3,9)	/* find records in the inpatient data where date of hospitalization to inpatient dept (inp_date) equals or in the same month and year with (date_diagnosis) */
							AND A.date_diagnosis <= C.inp_date													/* consider only if ambulance and inpatient dates happen after date-diagnosis */
;QUIT;
Cruise
Ammonite | Level 13
Hi Frankie, Thanks a lot.
Great approach, seems tiny problem here. See log below. It generates inpatient-date missing in the output data.

NOTE: This SAS session is using a registry in WORK. All changes will be lost at the end of this
session.
63 PROC SQL;
64 CREATE TABLE PAT_AMB_INP AS SELECT
65 A.pat_id,
66 A.insurance_id,
67 A.date_death AS date_death FORMAT=DATE9.,
68 A.date_diagnosis AS date_diagnosis FORMAT=DATE9.,
69 B.amb_date AS amb_date FORMAT=DATE9.,
70 C.inp_date AS inp_date FORMAT=DATE9.,
71 MIN(amb_date,inp_date) AS NEAREST_DATE FORMAT=DATE9. /* select the nearest of the
71 ! ambulance_date and inp_date to the date_diagnosis */
72 FROM patient_id AS A
73 LEFT JOIN ambulance_date AS B ON A.insurance_id = B.insurance_id
74 AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) =
74 ! SUBSTR(PUT(B.amb_date,DATE9.),3,9) /* find records in the ambulance data where admission
74 ! date (ambulance_date) equals or in the same month and year with (date_diagnosis) */
75 AND A.date_diagnosis <= B.amb_date
75 ! /* consider only if ambulance and inpatient dates happen after
75 ! date-diagnosis */
76 LEFT JOIN inpatient_date AS C ON A.insurance_id = C.insurance_id
77 AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) =
77 ! SUBSTR(PUT(C.inp_date,DATE9.),3,9) /* find records in the inpatient data where date of
77 ! hospitalization to inpatient dept (inp_date) equals or in the same month and year with
77 ! (date_diagnosis) */
78 AND A.date_diagnosis <= C.inp_date
78 ! /* consider only if ambulance and inpatient dates happen after
78 ! date-diagnosis */
79 ;
NOTE: Invalid argument 3 to function SUBSTR. Missing values may be generated.
NOTE: Invalid argument 3 to function SUBSTR. Missing values may be generated.
NOTE: Invalid argument 3 to function SUBSTR. Missing values may be generated.
NOTE: Invalid argument 3 to function SUBSTR. Missing values may be generated.
NOTE: Table WORK.PAT_AMB_INP created, with 2 rows and 7 columns.

79 ! QUIT;
Cruise
Ammonite | Level 13
INP_DATE IS MISSING BECAUSE IT WAS SELECTED AS THE NEAREST DATE? OH NO, BUT THE AMBULANCE DATE SELECTED APPEARS UNDER AMB_DATE. ANY OF THESE AFFECTED THE FINAL RESULT BUT JUST CURIOUS AS TO WHY.
FrankieNg
Fluorite | Level 6

Hi Cruise,

 

You may refers to the solution prepared by Oligolas too. It provided the same output.

FrankieNg
Fluorite | Level 6

Hi Cruise,

 

The inpatient date missing is because none of the inpatient date fulfill the condition "find the records in the inpatient data where date of hospitalization (inp_date) equals or in the same month and year with (date_diagnosis)." for the second patient.

Cruise
Ammonite | Level 13

@FrankieNg @Oligolas

 

I ran the code on the actual two cases 

 

where; 

ambulance_date has 609 observations 
inpatient_date has 773 observations 
patient_id has 2 observations 

 

However, output dataset is following which includes all observations. However, I wanted the final selected variables and flagged from which dataset (ambulance or inpatinet) contributed to the nearest date. 

 

Shall I create the new discussion on this issue or you know why final dataset on the actual two cases are not restricted to the final selected date for the nearest date?

 

 

output data.png

 

 

 

 

 

FrankieNg
Fluorite | Level 6

By knowing which dataset (ambulance or inpatinet) contributed to the nearest date, you can include something like this in the PROC SQL after the NEAREST_DATE..

 

CASE WHEN CALCULATED NEAREST_DATE EQ amb_date THEN "Ambulance_Date" ELSE "Inpatient_Date" END AS DATE_SOURCE FORMAT=$20.

 

So you expect only 2 observations in your final dataset (because there are only 2 patients) ?

Cruise
Ammonite | Level 13

@FrankieNg

 

I really really appreciate your time indeed and getting back to me. 

 

I'd like to consider 15 days before and after the date_daignosis. Final dataset should include all dates, for example, from 09/1/2011 through 09/30/2011 because 9/15/2011 is the mid-point of this range by 15 days per-post in relation to the date_diagnosis while excluding any dates fell out of this 30-day range. Same logic applies to the id=2 patient as well. 

 

My original post said: same month and year however sep 1 and sep 30 would fall in the same month and year as same as sep 1 and sep 2. So, maybe, using 15 days before and after the mid-point which is date_diagnosis would make more sense. 

 

Dataset wanted for final output:

 

id amb_date inp-date date_diagnosis nereast source
1 9/15/2011 9/15/2011 9/15/2011 9/15/2011 both
1 9/15/2011 9/1/2011 9/15/2011 9/15/2011 ambulance
1 9/30/2011 9/10/2011 9/15/2011 9/10/2011 inpatient
1 9/1/2011 9/28/2011 9/15/2011 9/28/2011 inpatient
Cruise
Ammonite | Level 13
I'm wondering about using "between" in prc sql using pre-defined 15 day ranges around date_diagnosis

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1156 views
  • 3 likes
  • 3 in conversation