Hi SAS experts,
Could you please help to accomplish following tasks in SAS?
I have an ambulance data which has the admission date and insurance id. Inpatient data has the date of hospitalization and insurance id. I also have patient_id file which uniquely identifies patients.
I’m trying to achieve following:
- find the records in the ambulance data where admission date (ambulance_date) equals or in the same month and year with (date_diagnosis).
- find the records in the inpatient data where date of hospitalization (inp_date) equals or in the same month and year with (date_diagnosis).
- select the nearest of the ambulance_date and inp_date to the date_diagnosis, that is to say, if ambulance_date=Dec 15, 2010 and inp_date=Oct 10, 2010 while date_diagnosis Dec 12, 2010 then select ambulance_date=Dec 15, 2010.
- I have to consider only if ambulance and inpatient dates happen after date-diagnosis. But I couldn’t do it because when I delete ambulance_date before date_diagnosis some eligible records where inpatient_date after date_diagnosis were deleted.
data ambulance_date;
input insurance_id amb_date;
cards;
1 18168
1 16997
1 18024
1 16955
1 18280
1 17542
1 18365
1 18299
1 18884
1 18954
2 16939
2 16941
2 16966
;
data inpatient_date;
input insurance_id inp_date;
cards;
1 16692
1 17519
1 17209
1 18024
1 18299
1 18309
1 18891
1 18823
1 18609
1 17051
2 16960
2 16979
2 17005
2 16974
2 16979
;
data patient_id;
input pat_id $ insurance_id date_death date_diagnosis;
cards;
PAT009 1 18983 18884
PAT092 2 17061 16940
;
/*MERGE AMBULANCE DATA TO PATIENT DATA*/
proc sort data=ambulance_date; /*n=13*/
by INSURANCE_ID;
proc sort data=patient_id; /*n=2*/
by INSURANCE_ID;
data pat_amb; /*n=13*/
merge ambulance_date (in=b)
patient_id (in=c);
by INSURANCE_ID;
if b;
run;
data pat_amb1; /*N=5*/ set pat_amb; /*N=13*/
amb_year=year(amb_date);
amb_month=month(amb_date);
amb_yymm = put(amb_date,yymmn6.);
date_diagnosis_yymm = put(date_diagnosis,yymmn6.);
if date_diagnosis_yymm<=amb_yymm then output;
run;
/*MERGE RESULTING DATA TO INPATIENT DATA*/
proc sort data=inpatient_date;/*n=15*/
by INSURANCE_ID;
proc sort data=pat_amb1;/*n=5*/
by INSURANCE_ID;
data pat_amb_inp;/*n=15*/
merge inpatient_date (in=b)
pat_amb1 (in=c);
by INSURANCE_ID;
if b or c;
run;
data pat_amb1; /*N=5*/ set pat_amb_inp; /*N=15*/
inp_year=year(inp_date);
inp_month=month(inp_date);
inp_yymm = put(inp_date,yymmn6.);
/* I CAN'T DO IT BECAUSE IT WILL DELETE ELIGIBLE
OBSERVATIONS WHERE AMBULANCE DATE IS AFTER DIAGNOSIS DATE*/
if date_diagnosis_yymm<=inp_yymm then output;
run;
HI Cruise,
I hope this help.
PROC SQL;
CREATE TABLE PAT_AMB_INP AS SELECT
A.pat_id,
A.insurance_id,
A.date_death AS date_death FORMAT=DATE9.,
A.date_diagnosis AS date_diagnosis FORMAT=DATE9.,
B.amb_date AS amb_date FORMAT=DATE9.,
C.inp_date AS inp_date FORMAT=DATE9.,
MIN(amb_date,inp_date) AS NEAREST_DATE FORMAT=DATE9. /* select the nearest of the ambulance_date and inp_date to the date_diagnosis */
FROM patient_id AS A
LEFT JOIN ambulance_date AS B ON A.insurance_id = B.insurance_id
AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(B.amb_date,DATE9.),3,9) /* find records in the ambulance data where admission date (ambulance_date) equals or in the same month and year with (date_diagnosis) */
AND A.date_diagnosis <= B.amb_date /* consider only if ambulance and inpatient dates happen after date-diagnosis */
LEFT JOIN inpatient_date AS C ON A.insurance_id = C.insurance_id
AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(C.inp_date,DATE9.),3,9) /* find records in the inpatient data where date of hospitalization to inpatient dept (inp_date) equals or in the same month and year with (date_diagnosis) */
AND A.date_diagnosis <= C.inp_date /* consider only if ambulance and inpatient dates happen after date-diagnosis */
;QUIT;
Hi,
SAS merge is not able to merge many to many, you need to use PROC SQL.
PROC SQL;
/*CREATE TABLE want AS*/
SELECT a.pat_id
, a.insurance_id
, a.date_death format is8601da.
, a.date_diagnosis format is8601da.
, b.amb_date format is8601da.
, c.inp_date format is8601da.
, ifc(amb_date<=inp_date,ifc(amb_date=inp_date,'amb_date = inp_date','amb_date < inp_date'),'amb_date > inp_date') as description format=$20.
, min(amb_date,inp_date) as NearestDate format is8601da.
FROM patient_id a
LEFT JOIN ambulance_date b
ON a.insurance_id eq b.insurance_id
AND b.amb_date GE a.date_diagnosis /*amb_date after or on date_diagnosis*/
AND substr(put(b.amb_date,is8601da.),1,7) EQ substr(put(a.date_diagnosis,is8601da.),1,7) /*amb_date in same month and year as date_diagnosis*/
LEFT JOIN inpatient_date c
ON a.insurance_id eq c.insurance_id
AND c.inp_date GE a.date_diagnosis /*inp_date after or on date_diagnosis*/
AND substr(put(c.inp_date,is8601da.),1,7) EQ substr(put(a.date_diagnosis,is8601da.),1,7) /*inp_date in same month and year as date_diagnosis*/
;
QUIT;
- Cheers -
HI Cruise,
I hope this help.
PROC SQL;
CREATE TABLE PAT_AMB_INP AS SELECT
A.pat_id,
A.insurance_id,
A.date_death AS date_death FORMAT=DATE9.,
A.date_diagnosis AS date_diagnosis FORMAT=DATE9.,
B.amb_date AS amb_date FORMAT=DATE9.,
C.inp_date AS inp_date FORMAT=DATE9.,
MIN(amb_date,inp_date) AS NEAREST_DATE FORMAT=DATE9. /* select the nearest of the ambulance_date and inp_date to the date_diagnosis */
FROM patient_id AS A
LEFT JOIN ambulance_date AS B ON A.insurance_id = B.insurance_id
AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(B.amb_date,DATE9.),3,9) /* find records in the ambulance data where admission date (ambulance_date) equals or in the same month and year with (date_diagnosis) */
AND A.date_diagnosis <= B.amb_date /* consider only if ambulance and inpatient dates happen after date-diagnosis */
LEFT JOIN inpatient_date AS C ON A.insurance_id = C.insurance_id
AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(C.inp_date,DATE9.),3,9) /* find records in the inpatient data where date of hospitalization to inpatient dept (inp_date) equals or in the same month and year with (date_diagnosis) */
AND A.date_diagnosis <= C.inp_date /* consider only if ambulance and inpatient dates happen after date-diagnosis */
;QUIT;
Hi Cruise,
You may refers to the solution prepared by Oligolas too. It provided the same output.
Hi Cruise,
The inpatient date missing is because none of the inpatient date fulfill the condition "find the records in the inpatient data where date of hospitalization (inp_date) equals or in the same month and year with (date_diagnosis)." for the second patient.
I ran the code on the actual two cases
where;
ambulance_date has 609 observations
inpatient_date has 773 observations
patient_id has 2 observations
However, output dataset is following which includes all observations. However, I wanted the final selected variables and flagged from which dataset (ambulance or inpatinet) contributed to the nearest date.
Shall I create the new discussion on this issue or you know why final dataset on the actual two cases are not restricted to the final selected date for the nearest date?
By knowing which dataset (ambulance or inpatinet) contributed to the nearest date, you can include something like this in the PROC SQL after the NEAREST_DATE..
CASE WHEN CALCULATED NEAREST_DATE EQ amb_date THEN "Ambulance_Date" ELSE "Inpatient_Date" END AS DATE_SOURCE FORMAT=$20.
So you expect only 2 observations in your final dataset (because there are only 2 patients) ?
I really really appreciate your time indeed and getting back to me.
I'd like to consider 15 days before and after the date_daignosis. Final dataset should include all dates, for example, from 09/1/2011 through 09/30/2011 because 9/15/2011 is the mid-point of this range by 15 days per-post in relation to the date_diagnosis while excluding any dates fell out of this 30-day range. Same logic applies to the id=2 patient as well.
My original post said: same month and year however sep 1 and sep 30 would fall in the same month and year as same as sep 1 and sep 2. So, maybe, using 15 days before and after the mid-point which is date_diagnosis would make more sense.
Dataset wanted for final output:
id | amb_date | inp-date | date_diagnosis | nereast | source |
1 | 9/15/2011 | 9/15/2011 | 9/15/2011 | 9/15/2011 | both |
1 | 9/15/2011 | 9/1/2011 | 9/15/2011 | 9/15/2011 | ambulance |
1 | 9/30/2011 | 9/10/2011 | 9/15/2011 | 9/10/2011 | inpatient |
1 | 9/1/2011 | 9/28/2011 | 9/15/2011 | 9/28/2011 | inpatient |
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.