BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi SAS experts, 

 

I have an ambulance data (ambualnce_date) which has the admission date (amb_date) and insurance_id. Inpatient data has the date of hospitalization (inp_date) and insurance_id. I also have patient_id file which uniquely identifies patients. My actual data sets are large where patient_id data has about 100k while inpatient and ambulance data has millions of observations for each claims. However, I'd like to solve the problem for only two cases and move forward. 

 

My objective: Select the date range of 15 days before and after the date_daignosis.
Final dataset should include, for example, from 09/1/2011 through 09/30/2011 because
9/15/2011 is the mid-point. While excluding any dates fell out of this 30-day range.

 

My original post said: same month and year however sep 1 and sep 30 would fall in the same
month and year as same as sep 1 and sep 2. So, maybe, using 15 days before and after the
mid-point which is date_diagnosis would make more sense.

 

Dataset wanted for final output:

id amb_date inp-date date_diagnosis nereast source
1 9/15/2011 9/15/2011 9/15/2011 9/15/2011 both
1 9/15/2011 9/1/2011 9/15/2011 9/15/2011 ambulance
1 9/30/2011 9/10/2011 9/15/2011 9/10/2011 inpatient
1 9/1/2011 9/28/2011 9/15/2011 9/28/2011 inpatient

 

This is the continuation of the post here:

https://communities.sas.com/t5/SAS-Programming/Logical-decisions-based-on-the-before-and-after-certa...

 

but with slight change in the condition set for the selection of the date range. 


The problem so far is solved with the condition meeting with same month and year in the link specified above thanks to FrankieNg and  And I'll greatly appreciate if you help me figure the problem with the new condition for the date range. I was considering of using BETWEEN function in PROC SQL with the predefined date range (BETWEEN date_diagnosis-15 and date_diagnosis+15). 

 

Any hints, pointers, ideas or direct help in the code will be appreciated.Thanks in advance. 

 

data ambulance_date; 
input insurance_id amb_date;
cards;  
1 16576 
1 17616 
1 17542 
1 17814 
1 17575 
1 17541 
1 17822 
1 17703 
1 18184 
1 17532 
1 17972 
1 17935 
2 16989 
1 17265 
2 16997 
2 17008 
1 18231 
2 16997 
2 17000 
1 17056 
1 16831 
2 16997 
2 17008 
1 18162 
1 18024 
1 18276 
1 17773 
1 18299 
1 18387 
1 18609 
1 18486 
1 18486 
1 18486 
1 18486 
1 18571 
1 18488 
1 18585 
1 18601 
1 17527 
1 18722 
1 18907 
1 18869 
1 18808 
1 18760 
1 18760 
1 18760 
1 18821 
1 18903 
1 18954 
1 18136 
2 16974 
2 16974 
2 16974 
2 17010 
1 17036 
1 18455 
1 18956 
;

data inpatient_date; 
input insurance_id inp_date; 
cards;
1 16623 
1 16692 
1 16726 
1 17712 
1 16709 
1 17541 
1 17827 
1 18110 
1 17790 
2 16974 
2 16939 
1 16952 
2 16966 
1 17036 
2 16981 
2 16974 
2 16974 
1 17616 
1 17546 
1 17240 
1 17245 
1 17689 
2 16940 
1 17895 
2 17000 
1 17736 
1 17543 
1 17672 
1 18024 
1 18175 
1 17819 
1 18184 
1 18122 
1 18231 
1 17500 
1 18299 
2 16997 
2 16997 
2 16997 
2 17008 
2 16989 
1 18695 
1 18722 
1 18722 
1 18309 
1 18309 
1 18869 
1 18836 
1 18110 
1 18110 
1 18956 
1 18321 
1 18760 
1 18299 
1 18299 
1 18299 
1 18455 
1 18455 
1 18534 
1 18857 
1 18857 
1 18760 
1 18884 
1 17988 
1 17972 
1 18895
; 

data patient_id; 
input pat_id $ insurance_id date_death date_diagnosis;
cards;
PAT009 1 18983 18884 
PAT092 2 17061 16940 
;

PROC SQL;
CREATE TABLE PAT_AMB_INP AS SELECT
A.pat_id,
A.insurance_id,
A.date_death AS date_death FORMAT=DATE9.,
A.date_diagnosis AS date_diagnosis FORMAT=DATE9.,
B.amb_date AS amb_date FORMAT=DATE9.,
C.inp_date AS inp_date FORMAT=DATE9.,
MIN(amb_date,inp_date) AS NEAREST_DATE FORMAT=DATE9.	/* select the nearest of the ambulance_date and inp_date to the date_diagnosis */
FROM patient_id AS A
LEFT JOIN ambulance_date AS B ON A.insurance_id = B.insurance_id 
		  AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(B.amb_date,DATE9.),3,9)	/* find records in the ambulance data where admission date (ambulance_date) equals or in the same month and year with (date_diagnosis) */
		  AND A.date_diagnosis <= B.amb_date 													/* consider only if ambulance and inpatient dates happen after date-diagnosis */
LEFT JOIN inpatient_date AS C ON A.insurance_id = C.insurance_id 
		  AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(C.inp_date,DATE9.),3,9)	/* find records in the inpatient data where date of hospitalization to inpatient dept (inp_date) equals or in the same month and year with (date_diagnosis) */
		  AND A.date_diagnosis <= C.inp_date													/* consider only if ambulance and inpatient dates happen after date-diagnosis */
;
QUIT;

 

1 ACCEPTED SOLUTION

Accepted Solutions
FrankieNg
Fluorite | Level 6

Hi Cruise,

 

When come to date range, we usually use INTNX function.

I've modified a bit of the data source, to show more observation from the output..

Hope this help.

 

data ambulance_date; 
input insurance_id amb_date;
cards;  
1 16576 
1 17616 
1 17542 
1 17814 
1 17575 
1 17541 
1 17822 
1 17703 
1 18184 
1 17532 
1 17972 
1 17935 
2 16989 
1 17265 
2 16997 
2 17008 
1 18231 
2 16997 
2 17000 
1 17056 
1 16831 
2 16940 
2 17008 
1 18162 
1 18024 
1 18276 
1 17773 
1 18299 
1 18387 
1 18609 
1 18486 
1 18486 
1 18486 
1 18486 
1 18571 
1 18488 
1 18585 
1 18601 
1 17527 
1 18722 
1 18907 
1 18869 
1 18808 
1 18760 
1 18760 
1 18760 
1 18821 
1 18903 
1 18954 
1 18136 
2 16925 
2 16974 
2 16955 
2 17010 
1 17036 
1 18455 
1 18956 
;

data inpatient_date; 
input insurance_id inp_date; 
cards;
1 16623 
1 16692 
1 16726 
1 17712 
1 16709 
1 17541 
1 17827 
1 18110 
1 17790 
2 16974 
2 16939 
1 16952 
2 16950 
1 17036 
2 16981 
2 16974 
2 16930 
1 17616 
1 17546 
1 17240 
1 17245 
1 17689 
2 16940 
1 17895 
2 17000 
1 17736 
1 17543 
1 17672 
1 18024 
1 18175 
1 17819 
1 18184 
1 18122 
1 18231 
1 17500 
1 18299 
2 16997 
2 16997 
2 16997 
2 17008 
2 16939 
1 18695 
1 18722 
1 18722 
1 18309 
1 18309 
1 18869 
1 18836 
1 18110 
1 18110 
1 18956 
1 18321 
1 18760 
1 18299 
1 18299 
1 18299 
1 18455 
1 18455 
1 18534 
1 18857 
1 18857 
1 18760 
1 18884 
1 17988 
1 17972 
1 18895
; 

data patient_id; 
input pat_id $ insurance_id date_death date_diagnosis;
cards;
PAT009 1 18983 18884 
PAT092 2 17061 16940 
;




%LET RANGE = 15;

PROC SQL;
CREATE TABLE PAT_AMB_INP AS SELECT
A.pat_id,
A.insurance_id,
A.date_death AS date_death FORMAT=DATE9.,
A.date_diagnosis AS date_diagnosis FORMAT=DATE9.,
B.amb_date AS amb_date FORMAT=DATE9.,
C.inp_date AS inp_date FORMAT=DATE9.,
ABS(INTCK('DAY', date_diagnosis, amb_date)) AS AMB_RANGE,
ABS(INTCK('DAY', date_diagnosis, inp_date)) AS INP_RANGE,
CASE WHEN CALCULATED AMB_RANGE = CALCULATED INP_RANGE THEN amb_date
	 WHEN CALCULATED AMB_RANGE > CALCULATED INP_RANGE THEN inp_date
	 WHEN CALCULATED AMB_RANGE < CALCULATED INP_RANGE THEN amb_date
	 END AS NEAREST_DATE FORMAT DATE9.,
CASE WHEN CALCULATED AMB_RANGE = CALCULATED INP_RANGE THEN 'BOTH'
	 WHEN CALCULATED AMB_RANGE > CALCULATED INP_RANGE THEN 'INPATIENT'
	 WHEN CALCULATED AMB_RANGE < CALCULATED INP_RANGE THEN 'AMBULANCE'
	 END AS SOURCE FORMAT $15.
FROM patient_id AS A
LEFT JOIN ambulance_date AS B ON A.insurance_id = B.insurance_id 
		  AND INTNX('DAY', A.date_diagnosis, -&RANGE) <= B.amb_date <= INTNX('DAY', A.date_diagnosis, &RANGE)	/* predefined date range (BETWEEN date_diagnosis-15 and date_diagnosis+15) */
LEFT JOIN inpatient_date AS C ON A.insurance_id = C.insurance_id 
		  AND INTNX('DAY', A.date_diagnosis, -&RANGE) <= C.inp_date <= INTNX('DAY', A.date_diagnosis, &RANGE)	/* predefined date range (BETWEEN date_diagnosis-15 and date_diagnosis+15) */
;
QUIT;

View solution in original post

6 REPLIES 6
mkeintz
PROC Star

Your data appear to be sorted by id, but not date within id?  Is that true.  If your data are sorted by id then a data step solution may be a good deal faster than SQL. 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Cruise
Ammonite | Level 13

@mkeintz

 

Individual patients have multiple insurance id. However, once datasets are merged on the insurance_id (which is only linkage variable common across these three different datasets) then the final linked data should be sorted by pat_id. So, you're right, not sorted by date within pat_id. 

 

do you have time to show me how in data step? maybe I should extend my sample datasets in this forum.

mkeintz
PROC Star

I will look at it later this evening if there is time then.  Busy now.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Cruise
Ammonite | Level 13
Thanks a lot. I'm updating my post with the larger sample datasets right now.
FrankieNg
Fluorite | Level 6

Hi Cruise,

 

When come to date range, we usually use INTNX function.

I've modified a bit of the data source, to show more observation from the output..

Hope this help.

 

data ambulance_date; 
input insurance_id amb_date;
cards;  
1 16576 
1 17616 
1 17542 
1 17814 
1 17575 
1 17541 
1 17822 
1 17703 
1 18184 
1 17532 
1 17972 
1 17935 
2 16989 
1 17265 
2 16997 
2 17008 
1 18231 
2 16997 
2 17000 
1 17056 
1 16831 
2 16940 
2 17008 
1 18162 
1 18024 
1 18276 
1 17773 
1 18299 
1 18387 
1 18609 
1 18486 
1 18486 
1 18486 
1 18486 
1 18571 
1 18488 
1 18585 
1 18601 
1 17527 
1 18722 
1 18907 
1 18869 
1 18808 
1 18760 
1 18760 
1 18760 
1 18821 
1 18903 
1 18954 
1 18136 
2 16925 
2 16974 
2 16955 
2 17010 
1 17036 
1 18455 
1 18956 
;

data inpatient_date; 
input insurance_id inp_date; 
cards;
1 16623 
1 16692 
1 16726 
1 17712 
1 16709 
1 17541 
1 17827 
1 18110 
1 17790 
2 16974 
2 16939 
1 16952 
2 16950 
1 17036 
2 16981 
2 16974 
2 16930 
1 17616 
1 17546 
1 17240 
1 17245 
1 17689 
2 16940 
1 17895 
2 17000 
1 17736 
1 17543 
1 17672 
1 18024 
1 18175 
1 17819 
1 18184 
1 18122 
1 18231 
1 17500 
1 18299 
2 16997 
2 16997 
2 16997 
2 17008 
2 16939 
1 18695 
1 18722 
1 18722 
1 18309 
1 18309 
1 18869 
1 18836 
1 18110 
1 18110 
1 18956 
1 18321 
1 18760 
1 18299 
1 18299 
1 18299 
1 18455 
1 18455 
1 18534 
1 18857 
1 18857 
1 18760 
1 18884 
1 17988 
1 17972 
1 18895
; 

data patient_id; 
input pat_id $ insurance_id date_death date_diagnosis;
cards;
PAT009 1 18983 18884 
PAT092 2 17061 16940 
;




%LET RANGE = 15;

PROC SQL;
CREATE TABLE PAT_AMB_INP AS SELECT
A.pat_id,
A.insurance_id,
A.date_death AS date_death FORMAT=DATE9.,
A.date_diagnosis AS date_diagnosis FORMAT=DATE9.,
B.amb_date AS amb_date FORMAT=DATE9.,
C.inp_date AS inp_date FORMAT=DATE9.,
ABS(INTCK('DAY', date_diagnosis, amb_date)) AS AMB_RANGE,
ABS(INTCK('DAY', date_diagnosis, inp_date)) AS INP_RANGE,
CASE WHEN CALCULATED AMB_RANGE = CALCULATED INP_RANGE THEN amb_date
	 WHEN CALCULATED AMB_RANGE > CALCULATED INP_RANGE THEN inp_date
	 WHEN CALCULATED AMB_RANGE < CALCULATED INP_RANGE THEN amb_date
	 END AS NEAREST_DATE FORMAT DATE9.,
CASE WHEN CALCULATED AMB_RANGE = CALCULATED INP_RANGE THEN 'BOTH'
	 WHEN CALCULATED AMB_RANGE > CALCULATED INP_RANGE THEN 'INPATIENT'
	 WHEN CALCULATED AMB_RANGE < CALCULATED INP_RANGE THEN 'AMBULANCE'
	 END AS SOURCE FORMAT $15.
FROM patient_id AS A
LEFT JOIN ambulance_date AS B ON A.insurance_id = B.insurance_id 
		  AND INTNX('DAY', A.date_diagnosis, -&RANGE) <= B.amb_date <= INTNX('DAY', A.date_diagnosis, &RANGE)	/* predefined date range (BETWEEN date_diagnosis-15 and date_diagnosis+15) */
LEFT JOIN inpatient_date AS C ON A.insurance_id = C.insurance_id 
		  AND INTNX('DAY', A.date_diagnosis, -&RANGE) <= C.inp_date <= INTNX('DAY', A.date_diagnosis, &RANGE)	/* predefined date range (BETWEEN date_diagnosis-15 and date_diagnosis+15) */
;
QUIT;
Cruise
Ammonite | Level 13
Thank you very much Frankie. Worked out well!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1038 views
  • 3 likes
  • 3 in conversation