Hi SAS experts,
I have an ambulance data (ambualnce_date) which has the admission date (amb_date) and insurance_id. Inpatient data has the date of hospitalization (inp_date) and insurance_id. I also have patient_id file which uniquely identifies patients. My actual data sets are large where patient_id data has about 100k while inpatient and ambulance data has millions of observations for each claims. However, I'd like to solve the problem for only two cases and move forward.
My objective: Select the date range of 15 days before and after the date_daignosis.
Final dataset should include, for example, from 09/1/2011 through 09/30/2011 because
9/15/2011 is the mid-point. While excluding any dates fell out of this 30-day range.
My original post said: same month and year however sep 1 and sep 30 would fall in the same
month and year as same as sep 1 and sep 2. So, maybe, using 15 days before and after the
mid-point which is date_diagnosis would make more sense.
Dataset wanted for final output:
id | amb_date | inp-date | date_diagnosis | nereast | source |
1 | 9/15/2011 | 9/15/2011 | 9/15/2011 | 9/15/2011 | both |
1 | 9/15/2011 | 9/1/2011 | 9/15/2011 | 9/15/2011 | ambulance |
1 | 9/30/2011 | 9/10/2011 | 9/15/2011 | 9/10/2011 | inpatient |
1 | 9/1/2011 | 9/28/2011 | 9/15/2011 | 9/28/2011 | inpatient |
This is the continuation of the post here:
but with slight change in the condition set for the selection of the date range.
The problem so far is solved with the condition meeting with same month and year in the link specified above thanks to FrankieNg and Oligolas. And I'll greatly appreciate if you help me figure the problem with the new condition for the date range. I was considering of using BETWEEN function in PROC SQL with the predefined date range (BETWEEN date_diagnosis-15 and date_diagnosis+15).
Any hints, pointers, ideas or direct help in the code will be appreciated.Thanks in advance.
data ambulance_date;
input insurance_id amb_date;
cards;
1 16576
1 17616
1 17542
1 17814
1 17575
1 17541
1 17822
1 17703
1 18184
1 17532
1 17972
1 17935
2 16989
1 17265
2 16997
2 17008
1 18231
2 16997
2 17000
1 17056
1 16831
2 16997
2 17008
1 18162
1 18024
1 18276
1 17773
1 18299
1 18387
1 18609
1 18486
1 18486
1 18486
1 18486
1 18571
1 18488
1 18585
1 18601
1 17527
1 18722
1 18907
1 18869
1 18808
1 18760
1 18760
1 18760
1 18821
1 18903
1 18954
1 18136
2 16974
2 16974
2 16974
2 17010
1 17036
1 18455
1 18956
;
data inpatient_date;
input insurance_id inp_date;
cards;
1 16623
1 16692
1 16726
1 17712
1 16709
1 17541
1 17827
1 18110
1 17790
2 16974
2 16939
1 16952
2 16966
1 17036
2 16981
2 16974
2 16974
1 17616
1 17546
1 17240
1 17245
1 17689
2 16940
1 17895
2 17000
1 17736
1 17543
1 17672
1 18024
1 18175
1 17819
1 18184
1 18122
1 18231
1 17500
1 18299
2 16997
2 16997
2 16997
2 17008
2 16989
1 18695
1 18722
1 18722
1 18309
1 18309
1 18869
1 18836
1 18110
1 18110
1 18956
1 18321
1 18760
1 18299
1 18299
1 18299
1 18455
1 18455
1 18534
1 18857
1 18857
1 18760
1 18884
1 17988
1 17972
1 18895
;
data patient_id;
input pat_id $ insurance_id date_death date_diagnosis;
cards;
PAT009 1 18983 18884
PAT092 2 17061 16940
;
PROC SQL;
CREATE TABLE PAT_AMB_INP AS SELECT
A.pat_id,
A.insurance_id,
A.date_death AS date_death FORMAT=DATE9.,
A.date_diagnosis AS date_diagnosis FORMAT=DATE9.,
B.amb_date AS amb_date FORMAT=DATE9.,
C.inp_date AS inp_date FORMAT=DATE9.,
MIN(amb_date,inp_date) AS NEAREST_DATE FORMAT=DATE9. /* select the nearest of the ambulance_date and inp_date to the date_diagnosis */
FROM patient_id AS A
LEFT JOIN ambulance_date AS B ON A.insurance_id = B.insurance_id
AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(B.amb_date,DATE9.),3,9) /* find records in the ambulance data where admission date (ambulance_date) equals or in the same month and year with (date_diagnosis) */
AND A.date_diagnosis <= B.amb_date /* consider only if ambulance and inpatient dates happen after date-diagnosis */
LEFT JOIN inpatient_date AS C ON A.insurance_id = C.insurance_id
AND SUBSTR(PUT(A.date_diagnosis,DATE9.),3,9) = SUBSTR(PUT(C.inp_date,DATE9.),3,9) /* find records in the inpatient data where date of hospitalization to inpatient dept (inp_date) equals or in the same month and year with (date_diagnosis) */
AND A.date_diagnosis <= C.inp_date /* consider only if ambulance and inpatient dates happen after date-diagnosis */
;
QUIT;
Hi Cruise,
When come to date range, we usually use INTNX function.
I've modified a bit of the data source, to show more observation from the output..
Hope this help.
data ambulance_date;
input insurance_id amb_date;
cards;
1 16576
1 17616
1 17542
1 17814
1 17575
1 17541
1 17822
1 17703
1 18184
1 17532
1 17972
1 17935
2 16989
1 17265
2 16997
2 17008
1 18231
2 16997
2 17000
1 17056
1 16831
2 16940
2 17008
1 18162
1 18024
1 18276
1 17773
1 18299
1 18387
1 18609
1 18486
1 18486
1 18486
1 18486
1 18571
1 18488
1 18585
1 18601
1 17527
1 18722
1 18907
1 18869
1 18808
1 18760
1 18760
1 18760
1 18821
1 18903
1 18954
1 18136
2 16925
2 16974
2 16955
2 17010
1 17036
1 18455
1 18956
;
data inpatient_date;
input insurance_id inp_date;
cards;
1 16623
1 16692
1 16726
1 17712
1 16709
1 17541
1 17827
1 18110
1 17790
2 16974
2 16939
1 16952
2 16950
1 17036
2 16981
2 16974
2 16930
1 17616
1 17546
1 17240
1 17245
1 17689
2 16940
1 17895
2 17000
1 17736
1 17543
1 17672
1 18024
1 18175
1 17819
1 18184
1 18122
1 18231
1 17500
1 18299
2 16997
2 16997
2 16997
2 17008
2 16939
1 18695
1 18722
1 18722
1 18309
1 18309
1 18869
1 18836
1 18110
1 18110
1 18956
1 18321
1 18760
1 18299
1 18299
1 18299
1 18455
1 18455
1 18534
1 18857
1 18857
1 18760
1 18884
1 17988
1 17972
1 18895
;
data patient_id;
input pat_id $ insurance_id date_death date_diagnosis;
cards;
PAT009 1 18983 18884
PAT092 2 17061 16940
;
%LET RANGE = 15;
PROC SQL;
CREATE TABLE PAT_AMB_INP AS SELECT
A.pat_id,
A.insurance_id,
A.date_death AS date_death FORMAT=DATE9.,
A.date_diagnosis AS date_diagnosis FORMAT=DATE9.,
B.amb_date AS amb_date FORMAT=DATE9.,
C.inp_date AS inp_date FORMAT=DATE9.,
ABS(INTCK('DAY', date_diagnosis, amb_date)) AS AMB_RANGE,
ABS(INTCK('DAY', date_diagnosis, inp_date)) AS INP_RANGE,
CASE WHEN CALCULATED AMB_RANGE = CALCULATED INP_RANGE THEN amb_date
WHEN CALCULATED AMB_RANGE > CALCULATED INP_RANGE THEN inp_date
WHEN CALCULATED AMB_RANGE < CALCULATED INP_RANGE THEN amb_date
END AS NEAREST_DATE FORMAT DATE9.,
CASE WHEN CALCULATED AMB_RANGE = CALCULATED INP_RANGE THEN 'BOTH'
WHEN CALCULATED AMB_RANGE > CALCULATED INP_RANGE THEN 'INPATIENT'
WHEN CALCULATED AMB_RANGE < CALCULATED INP_RANGE THEN 'AMBULANCE'
END AS SOURCE FORMAT $15.
FROM patient_id AS A
LEFT JOIN ambulance_date AS B ON A.insurance_id = B.insurance_id
AND INTNX('DAY', A.date_diagnosis, -&RANGE) <= B.amb_date <= INTNX('DAY', A.date_diagnosis, &RANGE) /* predefined date range (BETWEEN date_diagnosis-15 and date_diagnosis+15) */
LEFT JOIN inpatient_date AS C ON A.insurance_id = C.insurance_id
AND INTNX('DAY', A.date_diagnosis, -&RANGE) <= C.inp_date <= INTNX('DAY', A.date_diagnosis, &RANGE) /* predefined date range (BETWEEN date_diagnosis-15 and date_diagnosis+15) */
;
QUIT;
Your data appear to be sorted by id, but not date within id? Is that true. If your data are sorted by id then a data step solution may be a good deal faster than SQL.
Individual patients have multiple insurance id. However, once datasets are merged on the insurance_id (which is only linkage variable common across these three different datasets) then the final linked data should be sorted by pat_id. So, you're right, not sorted by date within pat_id.
do you have time to show me how in data step? maybe I should extend my sample datasets in this forum.
I will look at it later this evening if there is time then. Busy now.
Hi Cruise,
When come to date range, we usually use INTNX function.
I've modified a bit of the data source, to show more observation from the output..
Hope this help.
data ambulance_date;
input insurance_id amb_date;
cards;
1 16576
1 17616
1 17542
1 17814
1 17575
1 17541
1 17822
1 17703
1 18184
1 17532
1 17972
1 17935
2 16989
1 17265
2 16997
2 17008
1 18231
2 16997
2 17000
1 17056
1 16831
2 16940
2 17008
1 18162
1 18024
1 18276
1 17773
1 18299
1 18387
1 18609
1 18486
1 18486
1 18486
1 18486
1 18571
1 18488
1 18585
1 18601
1 17527
1 18722
1 18907
1 18869
1 18808
1 18760
1 18760
1 18760
1 18821
1 18903
1 18954
1 18136
2 16925
2 16974
2 16955
2 17010
1 17036
1 18455
1 18956
;
data inpatient_date;
input insurance_id inp_date;
cards;
1 16623
1 16692
1 16726
1 17712
1 16709
1 17541
1 17827
1 18110
1 17790
2 16974
2 16939
1 16952
2 16950
1 17036
2 16981
2 16974
2 16930
1 17616
1 17546
1 17240
1 17245
1 17689
2 16940
1 17895
2 17000
1 17736
1 17543
1 17672
1 18024
1 18175
1 17819
1 18184
1 18122
1 18231
1 17500
1 18299
2 16997
2 16997
2 16997
2 17008
2 16939
1 18695
1 18722
1 18722
1 18309
1 18309
1 18869
1 18836
1 18110
1 18110
1 18956
1 18321
1 18760
1 18299
1 18299
1 18299
1 18455
1 18455
1 18534
1 18857
1 18857
1 18760
1 18884
1 17988
1 17972
1 18895
;
data patient_id;
input pat_id $ insurance_id date_death date_diagnosis;
cards;
PAT009 1 18983 18884
PAT092 2 17061 16940
;
%LET RANGE = 15;
PROC SQL;
CREATE TABLE PAT_AMB_INP AS SELECT
A.pat_id,
A.insurance_id,
A.date_death AS date_death FORMAT=DATE9.,
A.date_diagnosis AS date_diagnosis FORMAT=DATE9.,
B.amb_date AS amb_date FORMAT=DATE9.,
C.inp_date AS inp_date FORMAT=DATE9.,
ABS(INTCK('DAY', date_diagnosis, amb_date)) AS AMB_RANGE,
ABS(INTCK('DAY', date_diagnosis, inp_date)) AS INP_RANGE,
CASE WHEN CALCULATED AMB_RANGE = CALCULATED INP_RANGE THEN amb_date
WHEN CALCULATED AMB_RANGE > CALCULATED INP_RANGE THEN inp_date
WHEN CALCULATED AMB_RANGE < CALCULATED INP_RANGE THEN amb_date
END AS NEAREST_DATE FORMAT DATE9.,
CASE WHEN CALCULATED AMB_RANGE = CALCULATED INP_RANGE THEN 'BOTH'
WHEN CALCULATED AMB_RANGE > CALCULATED INP_RANGE THEN 'INPATIENT'
WHEN CALCULATED AMB_RANGE < CALCULATED INP_RANGE THEN 'AMBULANCE'
END AS SOURCE FORMAT $15.
FROM patient_id AS A
LEFT JOIN ambulance_date AS B ON A.insurance_id = B.insurance_id
AND INTNX('DAY', A.date_diagnosis, -&RANGE) <= B.amb_date <= INTNX('DAY', A.date_diagnosis, &RANGE) /* predefined date range (BETWEEN date_diagnosis-15 and date_diagnosis+15) */
LEFT JOIN inpatient_date AS C ON A.insurance_id = C.insurance_id
AND INTNX('DAY', A.date_diagnosis, -&RANGE) <= C.inp_date <= INTNX('DAY', A.date_diagnosis, &RANGE) /* predefined date range (BETWEEN date_diagnosis-15 and date_diagnosis+15) */
;
QUIT;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.