Based on the example data @ChrisHemedinger provided (I only changed the date formats), I came up with this:
data want;
set readmit;
set readmit (
firstobs=2
keep=pat_ur_num admission_date_id
rename=(
pat_ur_num=_pat
admission_date_id=_ad_date
)
);
lagpat = lag(pat_ur_num);
format start yymmddd10.;
retain start;
if pat_ur_num ne lagpat or intck('month',start,admission_date_id,'c') >= 3
then start = admission_date_id;
if pat_ur_num = _pat and intck('month',start,_ad_date,'c') <= 3
then do;
days = discharge_date_id - start;
output;
end;
else if lagpat = pat_ur_num and intck('month',start,admission_date_id,'c') <= 3 and start ne admission_date_id
then do;
days = discharge_date_id - start;
output;
end;
drop lagpat _pat _ad_date start;
run;
proc print data=want noobs;
run;
The result:
EPISODE_ ADMISSION_ ADMISSION_ PAT_ DISCHARGE_ DRG_CODE_ ID NUM DATE_ID UR_NUM DATE_ID VER_042 days 28-104840 104840 2015-08-31 5933 2015-09-25 U66Z 25 28-106124 106124 2015-09-29 5933 2015-10-22 U66Z 52 28-79790 79790 2014-04-03 6709 2014-05-04 U66Z 31 28-82421 82421 2014-05-16 6709 2014-05-17 U66Z 44 28-83433 83433 2014-06-05 6709 2014-06-13 U66Z 71 28-85484 85484 2014-07-08 6709 2014-08-20 U66Z 43 28-88149 88149 2014-08-29 6709 2014-09-12 U66Z 66
matches your wanted output:
| OUTPUT | ||||||||
| EPISODE_ID | ADMISSION_NUM | ADMISSION_DATE_ID | PAT_UR_NUM | DISCHARGE_DATE_ID | DAYS | DRG_CODE_VER_042 | RE-ADMISSION COUNT IF WITHIN 3 MONTHS | |
| 28-104840 | 104840 | 2015-08-31 | 5933 | 2015-09-25 | 25 | U66Z | 0 | |
| 28-106124 | 106124 | 2015-09-29 | 5933 | 2015-10-22 | 52 | U66Z | 1 | |
| 28-79790 | 79790 | 2014-04-03 | 6709 | 2014-05-04 | 31 | U66Z | 1 | |
| 28-82421 | 82421 | 2014-05-16 | 6709 | 2014-05-17 | 44 | U66Z | 1 | |
| 28-83433 | 83433 | 2014-06-05 | 6709 | 2014-06-13 | 71 |
I omitted the readmission_count, as it is inconsistent (why is it zero in the first line for 5933, but 1 in the first line for 6709?)
OK, @ChrisHemedinger has done the time-consuming part, providing proper data for the problem. The solution is short and sweet.
Given your data is sorted by admission date within patient id, all you have to do is compare admission date in one observation with discharge date in the prior obs. And also delete when the current obs are is the start of a new patient.
/* --------------------------------------------------------------------
Code generated by a SAS task
-------------------------------------------------------------------- */
/* --------------------------------------------------------------------
This DATA step reads the data values from DATALINES within the SAS
code. The values within the DATALINES were extracted from the text
source file by the Import Data wizard.
-------------------------------------------------------------------- */
DATA WORK.READMIT;
LENGTH
EPISODE_ID $ 9
ADMISSION_NUM 8
ADMISSION_DATE_ID 8
PAT_UR_NUM 8
DISCHARGE_DATE_ID 8
DRG_CODE_VER_042 $ 4 ;
FORMAT
EPISODE_ID $CHAR9.
ADMISSION_NUM BEST6.
ADMISSION_DATE_ID DATE9.
PAT_UR_NUM BEST4.
DISCHARGE_DATE_ID DATE9.
DRG_CODE_VER_042 $CHAR4. ;
INFORMAT
EPISODE_ID $CHAR9.
ADMISSION_NUM BEST6.
ADMISSION_DATE_ID DATE9.
PAT_UR_NUM BEST4.
DISCHARGE_DATE_ID DATE9.
DRG_CODE_VER_042 $CHAR4. ;
INFILE DATALINES4
DLM=','
MISSOVER
DSD ;
INPUT
EPISODE_ID : $CHAR9.
ADMISSION_NUM : ?? BEST6.
ADMISSION_DATE_ID : ?? DATE9.
PAT_UR_NUM : ?? BEST4.
DISCHARGE_DATE_ID : ?? DATE9.
DRG_CODE_VER_042 : $CHAR4. ;
DATALINES4;
28-60030,60030,15FEB2013,124,08MAR2013,U66Z
28-142929,142929,04JUL2018,1442,20AUG2018,U66Z
28-124021,124021,30JAN2017,5650,16MAR2017,U66Z
28-104840,104840,31AUG2015,5933,25SEP2015,U66Z
28-106124,106124,29SEP2015,5933,22OCT2015,U66Z
28-129352,129352,16JUN2017,6068,21AUG2017,U66Z
28-57751,57751,10JAN2013,6170,31JAN2013,U66Z
28-66130,66130,13JUN2013,6184,28JUN2013,U66Z
28-59972,59972,14FEB2013,6349,18MAR2013,U66Z
28-61234,61234,07MAR2013,6385,20MAR2013,U66Z
28-63178,63178,12APR2013,6427,13APR2013,U66Z
11-488117,488117,11NOV2017,6527,09DEC2017,U66Z
11-515347,515347,29MAY2018,6527,22JUN2018,U66Z
28-68397,68397,05AUG2013,6557,16AUG2013,U66Z
28-110635,110635,22JAN2016,6648,09FEB2016,U66Z
28-79790,79790,03APR2014,6709,04MAY2014,U66Z
28-82421,82421,16MAY2014,6709,17MAY2014,U66Z
28-83433,83433,05JUN2014,6709,13JUN2014,U66Z
28-85484,85484,08JUL2014,6709,20AUG2014,U66Z
28-88149,88149,29AUG2014,6709,12SEP2014,U66Z
28-72646,72646,13NOV2013,6732,22NOV2013,U66Z
28-73012,73012,20NOV2013,6750,19DEC2013,U66Z
28-73191,73191,21NOV2013,6753,25NOV2013,U66Z
;;;;
/* Keep only readmissions less than 90 days after prior discharge */
data want;
set readmit;
by PAT_UR_NUM ADMISSION_DATE_ID ;
days=admission_date_id-lag(discharge_date_id);
if first.pat_ur_num=0 and days < 90;
run;
Hi,
Thanks heaps for the immense help.
As a novice, I just had one question what is the main reason for data step? My data came from a filter query I ran on the data from the SAS (huge data set).
Your guidance would be highly appreciated.
Cheers
Nikhil
The data step is THE Swiss Army Knife in SAS for data preparation. While most (if not all) the heavy-duty statistic and reporting work is done with dedicated procedures, the data step provides you with a turing-complete programming language that can solve any problem you can't deal with otherwise.
Besides handling existing data, you can use it to create sample data (as demonstrated here), or make complex calculations (eg to prepare macro variables), create dynamic code (call execute) or write SAS data to external files.
PS note that the result of the code you marked as solution does not match your requirements as originally stated:
EPISODE_ ADMISSION_ ADMISSION_ PAT_ DISCHARGE_ DRG_CODE_ ID NUM DATE_ID UR_NUM DATE_ID VER_042 days 28-106124 106124 29SEP2015 5933 22OCT2015 U66Z 4 28-82421 82421 16MAY2014 6709 17MAY2014 U66Z 12 28-83433 83433 05JUN2014 6709 13JUN2014 U66Z 19 28-85484 85484 08JUL2014 6709 20AUG2014 U66Z 25 28-88149 88149 29AUG2014 6709 12SEP2014 U66Z 9
Your feedback is highly appreciated.
So that means all data directly from SAS needs to go through DATA STEP?
What could the end solution be?
Thanks
@Nikhil28 wrote:
Your feedback is highly appreciated.
So that means all data directly from SAS needs to go through DATA STEP?
What could the end solution be?
Thanks
The task at hand will always define the tools needed (also see Maxim 14). Since the data step is the most versatile tool that SAS offers, it will be the most widely used.
In order to make the decision which tool to use, you need thorough knowledge of the tools in your toolbox. Therefore, starting as a SAS programmer has to be an in-depth study of the data step, its inner workings, its capabilities, and its limits.
IIRC, I spent my whole first week of SAS programming course with the data step and the global statements like libname, filename. Add a little proc print etc into the mix to look at the data, but most of the time was data step up-down-left-right-forward-back.
That's also why I told you that making up example data with datalines is such a useful skill. And just as a side effect, you will learn how to read external data.
Dive into keynotes, announcements and breakthroughs on demand.
Explore Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.