I have vertical data recording a person 's in and out of hospital and reason for leaving. I'm trying to fix people who have multiple deaths. The data is vertical: A person (PID) moved from one hospital to another (HospID), and within the same hospital, multiple records to show people's leaving and returning. See below for a sample of 5 cases. Also the actual data is date, not year, and there is no date comflict).
Naturally,
1. a person can only die once (so records for ID 3 & 4 are wrong)
2. if there is subsequent records after death, death should reset as unknown (DK). (so record for #5 is wrong).
So ID 1 and 2 are OK. The ^1st record of ID 3^, the ^2nd record of ID 4^ and the ^2nd record of ID 5^ should be set as DK.
I was thinking to use lag function but it doesn't do the work (or can be messy).
It seems I should use retain to retain value from the previous record (aftert sorting the data and set discharge year missing to be say 2019) and then reset the value as I move from one record to the next. But how do I execute?
Thanks in advance for any help you can give me.
data have; input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21;
datalines;
1 A 2012 2013 tempout
1 A 2013 2014 moved
1 B 2015 2016 death
2 A 2012 2013 tempout
2 A 2013 2014 left
2 B 2015
3 A 2012 2013 death
3 A 2013 2014 moved
3 B 2015 2016 death
4 A 2012 2013 moved
4 B 2013 2014 death
4 B 2014 2015 tempout
4 B 2015 2016 death
5 A 2012 2014 tempout
5 A 2015 2016 death
5 B 2013 2014 moved
5 C 2015
;
run;
proc print noobs; run;
Well, this does what you are asking for:
data want;
set have;
by pid;
if not last.pid and reason_to_leave = 'death' then
reason_to_leave = 'DK';
run;
Is it as simple as that?
Better post the output too.
data have; input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21;
datalines;
1 A 2012 2013 tempout
1 A 2013 2014 moved
1 B 2015 2016 death
2 A 2012 2013 tempout
2 A 2013 2014 left
2 B 2015
3 A 2012 2013 death
3 A 2013 2014 moved
3 B 2015 2016 death
4 A 2012 2013 moved
4 B 2013 2014 death
4 B 2014 2015 tempout
4 B 2015 2016 death
5 A 2012 2014 tempout
5 A 2015 2016 death
5 B 2013 2014 moved
5 C 2015
;
run;
proc sort data=have;
by pid descending discharge_year;
run;
data want;
set have;
by pid;
retain found;
if first.pid then found=0;
if reason_to_leave='death' and found then reason_to_leave='DK';
if reason_to_leave='death' then found=1;
drop found;
run;
proc sort data=want;
by pid discharge_year;
run;
I'd approach the problem as follows:
data have; input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21; datalines; 1 A 2012 2013 tempout 1 A 2013 2014 moved 1 B 2015 2016 death 2 A 2012 2013 tempout 2 A 2013 2014 left 2 B 2015 . . 3 A 2012 2013 death 3 A 2013 2014 moved 3 B 2015 2016 death 4 A 2012 2013 moved 4 B 2013 2014 death 4 B 2014 2015 tempout 4 B 2015 2016 death 5 A 2012 2014 tempout 5 A 2015 2016 death 5 B 2013 2014 moved 5 C 2015 . . ; run; proc sort data=have; by pid discharge_year; run; data want; do until (last.pid); set have; by pid; if reason_to_leave eq 'death' then latest_death=discharge_year; end; do until (last.pid); set have; by pid; if reason_to_leave eq 'death' and discharge_year ne latest_death then reason_to_leave='DK'; output; end; run;
HTH,
Art, CEO, AnalystFinder.com
@Solph wrote:
I have vertical data recording a person 's in and out of hospital and reason for leaving. I'm trying to fix people who have multiple deaths.
and later wrote
Naturally,
1. a person can only die once (so records for ID 3 & 4 are wrong)
2. if there is subsequent records after death, death should reset as unknown (DK). (so record for #5 is wrong).
Occurance of multiple deaths are just subsets of rule 2 above. It seems to me that @LaurieF's response fits the requirements exactly. I understand the problem to mean that a death record is only acceptable as the final record. All other death records should be flagged.
c
x
Or use DOW + index skill, something like Arthur did. data have; input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21; datalines; 1 A 2012 2013 tempout 1 A 2013 2014 moved 1 B 2015 2016 death 2 A 2012 2013 tempout 2 A 2013 2014 left 2 B 2015 . . 3 A 2012 2013 death 3 A 2013 2014 moved 3 B 2015 2016 death 4 A 2012 2013 moved 4 B 2013 2014 death 4 B 2014 2015 tempout 4 B 2015 2016 death 5 A 2012 2014 tempout 5 A 2015 2016 death 5 B 2013 2014 moved 5 C 2015 . . ; run; data want; do i=1 by 1 until (last.pid); set have; by pid; if reason_to_leave eq 'death' then n=i; end; do j=1 by 1 until (last.pid); set have; by pid; if reason_to_leave eq 'death' and j lt n then reason_to_leave='DK'; output; end; drop i j n; run;
But - but - why make it so hard?
Maybe OP can clarify more about what he want. PID=5 is different between these code.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.