I have vertical data recording a person 's in and out of hospital and reason for leaving. I'm trying to fix people who have multiple deaths. The data is vertical: A person (PID) moved from one hospital to another (HospID), and within the same hospital, multiple records to show people's leaving and returning. See below for a sample of 5 cases. Also the actual data is date, not year, and there is no date comflict).
Naturally,
1. a person can only die once (so records for ID 3 & 4 are wrong)
2. if there is subsequent records after death, death should reset as unknown (DK). (so record for #5 is wrong).
So ID 1 and 2 are OK. The ^1st record of ID 3^, the ^2nd record of ID 4^ and the ^2nd record of ID 5^ should be set as DK.
I was thinking to use lag function but it doesn't do the work (or can be messy).
It seems I should use retain to retain value from the previous record (aftert sorting the data and set discharge year missing to be say 2019) and then reset the value as I move from one record to the next. But how do I execute?
Thanks in advance for any help you can give me.
data have; input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21;
datalines;
1 A 2012 2013 tempout
1 A 2013 2014 moved
1 B 2015 2016 death
2 A 2012 2013 tempout
2 A 2013 2014 left
2 B 2015
3 A 2012 2013 death
3 A 2013 2014 moved
3 B 2015 2016 death
4 A 2012 2013 moved
4 B 2013 2014 death
4 B 2014 2015 tempout
4 B 2015 2016 death
5 A 2012 2014 tempout
5 A 2015 2016 death
5 B 2013 2014 moved
5 C 2015
;
run;
proc print noobs; run;
Well, this does what you are asking for:
data want;
set have;
by pid;
if not last.pid and reason_to_leave = 'death' then
reason_to_leave = 'DK';
run;
Is it as simple as that?
Better post the output too.
data have; input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21;
datalines;
1 A 2012 2013 tempout
1 A 2013 2014 moved
1 B 2015 2016 death
2 A 2012 2013 tempout
2 A 2013 2014 left
2 B 2015
3 A 2012 2013 death
3 A 2013 2014 moved
3 B 2015 2016 death
4 A 2012 2013 moved
4 B 2013 2014 death
4 B 2014 2015 tempout
4 B 2015 2016 death
5 A 2012 2014 tempout
5 A 2015 2016 death
5 B 2013 2014 moved
5 C 2015
;
run;
proc sort data=have;
by pid descending discharge_year;
run;
data want;
set have;
by pid;
retain found;
if first.pid then found=0;
if reason_to_leave='death' and found then reason_to_leave='DK';
if reason_to_leave='death' then found=1;
drop found;
run;
proc sort data=want;
by pid discharge_year;
run;
I'd approach the problem as follows:
data have; input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21; datalines; 1 A 2012 2013 tempout 1 A 2013 2014 moved 1 B 2015 2016 death 2 A 2012 2013 tempout 2 A 2013 2014 left 2 B 2015 . . 3 A 2012 2013 death 3 A 2013 2014 moved 3 B 2015 2016 death 4 A 2012 2013 moved 4 B 2013 2014 death 4 B 2014 2015 tempout 4 B 2015 2016 death 5 A 2012 2014 tempout 5 A 2015 2016 death 5 B 2013 2014 moved 5 C 2015 . . ; run; proc sort data=have; by pid discharge_year; run; data want; do until (last.pid); set have; by pid; if reason_to_leave eq 'death' then latest_death=discharge_year; end; do until (last.pid); set have; by pid; if reason_to_leave eq 'death' and discharge_year ne latest_death then reason_to_leave='DK'; output; end; run;
HTH,
Art, CEO, AnalystFinder.com
@Solph wrote:
I have vertical data recording a person 's in and out of hospital and reason for leaving. I'm trying to fix people who have multiple deaths.
and later wrote
Naturally,
1. a person can only die once (so records for ID 3 & 4 are wrong)
2. if there is subsequent records after death, death should reset as unknown (DK). (so record for #5 is wrong).
Occurance of multiple deaths are just subsets of rule 2 above. It seems to me that @LaurieF's response fits the requirements exactly. I understand the problem to mean that a death record is only acceptable as the final record. All other death records should be flagged.
c
x
Or use DOW + index skill, something like Arthur did. data have; input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21; datalines; 1 A 2012 2013 tempout 1 A 2013 2014 moved 1 B 2015 2016 death 2 A 2012 2013 tempout 2 A 2013 2014 left 2 B 2015 . . 3 A 2012 2013 death 3 A 2013 2014 moved 3 B 2015 2016 death 4 A 2012 2013 moved 4 B 2013 2014 death 4 B 2014 2015 tempout 4 B 2015 2016 death 5 A 2012 2014 tempout 5 A 2015 2016 death 5 B 2013 2014 moved 5 C 2015 . . ; run; data want; do i=1 by 1 until (last.pid); set have; by pid; if reason_to_leave eq 'death' then n=i; end; do j=1 by 1 until (last.pid); set have; by pid; if reason_to_leave eq 'death' and j lt n then reason_to_leave='DK'; output; end; drop i j n; run;
But - but - why make it so hard?
Maybe OP can clarify more about what he want. PID=5 is different between these code.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.