Re: Use Retain to correct values based on the previous record

Solph · Posted 03-06-2017 09:15 PM

I have vertical data recording a person 's in and out of hospital and reason for leaving. I'm trying to fix people who have multiple deaths. The data is vertical: A person (PID) moved from one hospital to another (HospID), and within the same hospital, multiple records to show people's leaving and returning. See below for a sample of 5 cases. Also the actual data is date, not year, and there is no date comflict).

Naturally,
1. a person can only die once (so records for ID 3 & 4 are wrong)
2. if there is subsequent records after death, death should reset as unknown (DK). (so record for #5 is wrong).

So ID 1 and 2 are OK. The ^1st record of ID 3^, the ^2nd record of ID 4^ and the ^2nd record of ID 5^ should be set as DK.
I was thinking to use lag function but it doesn't do the work (or can be messy).
It seems I should use retain to retain value from the previous record (aftert sorting the data and set discharge year missing to be say 2019) and then reset the value as I move from one record to the next. But how do I execute?

Thanks in advance for any help you can give me.

data have; input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21;
datalines;
1 A 2012 2013 tempout
1 A 2013 2014 moved
1 B 2015 2016 death
2 A 2012 2013 tempout
2 A 2013 2014 left
2 B 2015
3 A 2012 2013 death
3 A 2013 2014 moved
3 B 2015 2016 death
4 A 2012 2013 moved
4 B 2013 2014 death
4 B 2014 2015 tempout
4 B 2015 2016 death
5 A 2012 2014 tempout
5 A 2015 2016 death
5 B 2013 2014 moved
5 C 2015
;
run;
proc print noobs; run;

LaurieF · Posted 03-06-2017 09:30 PM

Well, this does what you are asking for:

data want;
set have;
by pid;
if not last.pid and reason_to_leave = 'death' then
   reason_to_leave = 'DK';
run;

Is it as simple as that?

Ksharp · Posted 03-06-2017 09:52 PM

Better post the output too.

 
data have; input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21;
datalines;
1 A 2012 2013 tempout
1 A 2013 2014 moved
1 B 2015 2016 death
2 A 2012 2013 tempout
2 A 2013 2014 left
2 B 2015
3 A 2012 2013 death
3 A 2013 2014 moved
3 B 2015 2016 death
4 A 2012 2013 moved
4 B 2013 2014 death
4 B 2014 2015 tempout
4 B 2015 2016 death
5 A 2012 2014 tempout
5 A 2015 2016 death
5 B 2013 2014 moved
5 C 2015
;
run;
proc sort data=have;
 by pid descending discharge_year;
run;
data want;
 set have;
 by pid;
 retain found;
 if first.pid then found=0;
 if reason_to_leave='death' and found then reason_to_leave='DK';
 if reason_to_leave='death' then found=1;
 drop found;
run;
proc sort data=want;
 by pid discharge_year;
run;

art297 · Posted 03-06-2017 09:56 PM

I'd approach the problem as follows:

data have;
  input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21;
  datalines;
1 A 2012 2013 tempout
1 A 2013 2014 moved
1 B 2015 2016 death
2 A 2012 2013 tempout
2 A 2013 2014 left
2 B 2015 .    .
3 A 2012 2013 death
3 A 2013 2014 moved
3 B 2015 2016 death
4 A 2012 2013 moved
4 B 2013 2014 death
4 B 2014 2015 tempout
4 B 2015 2016 death
5 A 2012 2014 tempout
5 A 2015 2016 death
5 B 2013 2014 moved
5 C 2015 .    .
;
run;

proc sort data=have;
  by pid discharge_year;
run;

data want;
  do until (last.pid);
    set have;
    by pid;
    if reason_to_leave eq 'death' then latest_death=discharge_year;
  end;
  do until (last.pid);
    set have;
    by pid;
    if reason_to_leave eq 'death' and discharge_year ne latest_death then reason_to_leave='DK';
    output;
  end;
run;

HTH,

Art, CEO, AnalystFinder.com

mkeintz · Posted 03-07-2017 01:19 AM

@Solph wrote:

I have vertical data recording a person 's in and out of hospital and reason for leaving. I'm trying to fix people who have multiple deaths.

and later wrote

Naturally,
1. a person can only die once (so records for ID 3 & 4 are wrong)
2. if there is subsequent records after death, death should reset as unknown (DK). (so record for #5 is wrong).

Occurance of multiple deaths are just subsets of rule 2 above. It seems to me that @LaurieF's response fits the requirements exactly. I understand the problem to mean that a death record is only acceptable as the final record. All other death records should be flagged.

c

x

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 03-07-2017 09:41 PM

Or use DOW + index skill, something like Arthur did.


data have;
  input PID 1 HospID $ 3 admission_year 5-8 discharge_year 10-13 reason_to_leave $ 15-21;
  datalines;
1 A 2012 2013 tempout
1 A 2013 2014 moved
1 B 2015 2016 death
2 A 2012 2013 tempout
2 A 2013 2014 left
2 B 2015 .    .
3 A 2012 2013 death
3 A 2013 2014 moved
3 B 2015 2016 death
4 A 2012 2013 moved
4 B 2013 2014 death
4 B 2014 2015 tempout
4 B 2015 2016 death
5 A 2012 2014 tempout
5 A 2015 2016 death
5 B 2013 2014 moved
5 C 2015 .    .
;
run;


data want;
  do i=1 by 1 until (last.pid);
    set have;
    by pid;
    if reason_to_leave eq 'death' then n=i;
  end;
  do j=1 by 1 until (last.pid);
    set have;
    by pid;
    if reason_to_leave eq 'death' and j lt n then reason_to_leave='DK';
    output;
  end;
  drop i j n;
run;

LaurieF · Posted 03-07-2017 09:43 PM

But - but - why make it so hard?

Ksharp · Posted 03-07-2017 10:21 PM


Maybe OP can clarify more about what he want.

PID=5 is different between these code.

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away