RETAIN question

Occasional Contributor
Posts: 10

Hi there, I want to modify my dataset in a way that I believe needs a RETAIN statement, but it is not yielding what I want. I have a series of observations, each representing a doctor visit. One of the variables is a patient ID number (patient_ID). If the same patient ID number occurs in multiple observations, then the same person has had multiple visits to the doctor. There is also a variable that identifies the date the visit occurred (date). A third variable (evening) is a 1/0 flag, where 1 indicates a visit at or after 5:30.

I've sorted the observations by ascending patient ID number and then ascending visit date. I am interested only in the visit that occurs at or after 5:30, and any visits for that patient thereafter. So if the first two visits occur at 3:00 and 4:00, and the 3rd, 4th, and 5th at 5:45, 2:00, and 12:00 respectively, I am only interested in the 3rd, 4th, and 5th visits. I want to "mark" these visits by creating a new variable that =1 for these observations (=0 for the non-relevant observations). I'm thinking a RETAIN statement is the way to go, but it doesn't seem to be working:

data new;
  set old;
  by patient_id date;
  retain indicator;
  if first.patient_id and evening=1 then do;
    indicator = 1;
  end;
  else if evening = 1 then do;
    indicator2=1;
  end;
run;

I would think this code would carry down, or RETAIN, the 1 to the next data line. Instead, it seems to reset to 0. The only observations where indicator=1 are those where evening=1, even though all subsequent visits for that patient should also be marked as 1.

Hope it is clear, let me know if it ain't!

TM


All Replies
SAS Employee
Posts: 340

Re: RETAIN question

Posted in reply to FrankReynolds

In your code, indicator becomes 1 only if the patient's first visit is an evening visit. Indicator2 becomes 1 if there is an evening visit that is not the first one. But indicator2 is not retained!

What about this:

data new;
  set old;
  by patient_id date;
  retain indicator;
  if first.patient_id then do;
    indicator = 0; /* resetting indicator at the beginning of every group */
  end;
  if evening=1 then do;
    indicator=1;
  end;
run;

Occasional Contributor
Posts: 10

Re: RETAIN question

Posted in reply to gergely_batho

Hi Gergely,

Thanks for the response. There is a typo in the code above - there should only be an 'indicator' variable, not 'indicator2'...Whenever I try to edit it, it bugs out so I'm going to let it be.

Let me try out your approach and see what happens...


Occasional Contributor
Posts: 10

Re: RETAIN question

Posted in reply to gergely_batho

It doesn't work. It sets indicator=0 for the first observation, and any subsequent observation to 1, whether evening=1 or not. I'm guessing it is because of these lines:

if first.patient_id then do;
  indicator = 0; /* resetting indicator at the beginning of every group */

It is indeed making indicator=0 if it is the first. Therefore, if it is not the first, it =1 by default.

Super Contributor
Posts: 3,174

Re: RETAIN question

Posted in reply to FrankReynolds

Recommend adding PUTLOG '>DIAG-nnn' / _all_;   statements at various points in the DATA step to reveal just what you are getting condition-wise with not only your flag variables but also your BY variable processing.  That will help with desk-checking your DATA step flow.
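
As a rough sketch of that suggestion, the step below writes every variable to the log at two points in each iteration. The '>DIAG-010' and '>DIAG-020' labels are made-up markers, and the flag logic is the reset-then-set version proposed earlier in this thread, so treat it as an illustration rather than a definitive fix.

```sas
data new;
  set old;
  by patient_id date;
  retain indicator;

  /* Snapshot right after the observation is read. _all_ includes the
     automatic variables, so FIRST.patient_id and _N_ show up too. */
  putlog '>DIAG-010 after SET' / _all_;

  if first.patient_id then indicator = 0;
  if evening = 1 then indicator = 1;

  /* Snapshot just before the observation is written out. */
  putlog '>DIAG-020 before OUTPUT' / _all_;
run;
```

Comparing the value of indicator in the two snapshots for the same _N_ shows whether it arrived from the input dataset or was set by the IF statements.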

Super User
Posts: 7,079

Re: RETAIN question

Posted in reply to FrankReynolds

There is nothing in your code that could set INDICATOR=0. If you are seeing records where INDICATOR=0, then it must have been in your input dataset.

You need to either drop that old INDICATOR variable or use a new name for the new variable.

You cannot "RETAIN" a variable that is on the input data set because every time the SET statement executes it will read the value from the data set and not carry forward the value from the previous iteration of the data step.
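
A small sketch of that behavior, assuming a made-up input dataset named old_with_flag that already carries an INDICATOR column (the dataset names and the three sample rows are hypothetical, not the poster's data):

```sas
/* Hypothetical input that already contains INDICATOR. */
data old_with_flag;
  input patient_id evening indicator;
datalines;
1000 0 0
1000 1 1
1000 0 0
;

/* INDICATOR is re-read by SET on every iteration, so the value carried
   over from the previous row is overwritten before the IFs run. */
data overwritten;
  set old_with_flag;
  by patient_id;
  retain indicator;
  if first.patient_id then indicator = 0;
  if evening = 1 then indicator = 1;
run;

/* Dropping the incoming column lets the new INDICATOR carry forward. */
data carried;
  set old_with_flag(drop=indicator);
  by patient_id;
  retain indicator;
  if first.patient_id then indicator = 0;
  if evening = 1 then indicator = 1;
run;
```

In the first step the third row should end up with indicator=0, because that is the value read from the input; in the second it should stay at 1.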

Occasional Contributor
Posts: 10

Re: RETAIN question

if first.patient_id and evening=1 then do;
  indicator = 1 ;
end;

Wouldn't this leave any first observation where evening is not equal to 1 with indicator=0, or at least '.'?

'Indicator' was not in the input data set; I believe I created it with this new dataset??

Super User
Posts: 7,079

Re: RETAIN question

Posted in reply to FrankReynolds
"I am interested only in the visit that occurs after 5:30, and any visits for that patient thereafter."

You need to reset the flag when you start a new patient.

data new;
  set old;
  by patient_id date;
  retain indicator;
  if first.patient_id then indicator=0;
  if evening = 1 then indicator=1;
run;
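
To see why the reset matters, here is a minimal sketch with two patients. The dataset names and sample rows are invented for illustration only.

```sas
/* Hypothetical sample: patient 2000 never has an evening visit. */
data visits;
  input patient_id evening;
datalines;
1000 0
1000 1
1000 0
2000 0
2000 0
;

/* Without a reset, the retained 1 leaks from patient 1000 into 2000,
   and the very first row is missing (.) rather than 0. */
data no_reset;
  set visits;
  by patient_id;
  retain indicator;
  if evening = 1 then indicator = 1;
run;

/* With the reset, each patient starts again at 0. */
data with_reset;
  set visits;
  by patient_id;
  retain indicator;
  if first.patient_id then indicator = 0;
  if evening = 1 then indicator = 1;
run;
```

A retained variable with no initial value starts as missing, which is why the first row in no_reset is '.' and not 0.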

Occasional Contributor
Posts: 10

Re: RETAIN question

I understand your point, and it seems logical to me, but for some reason the indicator only =1 for the specific observation where evening=1. FYI, when I write "any visits for that patient thereafter", I don't mean only subsequent visits after 5:30. These visits could be at any time and on any date, as long as they occur after that specific evening=1 observation. Let me give you an example of my output when I use your code, for the first patient's set of visits:

PATIENT_ID  DATE     EVENING  INDICATOR
1000        1.24.10  0        0
1000        2.20.10  0        0
1000        3.30.10  0        0
1000        5.06.10  1        1
1000        6.11.10  0        0

The problem is that the last observation's indicator variable should =1, because it is a visit on a date that took place after the evening visit date. It should read:

PATIENT_ID  DATE     EVENING  INDICATOR
1000        1.24.10  0        0
1000        2.20.10  0        0
1000        3.30.10  0        0
1000        5.06.10  1        1
1000        6.11.10  0        1

Solution
‎02-16-2015 04:46 PM
Super User
Posts: 7,079

Re: RETAIN question

Posted in reply to FrankReynolds

Again the ONLY way that happens with the code posted is if the INDICATOR variable is ALREADY defined in the input dataset OLD.

Here is your data.

data old;
  input patient_id date evening ;
  informat date mmddyy10. ;
  format date yymmdd10.;
cards;
1000  1.24.10  0  0
1000  2.20.10  0  0
1000  3.30.10  0  0
1000  5.06.10  1  1
1000  6.11.10  0  0
;;;;

data new ;
  set old ;
  by patient_id date ;
  retain indicator ;
  if first.patient_id then indicator=0;
  if evening=1 then indicator=1;
  put (_all_) (=);
run;

patient_id=1000 date=2010-01-24 evening=0 indicator=0
patient_id=1000 date=2010-02-20 evening=0 indicator=0
patient_id=1000 date=2010-03-30 evening=0 indicator=0
patient_id=1000 date=2010-05-06 evening=1 indicator=1
patient_id=1000 date=2010-06-11 evening=0 indicator=1
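
One way to check whether INDICATOR is already defined in OLD is to list its variables before running the step. This is only a sketch, and it assumes the dataset really is named OLD in the WORK library:

```sas
/* List the variables in OLD, in creation order. If INDICATOR appears
   here, the SET statement will overwrite the retained value each time. */
proc contents data=old varnum;
run;
```

If INDICATOR is listed, either drop it on the SET statement or pick a different name for the new flag.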

Occasional Contributor
Posts: 10

Re: RETAIN question

Right...I'm starting to think it is something funky with the old dataset too. I've found a way around it for now, but I will go back and try again...More curious than anything.

Thanks Tom.

Occasional Contributor
Posts: 10

Re: RETAIN question

That worked! Thank you...

🔒 This topic is solved and locked.
