Using the LAG function when its value was originally missing

Accepted Solution Solved
Reply
Contributor
Posts: 46
Accepted Solution

Using the LAG function when its value was originally missing

Hi there,

 

I have a data set containing the dose that each patient has received per visit. When the value has changed from previous the visit its value is included. If not, it is supposed to be the same and is not included. To make some calculations I would like to create a code that, when the dose is mantained (samedose = 'Y'),  the previous dose is assigned to the actual register.

 

Let me use a small example to explain what I mean. Starting from this data set:

 

data sample_data;
	infile datalines delimiter=',';
	input pt $ visit dose samedose $;
	datalines;
001,0,7.4, 
001,1,.,Y
001,2,.,Y
001,3,.,Y
002,0,3.7, 
002,1,2.3,N
002,2,.,Y
002,3,.,Y
003,0,5.4, 
003,1,.,Y
003,2,2.7,N
003,3,.,Y
004,0,5.4, 
004,1,3.2,N
004,2,.,Y
004,3,4.8,N
;
run;


start.PNG

I would like to obtain the following completed data set:

 

end.PNG

 

For this purpose I thought that the LAG function could be a good option. However, it is behaving unexpectedly:

  1. It only seems to work if I assign the value to an auxiliary variable (lagdose) before. If I try to do it direcly it doesn't work.
  2. It seems to be calculating the LAG values at the beginning of the execution, so values that originally were missing, even if they are later assinged to a non-missing one during execution, remain missing. I thought this function worked per register, so the LAG value is calculated AFTER the previous one is assinged, but apparently it is not. Is this right?

 

This code is not working at all:

 

data sample_lag_output;
	set sample_data;
	if samedose = 'Y' then dose = lag1(dose);
run;

 This code is only assigning properly only the values whose previous value was originally non-missing:

 

data sample_lag_output;
	set sample_data;
	lagdose = lag1(dose);
	if samedose = 'Y' then dose = lagdose;
run;
Obtaining the following uncomplete output instead of the expected one:
 
uncomplete.PNG
 
I've read about some issues about the LAG function and conditional structures, but what I understood is that it can't be part of the condition, but it can be changed conditionally as in my example.
 
What am I doing wrong? Is LAG not suitable for this transformation?
 
Thank you for your help in advance!

Accepted Solutions
Solution
‎05-21-2018 03:16 AM
Super User
Posts: 10,778

Re: Using the LAG function when its value was originally missing

That is easy.

 

data sample_data;
	infile datalines delimiter=',';
	input pt $ visit dose samedose $ withdrawn $;
	datalines;
001,0,7.4, ,
001,1,.,Y, 
001,2,.,Y,
001,3,.,Y,
002,0,3.7, , 
002,1,2.3,N,
002,2,.,Y,
002,3,.,Y,
003,0,5.4, , 
003,1,.,Y,
003,2,2.7,N,
003,3,., ,Y
004,0,5.4, , 
004,1,3.2,N,
004,2,., ,Y
004,3,., ,Y
;
run;
data want;
 update sample_data(obs=0) sample_data;
 by pt;
 lag=lag(dose);
 if withdrawn='Y' then call missing(dose,samedose);
 output;
run;

View solution in original post


All Replies
Super User
Posts: 10,778

Re: Using the LAG function when its value was originally missing

data sample_data;
	infile datalines delimiter=',';
	input pt $ visit dose samedose $;
	datalines;
001,0,7.4, 
001,1,.,Y
001,2,.,Y
001,3,.,Y
002,0,3.7, 
002,1,2.3,N
002,2,.,Y
002,3,.,Y
003,0,5.4, 
003,1,.,Y
003,2,2.7,N
003,3,.,Y
004,0,5.4, 
004,1,3.2,N
004,2,.,Y
004,3,4.8,N
;
run;
data want;
 update sample_data(obs=0) sample_data;
 by pt;
 lag=lag(dose);
 output;
run;
Contributor
Posts: 46

Re: Using the LAG function when its value was originally missing

Dear @Ksharp,

 

Thank you for your quick response. Although your code is working for this simple example, I don't know if it will fit my needs for the full data set on which some cells are expected to remain missing if samedose ne 'Y'. From what I see, you are not using any conditional structure in your code to force the LAG value to be used only when samedose='Y'. Or maybe you are including this feature and I'm not fully understanding your code. 

 

Could you please explain a little bit more what is the purpose of each line?

 

Thank you very much in advance! 

Super User
Posts: 10,778

Re: Using the LAG function when its value was originally missing

"some cells are expected to remain missing if samedose ne 'Y'. From what I see, you are not using any conditional structure in your code to force the LAG value to be used only when samedose='Y'. "

 

I don't understand what you mean. An example is best to explain question.

Contributor
Posts: 46

Re: Using the LAG function when its value was originally missing

You're right, @Ksharp, I tried to simplify as much as possible the input sample and on the way I lost some of its features.

 

Let me introduce a new variable called "withdrawn" which turns to Y if the patient is withdrawn from the study before completing the 4 scheduled visits (0, 1, 2, 3). In that case, the dose value should remain missing for visits to which the patient did not attend. Imagine this modified sample data set on which patients 003 and 004 have an early termination:

 

data sample_data;
	infile datalines delimiter=',';
	input pt $ visit dose samedose $ withdrawn $;
	datalines;
001,0,7.4, ,
001,1,.,Y, 
001,2,.,Y,
001,3,.,Y,
002,0,3.7, , 
002,1,2.3,N,
002,2,.,Y,
002,3,.,Y,
003,0,5.4, , 
003,1,.,Y,
003,2,2.7,N,
003,3,., ,Y
004,0,5.4, , 
004,1,3.2,N,
004,2,., ,Y
004,3,., ,Y
;
run;

In this case when I run your code I obtain this result on which I have highlighted the unwanted results:

 

withdrawn.PNG

 

Could your code be modified to include this?

 

Thanks for your patience!

Solution
‎05-21-2018 03:16 AM
Super User
Posts: 10,778

Re: Using the LAG function when its value was originally missing

That is easy.

 

data sample_data;
	infile datalines delimiter=',';
	input pt $ visit dose samedose $ withdrawn $;
	datalines;
001,0,7.4, ,
001,1,.,Y, 
001,2,.,Y,
001,3,.,Y,
002,0,3.7, , 
002,1,2.3,N,
002,2,.,Y,
002,3,.,Y,
003,0,5.4, , 
003,1,.,Y,
003,2,2.7,N,
003,3,., ,Y
004,0,5.4, , 
004,1,3.2,N,
004,2,., ,Y
004,3,., ,Y
;
run;
data want;
 update sample_data(obs=0) sample_data;
 by pt;
 lag=lag(dose);
 if withdrawn='Y' then call missing(dose,samedose);
 output;
run;
Super User
Posts: 10,239

Re: Using the LAG function when its value was originally missing

Using lag() in a conditional branch is always dangerous, as lag() feeds its FIFO chain only when it is called, and it puts the current value into it, so you will at one point propagate a missing value if two or more appear in succession. Use a retained variable instead:

data want;
set sample_data;
by pt;
retain _dose;
if first.pt then _dose = .;
if dose ne .
then _dose = dose;
else dose = _dose;
drop _dose;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 46

Re: Using the LAG function when its value was originally missing

Posted in reply to KurtBremser

Dear @KurtBremser,

 

I thought about doing it with the retain function as you are suggesting, but now that I've tried it with LAG, I want to get to understand how this function works for future applications. 

 

Thank you very much for your quick answer!

PROC Star
Posts: 2,353

Re: Using the LAG function when its value was originally missing

As @KurtBremser said, the lag() function only updates its value when called.

Since you don't always call it by using it inside a test

if samedose = 'Y' then dose = lag1(dose);

it misses some updates.

Using an extraneous variable as you did is the proper way to do what you want.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 142 views
  • 0 likes
  • 4 in conversation