BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
emera86
Quartz | Level 8

Hi there,

 

I have a data set containing the dose that each patient has received per visit. When the value has changed from previous the visit its value is included. If not, it is supposed to be the same and is not included. To make some calculations I would like to create a code that, when the dose is mantained (samedose = 'Y'),  the previous dose is assigned to the actual register.

 

Let me use a small example to explain what I mean. Starting from this data set:

 

data sample_data;
	infile datalines delimiter=',';
	input pt $ visit dose samedose $;
	datalines;
001,0,7.4, 
001,1,.,Y
001,2,.,Y
001,3,.,Y
002,0,3.7, 
002,1,2.3,N
002,2,.,Y
002,3,.,Y
003,0,5.4, 
003,1,.,Y
003,2,2.7,N
003,3,.,Y
004,0,5.4, 
004,1,3.2,N
004,2,.,Y
004,3,4.8,N
;
run;


start.PNG

I would like to obtain the following completed data set:

 

end.PNG

 

For this purpose I thought that the LAG function could be a good option. However, it is behaving unexpectedly:

  1. It only seems to work if I assign the value to an auxiliary variable (lagdose) before. If I try to do it direcly it doesn't work.
  2. It seems to be calculating the LAG values at the beginning of the execution, so values that originally were missing, even if they are later assinged to a non-missing one during execution, remain missing. I thought this function worked per register, so the LAG value is calculated AFTER the previous one is assinged, but apparently it is not. Is this right?

 

This code is not working at all:

 

data sample_lag_output;
	set sample_data;
	if samedose = 'Y' then dose = lag1(dose);
run;

 This code is only assigning properly only the values whose previous value was originally non-missing:

 

data sample_lag_output;
	set sample_data;
	lagdose = lag1(dose);
	if samedose = 'Y' then dose = lagdose;
run;
Obtaining the following uncomplete output instead of the expected one:
 
uncomplete.PNG
 
I've read about some issues about the LAG function and conditional structures, but what I understood is that it can't be part of the condition, but it can be changed conditionally as in my example.
 
What am I doing wrong? Is LAG not suitable for this transformation?
 
Thank you for your help in advance!
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

That is easy.

 

data sample_data;
	infile datalines delimiter=',';
	input pt $ visit dose samedose $ withdrawn $;
	datalines;
001,0,7.4, ,
001,1,.,Y, 
001,2,.,Y,
001,3,.,Y,
002,0,3.7, , 
002,1,2.3,N,
002,2,.,Y,
002,3,.,Y,
003,0,5.4, , 
003,1,.,Y,
003,2,2.7,N,
003,3,., ,Y
004,0,5.4, , 
004,1,3.2,N,
004,2,., ,Y
004,3,., ,Y
;
run;
data want;
 update sample_data(obs=0) sample_data;
 by pt;
 lag=lag(dose);
 if withdrawn='Y' then call missing(dose,samedose);
 output;
run;

View solution in original post

8 REPLIES 8
Ksharp
Super User
data sample_data;
	infile datalines delimiter=',';
	input pt $ visit dose samedose $;
	datalines;
001,0,7.4, 
001,1,.,Y
001,2,.,Y
001,3,.,Y
002,0,3.7, 
002,1,2.3,N
002,2,.,Y
002,3,.,Y
003,0,5.4, 
003,1,.,Y
003,2,2.7,N
003,3,.,Y
004,0,5.4, 
004,1,3.2,N
004,2,.,Y
004,3,4.8,N
;
run;
data want;
 update sample_data(obs=0) sample_data;
 by pt;
 lag=lag(dose);
 output;
run;
emera86
Quartz | Level 8

Dear @Ksharp,

 

Thank you for your quick response. Although your code is working for this simple example, I don't know if it will fit my needs for the full data set on which some cells are expected to remain missing if samedose ne 'Y'. From what I see, you are not using any conditional structure in your code to force the LAG value to be used only when samedose='Y'. Or maybe you are including this feature and I'm not fully understanding your code. 

 

Could you please explain a little bit more what is the purpose of each line?

 

Thank you very much in advance! 

Ksharp
Super User

"some cells are expected to remain missing if samedose ne 'Y'. From what I see, you are not using any conditional structure in your code to force the LAG value to be used only when samedose='Y'. "

 

I don't understand what you mean. An example is best to explain question.

emera86
Quartz | Level 8

You're right, @Ksharp, I tried to simplify as much as possible the input sample and on the way I lost some of its features.

 

Let me introduce a new variable called "withdrawn" which turns to Y if the patient is withdrawn from the study before completing the 4 scheduled visits (0, 1, 2, 3). In that case, the dose value should remain missing for visits to which the patient did not attend. Imagine this modified sample data set on which patients 003 and 004 have an early termination:

 

data sample_data;
	infile datalines delimiter=',';
	input pt $ visit dose samedose $ withdrawn $;
	datalines;
001,0,7.4, ,
001,1,.,Y, 
001,2,.,Y,
001,3,.,Y,
002,0,3.7, , 
002,1,2.3,N,
002,2,.,Y,
002,3,.,Y,
003,0,5.4, , 
003,1,.,Y,
003,2,2.7,N,
003,3,., ,Y
004,0,5.4, , 
004,1,3.2,N,
004,2,., ,Y
004,3,., ,Y
;
run;

In this case when I run your code I obtain this result on which I have highlighted the unwanted results:

 

withdrawn.PNG

 

Could your code be modified to include this?

 

Thanks for your patience!

Ksharp
Super User

That is easy.

 

data sample_data;
	infile datalines delimiter=',';
	input pt $ visit dose samedose $ withdrawn $;
	datalines;
001,0,7.4, ,
001,1,.,Y, 
001,2,.,Y,
001,3,.,Y,
002,0,3.7, , 
002,1,2.3,N,
002,2,.,Y,
002,3,.,Y,
003,0,5.4, , 
003,1,.,Y,
003,2,2.7,N,
003,3,., ,Y
004,0,5.4, , 
004,1,3.2,N,
004,2,., ,Y
004,3,., ,Y
;
run;
data want;
 update sample_data(obs=0) sample_data;
 by pt;
 lag=lag(dose);
 if withdrawn='Y' then call missing(dose,samedose);
 output;
run;
Kurt_Bremser
Super User

Using lag() in a conditional branch is always dangerous, as lag() feeds its FIFO chain only when it is called, and it puts the current value into it, so you will at one point propagate a missing value if two or more appear in succession. Use a retained variable instead:

data want;
set sample_data;
by pt;
retain _dose;
if first.pt then _dose = .;
if dose ne .
then _dose = dose;
else dose = _dose;
drop _dose;
run;
emera86
Quartz | Level 8

Dear @Kurt_Bremser,

 

I thought about doing it with the retain function as you are suggesting, but now that I've tried it with LAG, I want to get to understand how this function works for future applications. 

 

Thank you very much for your quick answer!

ChrisNZ
Tourmaline | Level 20

As @Kurt_Bremser said, the lag() function only updates its value when called.

Since you don't always call it by using it inside a test

if samedose = 'Y' then dose = lag1(dose);

it misses some updates.

Using an extraneous variable as you did is the proper way to do what you want.

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1581 views
  • 0 likes
  • 4 in conversation