Hi there,
I have a data set containing the dose that each patient received at each visit. When the dose has changed since the previous visit, its value is recorded. If it has not, it is assumed to be unchanged and is left missing. To make some calculations, I would like to write code that assigns the previous dose to the current record whenever the dose is maintained (samedose = 'Y').
Let me use a small example to explain what I mean. Starting from this data set:
data sample_data;
infile datalines delimiter=',';
input pt $ visit dose samedose $;
datalines;
001,0,7.4,
001,1,.,Y
001,2,.,Y
001,3,.,Y
002,0,3.7,
002,1,2.3,N
002,2,.,Y
002,3,.,Y
003,0,5.4,
003,1,.,Y
003,2,2.7,N
003,3,.,Y
004,0,5.4,
004,1,3.2,N
004,2,.,Y
004,3,4.8,N
;
run;
I would like to obtain the following completed data set:
For this purpose I thought that the LAG function could be a good option. However, it is behaving unexpectedly:
- It only seems to work if I first assign the value to an auxiliary variable (lagdose). If I try to do it directly, it doesn't work.
- It seems to calculate the LAG values at the beginning of the execution, so values that were originally missing remain missing, even if they are assigned a non-missing value later during execution. I thought this function worked record by record, so that the LAG value is calculated AFTER the previous record has been assigned, but apparently it does not. Is this right?
This code is not working at all:
data sample_lag_output;
set sample_data;
if samedose = 'Y' then dose = lag1(dose);
run;
This code correctly assigns only the values whose previous value was originally non-missing:
data sample_lag_output;
set sample_data;
lagdose = lag1(dose);
if samedose = 'Y' then dose = lagdose;
run;
data sample_data;
infile datalines delimiter=',';
input pt $ visit dose samedose $;
datalines;
001,0,7.4,
001,1,.,Y
001,2,.,Y
001,3,.,Y
002,0,3.7,
002,1,2.3,N
002,2,.,Y
002,3,.,Y
003,0,5.4,
003,1,.,Y
003,2,2.7,N
003,3,.,Y
004,0,5.4,
004,1,3.2,N
004,2,.,Y
004,3,4.8,N
;
run;
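data want;
    /* Empty master (obs=0) updated by the full data set: within each pt, */
    /* a missing transaction value does not overwrite the carried value.   */
    update sample_data(obs=0) sample_data;
    by pt;
    lag=lag(dose);
    /* Explicit OUTPUT writes every visit, not just the last one per pt. */
    output;
run;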
Dear @Ksharp,
Thank you for your quick response. Although your code works for this simple example, I don't know whether it will fit my needs for the full data set, in which some cells are expected to remain missing if samedose ne 'Y'. From what I see, you are not using any conditional logic in your code to force the LAG value to be used only when samedose='Y'. Or maybe you are including this feature and I'm not fully understanding your code.
Could you please explain in a little more detail what each line does?
Thank you very much in advance!
"some cells are expected to remain missing if samedose ne 'Y'. From what I see, you are not using any conditional structure in your code to force the LAG value to be used only when samedose='Y'. "
I don't understand what you mean. An example would be the best way to explain the question.
You're right, @Ksharp. I tried to simplify the input sample as much as possible and lost some of its features along the way.
Let me introduce a new variable called "withdrawn", which is set to Y if the patient is withdrawn from the study before completing the 4 scheduled visits (0, 1, 2, 3). In that case, the dose value should remain missing for the visits the patient did not attend. Consider this modified sample data set, in which patients 003 and 004 terminate early:
data sample_data;
infile datalines delimiter=',';
input pt $ visit dose samedose $ withdrawn $;
datalines;
001,0,7.4, ,
001,1,.,Y,
001,2,.,Y,
001,3,.,Y,
002,0,3.7, ,
002,1,2.3,N,
002,2,.,Y,
002,3,.,Y,
003,0,5.4, ,
003,1,.,Y,
003,2,2.7,N,
003,3,., ,Y
004,0,5.4, ,
004,1,3.2,N,
004,2,., ,Y
004,3,., ,Y
;
run;
When I run your code on this data, I obtain this result, in which I have highlighted the unwanted values:
Could your code be modified to include this?
Thanks for your patience!
That is easy.
data sample_data;
infile datalines delimiter=',';
input pt $ visit dose samedose $ withdrawn $;
datalines;
001,0,7.4, ,
001,1,.,Y,
001,2,.,Y,
001,3,.,Y,
002,0,3.7, ,
002,1,2.3,N,
002,2,.,Y,
002,3,.,Y,
003,0,5.4, ,
003,1,.,Y,
003,2,2.7,N,
003,3,., ,Y
004,0,5.4, ,
004,1,3.2,N,
004,2,., ,Y
004,3,., ,Y
;
run;
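data want;
    /* Empty master (obs=0) updated by the full data set: within each pt, */
    /* a missing transaction value does not overwrite the carried value.   */
    update sample_data(obs=0) sample_data;
    by pt;
    lag=lag(dose);
    /* Visits after withdrawal must stay missing, so blank them again. */
    if withdrawn='Y' then call missing(dose,samedose);
    /* Explicit OUTPUT writes every visit, not just the last one per pt. */
    output;
run;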
Using lag() in a conditional branch is always dangerous: lag() feeds its FIFO queue only when it is called, and it stores the current value of its argument at that moment. As a result, you will at some point propagate a missing value if two or more appear in succession. Use a retained variable instead:
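data want;
    set sample_data;
    by pt;
    retain _dose;                      /* keeps its value across observations */
    if first.pt then _dose = .;        /* do not carry a dose over to the next patient */
    if dose ne . then _dose = dose;    /* remember the most recent recorded dose */
    else dose = _dose;                 /* fill a missing dose from the remembered one */
    drop _dose;
run;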
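If the fill should happen only when samedose = 'Y', and visits after a withdrawal should stay missing, the same retain pattern could be extended roughly as follows. This is a minimal sketch rather than a tested production solution: it assumes the pt, visit, dose, samedose and withdrawn columns from the sample data earlier in the thread, already ordered by pt and visit. The output name want_retain and the helper variable _dose are hypothetical, and the DSD and TRUNCOVER options are my addition so that empty trailing fields are read as missing.

```sas
/* Sample data from the thread, with withdrawn visits for patients 003 and 004. */
data sample_data;
    infile datalines delimiter=',' dsd truncover;
    input pt $ visit dose samedose $ withdrawn $;
    datalines;
001,0,7.4, ,
001,1,.,Y,
001,2,.,Y,
001,3,.,Y,
002,0,3.7, ,
002,1,2.3,N,
002,2,.,Y,
002,3,.,Y,
003,0,5.4, ,
003,1,.,Y,
003,2,2.7,N,
003,3,., ,Y
004,0,5.4, ,
004,1,3.2,N,
004,2,., ,Y
004,3,., ,Y
;
run;

data want_retain;
    set sample_data;
    by pt;
    retain _dose;                                 /* hypothetical helper variable   */
    if first.pt then _dose = .;                   /* reset for each patient         */
    if withdrawn = 'Y' then dose = .;             /* missed visits stay missing     */
    else if not missing(dose) then _dose = dose;  /* remember newest recorded dose  */
    else if samedose = 'Y' then dose = _dose;     /* fill only when flagged as same */
    drop _dose;
run;
```

Because _dose is only ever updated from a recorded dose, a run of several consecutive missing values is filled from the last real value, which is the case where the LAG attempts break down.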
Dear @Kurt_Bremser,
I thought about doing it with a RETAIN statement, as you suggest, but now that I have tried it with LAG, I want to understand how this function works for future applications.
Thank you very much for your quick answer!
As @Kurt_Bremser said, the lag() function only updates its queue when it is called.
Since you call it inside a test, it is not executed on every record:
if samedose = 'Y' then dose = lag1(dose);
so it misses some updates.
Calling it unconditionally and storing the result in an extra variable, as you did, is the proper way to call it.
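To see the difference, here is a small illustrative sketch. The data set lag_demo and the variables x, flag, always and sometimes are made up for this example and are not part of the original data. It calls LAG once on every record and once only on flagged records.

```sas
data lag_demo;
    input x flag $;
    always = lag(x);                        /* queue is fed on every record        */
    if flag = 'Y' then sometimes = lag(x);  /* queue is fed only on flagged records */
    datalines;
1 N
2 Y
3 N
4 Y
5 Y
;
run;

proc print data=lag_demo noobs;
run;
```

In this sketch, always holds the value of x from the previous record (., 1, 2, 3, 4), while sometimes holds the value of x from the previous record on which that LAG call was executed (., ., ., 2, 4). On the fourth record the previous x is 3, but the conditional call returns 2. Note also that even an unconditional call stores dose as it is at the moment of the call, so an assignment made later in the same step never reaches the queue. That would explain why the lagdose version still leaves consecutive missing values unfilled.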