Hi friend, I have a data set and I want to generate the baseline value. Can you point out why my code does not work? Thank you.
data I have:
Obs | randomization | visit | tac |
1 | 618001 | 2 | 8824 |
2 | 618001 | 3 | 9088 |
3 | 618001 | 4 | 9784 |
4 | 618002 | 2 | 7962 |
5 | 618002 | 3 | 8318 |
6 | 618002 | 4 | 8819 |
7 | 618003 | 2 | 10630 |
8 | 618003 | 3 | 8906 |
9 | 618003 | 4 | 9796 |
data I want:
Obs | randomization | visit | tac | base_tac |
1 | 618001 | 2 | 8824 | 8824 |
2 | 618001 | 3 | 9088 | 8824 |
3 | 618001 | 4 | 9784 | 8824 |
4 | 618002 | 2 | 7962 | 7962 |
5 | 618002 | 3 | 8318 | 7962 |
6 | 618002 | 4 | 8819 | 7962 |
7 | 618003 | 2 | 10630 | 10630 |
8 | 618003 | 3 | 8906 | 10630 |
9 | 618003 | 4 | 9796 | 10630 |
data tac;
input randomization visit tac;
datalines;
618001 2 8824
618001 3 9088
618001 4 9784
618002 2 7962
618002 3 8318
618002 4 8819
618003 2 10630
618003 3 8906
618003 4 9796
;
run;
data tac;
set tac;
if visit = 2 then tac_v2 = tac ;
run;
proc sort data = tac;
by randomization visit;
run;
data tac;
set tac;
by randomization;
retain base_tac ;
base_tac = max(base_tac ,tac_v2);
run;
outpute of the code;
Obs randomization visit tac tac_v2 base_tac123456789
618001 | 2 | 8824 | 8824 | 8824 |
618001 | 3 | 9088 | . | 8824 |
618001 | 4 | 9784 | . | 8824 |
618002 | 2 | 7962 | 7962 | 8824 |
618002 | 3 | 8318 | . | 8824 |
618002 | 4 | 8819 | . | 8824 |
618003 | 2 | 10630 | 10630 | 10630 |
618003 | 3 | 8906 | . | 10630 |
618003 | 4 | 9796 | . | 10630 |
See my comments below. The problem is the last step, where SAS does what you tell it to do. If you follow through the steps using the debugger that may help, but I also tried the illustrate the logic.
So your logic is wrong, even though your code is correct. Hope that makes sense.
SAS Output
Obs | randomization | visit | tac | tac_v2 |
---|---|---|---|---|
1 | 618001 | 2 | 8824 | 8824 |
2 | 618001 | 3 | 9088 | . max(base_tac, tac_v2) - > base_tac = 8824, tac_v2=. so the max is 8824 |
3 | 618001 | 4 | 9784 | . (same as above) |
4 | 618002 | 2 | 7962 | 7962 Note you have not reset any values here, so base_tac=8824 , tac_v2=7962 and max is 8824 like you've requested. |
5 | 618002 | 3 | 8318 | . |
6 | 618002 | 4 | 8819 | . |
7 | 618003 | 2 | 10630 | 10630 |
8 | 618003 | 3 | 8906 | . |
9 | 618003 | 4 | 9796 | . |
@superbibi wrote:
@Reeza@novinosrin, thank you both. Your code is good! But could you point out what is wrong with my code?
data tac;
input randomization visit tac;
datalines;
618001 2 8824
618001 3 9088
618001 4 9784
618002 2 7962
618002 3 8318
618002 4 8819
618003 2 10630
618003 3 8906
618003 4 9796
;
run;
data want;
set tac;
by randomization;
retain base_tac;
if first.randomization then base_tac=tac;
run;
Not sure why yours doesn't work, but you have a few things that may be causing it.
1. Don't use the same name in the DATA and SET statement. This makes it really hard to debug code (case in point) and you have to remember to always run your code from the top.
It's also more steps, you should sort and assign baseline.
data tac;
input randomization visit tac;
datalines;
618001 2 8824
618001 3 9088
618001 4 9784
618002 2 7962
618002 3 8318
618002 4 8819
618003 2 10630
618003 3 8906
618003 4 9796
;
run;
proc sort data = tac;
by randomization visit;
run;
data tac;
set tac;
by randomization;
retain tac_baseline;
if first.randomization then tac_baseline=tac; ;
run;
@Reeza@novinosrin, thank you both. Your code is good! But could you point out what is wrong with my code?
See my comments below. The problem is the last step, where SAS does what you tell it to do. If you follow through the steps using the debugger that may help, but I also tried the illustrate the logic.
So your logic is wrong, even though your code is correct. Hope that makes sense.
SAS Output
Obs | randomization | visit | tac | tac_v2 |
---|---|---|---|---|
1 | 618001 | 2 | 8824 | 8824 |
2 | 618001 | 3 | 9088 | . max(base_tac, tac_v2) - > base_tac = 8824, tac_v2=. so the max is 8824 |
3 | 618001 | 4 | 9784 | . (same as above) |
4 | 618002 | 2 | 7962 | 7962 Note you have not reset any values here, so base_tac=8824 , tac_v2=7962 and max is 8824 like you've requested. |
5 | 618002 | 3 | 8318 | . |
6 | 618002 | 4 | 8819 | . |
7 | 618003 | 2 | 10630 | 10630 |
8 | 618003 | 3 | 8906 | . |
9 | 618003 | 4 | 9796 | . |
@superbibi wrote:
@Reeza@novinosrin, thank you both. Your code is good! But could you point out what is wrong with my code?
@Reeza, Thank you for the explanation. I cannot understand the logic of by. I assumed that using by statement, the following procedure will repeat for each by group. So the retain statement can be repeated for each by group. It seems that SAS is not using this logic.
A by statement does the following:
and nothing more. It's up to the programmer to build the logic with first. and last.
Automatic reset of retained variables is not possible, as SAS would have no way to know when to reset if more than one level is present in the by.
@superbibi wrote:
@Reeza, Thank you for the explanation. I cannot understand the logic of by. I assumed that using by statement, the following procedure will repeat for each by group. So the retain statement can be repeated for each by group. It seems that SAS is not using this logic.
That's not a correct understanding of BY logic. @Kurt_Bremser response details it and the documentation chapter on it is pretty decent IMO.
You do have the right answers already posted. I wasn't sure if it would help to answer your question with a short version.
The RETAIN statement always retains TAC_BASELINE. But that's not what you need. You need to start the logic over again, each time RANDOMIZATION changes. As it stands, your program never starts over.
hello,
I propose this solution:
data tac;
input randomization visit tac;
datalines;
618001 2 8824
618001 3 9088
618001 4 9784
618002 2 7962
618002 3 8318
618002 4 8819
618003 2 10630
618003 3 8906
618003 4 9796
;
run;
proc sql;
create table test as select a.* ,base_tak
from tac as a
inner join
(select tac as base_tak ,randomization from
tac where visit=2) as b
on a.randomization=b.randomization
;
quit;
alternatively also try
data tac;
input randomization visit tac;
datalines;
618001 2 8824
618001 3 9088
618001 4 9784
618002 2 7962
618002 3 8318
618002 4 8819
618003 2 10630
618003 3 8906
618003 4 9796
;
run;
data want;
do until(last.randomization);
set tac;
by randomization;
if first.randomization then base=tac;
end;
do until(last.randomization);
set tac;
by randomization;
output;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.