Hello,
Please, I need help creating a variable from comparing consecutive values. Baseline is the baseline lab values, Month_01-Month-05 is for follow up visit lab values and PCHNG_01-PCHNG_05 is the percent change from baseline.
The first level of the variable is where baseline is =>400 and two consecutive follow-up visits are =>350;
the second level is where the baseline is =>300 but <400 and two consecutive follow-up visits are =>300 and were also within 25% of the baseline;
the third level is where the baseline is <200 and two consecutive follow-ups were also within 25% of the baseline;
the 4th level did not meet any of these criteria.
Thank you!
data have;
input ID Baseline $ Month_01-Month_05 PCHNG_01-PCHNG_05;
datalines;
1 250 . 200 100 100 150 . -20 -60 -60 -40
2 400 300 350 380 300 300 -25 -12.5 -5 -25 -25
3 600 450 500 450 500 500 -33.3 -16.7 -33.3 -16.7 -16.7
4 160 150 140 150 140 130 -6.3 -12.5 -5.3 -12.5 -18.8
5 350 320 330 200 200 250 -8.6 -5.7 -42.9 -42.9 -28.6
;
run;
Providing sample data AND the expected result really helped to clarify things. See if below also works for your real data.
data have;
input ID Baseline $ Month_01-Month_05 PCHNG_01-PCHNG_05 new_expected;
datalines;
2 400 300 350 380 300 300 -25 -12.5 -5 -25 -25 1
3 600 450 500 450 500 500 -33.3 -16.7 -33.3 -16.7 -16.7 1
5 350 320 330 200 200 250 -8.6 -5.7 -42.9 -42.9 -28.6 2
4 160 150 140 150 140 130 -6.3 -12.5 -5.3 -12.5 -18.8 3
1 250 . 200 100 100 150 . -20 -60 -60 -40 4
;
data want(drop=_:);
set have;
array aMth {*} month_:;
array aPct {*} pchng_:;
/*baseline is =>400 and two consecutive follow-up visits are =>350 */
if baseline>=400 then
do;
do _i=1 to dim(aMth)-1;
if aMth[_i]>=350 and aMth[_i+1]>350 then
do;
new_derived=1;
leave;
end;
end;
end;
/* baseline is =>300 but <400 and two consecutive follow-up visits are =>300 and were also within 25% of the baseline */
if missing(new_derived) and baseline>=300 and baseline<400 then
do;
do _i=1 to dim(aMth);
if aMth[_i]>=300 and aMth[_i+1]>300 then
do;
if abs(aPct[_i])<=25 and abs(aPct[_i+1])<=25 then
do;
new_derived=2;
leave;
end;
end;
end;
end;
/* baseline is <200 and two consecutive follow-ups were also within 25% of the baseline */
if missing(new_derived) and baseline<200 then
do;
do _i=1 to dim(aPct)-1;
if 0<=abs(aPct[_i])<=25 and 0<=abs(aPct[_i+1])<=25 then
do;
new_derived=3;
leave;
end;
end;
end;
/*4th level did not meet any of these criteria */
if missing(new_derived) then
do;
new_derived=4;
end;
run;
proc print data=want;
run;
As a variation to above code here a version that uses a GOTO. I consider this one of the rather rare cases where using GOTO is still legitimate because it avoids execution of logical steps once a case has been identified.
data want(drop=_:);
set have;
array aMth {*} month_:;
array aPct {*} pchng_:;
/*baseline is =>400 and two consecutive follow-up visits are =>350 */
if baseline>=400 then
do;
do _i=1 to dim(aMth)-1;
if aMth[_i]>=350 and aMth[_i+1]>350 then
do;
new_derived=1;
/* case identified, do not check further */
goto done;
end;
end;
end;
/* baseline is =>300 but <400 and two consecutive follow-up visits are =>300 and were also within 25% of the baseline */
if baseline>=300 and baseline<400 then
do;
do _i=1 to dim(aMth);
if aMth[_i]>=300 and aMth[_i+1]>300 then
do;
if abs(aPct[_i])<=25 and abs(aPct[_i+1])<=25 then
do;
new_derived=2;
/* case identified, do not check further */
goto done;
end;
end;
end;
end;
/* baseline is <200 and two consecutive follow-ups were also within 25% of the baseline */
if baseline<200 then
do;
do _i=1 to dim(aPct)-1;
if 0<=abs(aPct[_i])<=25 and 0<=abs(aPct[_i+1])<=25 then
do;
new_derived=3;
/* case identified, do not check further */
goto done;
end;
end;
end;
/*4th level did not meet any of these criteria */
if missing(new_derived) then
do;
new_derived=4;
end;
/* target of goto statement */
done:;
run;
Thank you for your response. It would level1=1, level2=2 etc.
@UcheOkoro wrote:
It would level1=1, level2=2 etc.
I still do not understand what you want. And by the way, this is not what @Reeza asked for, which was to SHOW US the expected output and what the value of the new variable would be on each row of the data set.
My apologies for the confusion. The following is a table of the output I expect to see.
ID |
Baseline |
Month_01 |
Month_02 |
Month_03 |
Month_04 |
Month_05 |
PCHNG_01 |
PCHNG_02 |
PCHNG_03 |
PCHNG_04 |
PCHNG_05 |
New_variable |
2 |
400 |
300 |
350 |
380 |
300 |
300 |
-25 |
-12.5 |
-5 |
-25 |
-25 |
1 |
3 |
600 |
450 |
500 |
450 |
500 |
500 |
-33.3 |
-16.7 |
-33.3 |
-16.7 |
-16.7 |
1 |
5 |
350 |
320 |
330 |
200 |
200 |
250 |
-8.6 |
-5.7 |
-42.9 |
-42.9 |
-28.6 |
2 |
4 |
160 |
150 |
140 |
150 |
140 |
130 |
-6.3 |
-12.5 |
-5.3 |
-12.5 |
-18.8 |
3 |
1 |
250 |
. |
200 |
100 |
100 |
150 |
. |
-20 |
-60 |
-60 |
-40 |
4 |
Providing sample data AND the expected result really helped to clarify things. See if below also works for your real data.
data have;
input ID Baseline $ Month_01-Month_05 PCHNG_01-PCHNG_05 new_expected;
datalines;
2 400 300 350 380 300 300 -25 -12.5 -5 -25 -25 1
3 600 450 500 450 500 500 -33.3 -16.7 -33.3 -16.7 -16.7 1
5 350 320 330 200 200 250 -8.6 -5.7 -42.9 -42.9 -28.6 2
4 160 150 140 150 140 130 -6.3 -12.5 -5.3 -12.5 -18.8 3
1 250 . 200 100 100 150 . -20 -60 -60 -40 4
;
data want(drop=_:);
set have;
array aMth {*} month_:;
array aPct {*} pchng_:;
/*baseline is =>400 and two consecutive follow-up visits are =>350 */
if baseline>=400 then
do;
do _i=1 to dim(aMth)-1;
if aMth[_i]>=350 and aMth[_i+1]>350 then
do;
new_derived=1;
leave;
end;
end;
end;
/* baseline is =>300 but <400 and two consecutive follow-up visits are =>300 and were also within 25% of the baseline */
if missing(new_derived) and baseline>=300 and baseline<400 then
do;
do _i=1 to dim(aMth);
if aMth[_i]>=300 and aMth[_i+1]>300 then
do;
if abs(aPct[_i])<=25 and abs(aPct[_i+1])<=25 then
do;
new_derived=2;
leave;
end;
end;
end;
end;
/* baseline is <200 and two consecutive follow-ups were also within 25% of the baseline */
if missing(new_derived) and baseline<200 then
do;
do _i=1 to dim(aPct)-1;
if 0<=abs(aPct[_i])<=25 and 0<=abs(aPct[_i+1])<=25 then
do;
new_derived=3;
leave;
end;
end;
end;
/*4th level did not meet any of these criteria */
if missing(new_derived) then
do;
new_derived=4;
end;
run;
proc print data=want;
run;
As a variation to above code here a version that uses a GOTO. I consider this one of the rather rare cases where using GOTO is still legitimate because it avoids execution of logical steps once a case has been identified.
data want(drop=_:);
set have;
array aMth {*} month_:;
array aPct {*} pchng_:;
/*baseline is =>400 and two consecutive follow-up visits are =>350 */
if baseline>=400 then
do;
do _i=1 to dim(aMth)-1;
if aMth[_i]>=350 and aMth[_i+1]>350 then
do;
new_derived=1;
/* case identified, do not check further */
goto done;
end;
end;
end;
/* baseline is =>300 but <400 and two consecutive follow-up visits are =>300 and were also within 25% of the baseline */
if baseline>=300 and baseline<400 then
do;
do _i=1 to dim(aMth);
if aMth[_i]>=300 and aMth[_i+1]>300 then
do;
if abs(aPct[_i])<=25 and abs(aPct[_i+1])<=25 then
do;
new_derived=2;
/* case identified, do not check further */
goto done;
end;
end;
end;
end;
/* baseline is <200 and two consecutive follow-ups were also within 25% of the baseline */
if baseline<200 then
do;
do _i=1 to dim(aPct)-1;
if 0<=abs(aPct[_i])<=25 and 0<=abs(aPct[_i+1])<=25 then
do;
new_derived=3;
/* case identified, do not check further */
goto done;
end;
end;
end;
/*4th level did not meet any of these criteria */
if missing(new_derived) then
do;
new_derived=4;
end;
/* target of goto statement */
done:;
run;
Thank you so much, Patrick for your help!
This need to compare successive values in arrays is a good situation to use the UNTIL condition.
Edit: Let me restate that: the need to stop a loop once a condition has been met is ready-made for the UNTIL clause. And if the until condition is related to the new variable in this case, you can avoid a good deal of extra code.
data have;
input ID $ Baseline Month_01-Month_05 PCHNG_01-PCHNG_05 new_expected;
datalines;
2 400 300 350 380 300 300 -25 -12.5 -5 -25 -25 1
3 600 450 500 450 500 500 -33.3 -16.7 -33.3 -16.7 -16.7 1
5 350 320 330 200 200 250 -8.6 -5.7 -42.9 -42.9 -28.6 2
4 160 150 140 150 140 130 -6.3 -12.5 -5.3 -12.5 -18.8 3
1 250 . 200 100 100 150 . -20 -60 -60 -40 4
;
data want;
set have;
array f {*} month_01-month_05 ;
array p {*} pchng_01-pchng_05 ;
new=4;
if baseline >= 400 then do i=2 to dim(f) until (new=1);
if f{i}>=350 and f{i-1}>=350 then new=1;
end;
else if baseline>=300 then do i=2 to dim(f) until (new=2);
if n(f{i},f{i-1})=2 and min(f{i},f{i-1})>=300 and max(abs(p{i}),abs(p{i-1})) <=25 then new=2;
end;
else if baseline<200 then do i=2 to dim(f) until (new=3);
if n(p{i},p{i-1})=2 and max(abs(p{i}),abs(p{i-1}))<=25 then new=3;
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.