BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
UcheOkoro
Lapis Lazuli | Level 10

Hello,

Please, I need help creating a variable from comparing consecutive values. Baseline is the baseline lab values, Month_01-Month-05 is for follow up visit lab values and PCHNG_01-PCHNG_05 is the percent change from baseline.

The first level of the variable is where baseline is =>400 and two consecutive follow-up visits are =>350;

 the second level is where the baseline is =>300 but <400 and two consecutive follow-up visits are =>300 and were also within 25% of the baseline;

the third level is where the baseline is <200 and two consecutive follow-ups were also within 25% of the baseline;

the 4th level did not meet any of these criteria.

Thank you!

 

data have;
input ID Baseline $ Month_01-Month_05 PCHNG_01-PCHNG_05;
datalines;
1 250 .   200 100 100 150 . -20 -60 -60 -40
2 400 300 350 380 300 300 -25 -12.5 -5 -25 -25
3 600 450 500 450 500 500 -33.3 -16.7 -33.3 -16.7 -16.7
4 160 150 140 150 140 130 -6.3 -12.5 -5.3 -12.5 -18.8
5 350 320 330 200 200 250 -8.6 -5.7 -42.9 -42.9 -28.6 ; run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Providing sample data AND the expected result really helped to clarify things. See if below also works for your real data.

data have;
  input ID Baseline $ Month_01-Month_05 PCHNG_01-PCHNG_05 new_expected;
  datalines;
2 400 300 350 380 300 300 -25 -12.5 -5 -25 -25 1
3 600 450 500 450 500 500 -33.3 -16.7 -33.3 -16.7 -16.7 1
5 350 320 330 200 200 250 -8.6 -5.7 -42.9 -42.9 -28.6 2
4 160 150 140 150 140 130 -6.3 -12.5 -5.3 -12.5 -18.8 3
1 250 . 200 100 100 150 . -20 -60 -60 -40 4
;

data want(drop=_:);
  set have;

  array aMth {*} month_:;
  array aPct {*} pchng_:;

  /*baseline is =>400 and two consecutive follow-up visits are =>350 */
  if baseline>=400 then
    do;
      do _i=1 to dim(aMth)-1;
        if aMth[_i]>=350 and aMth[_i+1]>350 then 
          do;
            new_derived=1;
            leave;
          end;
      end;
    end;

  /* baseline is =>300 but <400 and two consecutive follow-up visits are =>300 and were also within 25% of the baseline */
  if missing(new_derived) and baseline>=300 and baseline<400 then
    do;
      do _i=1 to dim(aMth);
        if aMth[_i]>=300 and aMth[_i+1]>300 then 
          do;
            if abs(aPct[_i])<=25 and abs(aPct[_i+1])<=25 then 
              do;
                new_derived=2;
                leave;
              end;
          end;
      end;
    end;

  /* baseline is <200 and two consecutive follow-ups were also within 25% of the baseline */
  if missing(new_derived) and baseline<200 then
    do;
      do _i=1 to dim(aPct)-1;
        if 0<=abs(aPct[_i])<=25 and 0<=abs(aPct[_i+1])<=25 then 
          do;
            new_derived=3;
            leave;
          end;
      end;
    end;

  /*4th level did not meet any of these criteria */
  if missing(new_derived) then
    do;
      new_derived=4;
    end;

run;

proc print data=want;
run;

Patrick_0-1694854366812.png

 

As a variation to above code here a version that uses a GOTO. I consider this one of the rather rare cases where using GOTO is still legitimate because it avoids execution of logical steps once a case has been identified. 

data want(drop=_:);
  set have;

  array aMth {*} month_:;
  array aPct {*} pchng_:;

  /*baseline is =>400 and two consecutive follow-up visits are =>350 */
  if baseline>=400 then
    do;
      do _i=1 to dim(aMth)-1;
        if aMth[_i]>=350 and aMth[_i+1]>350 then 
          do;
            new_derived=1;
            /* case identified, do not check further */
            goto done;
          end;
      end;
    end;

  /* baseline is =>300 but <400 and two consecutive follow-up visits are =>300 and were also within 25% of the baseline */
  if baseline>=300 and baseline<400 then
    do;
      do _i=1 to dim(aMth);
        if aMth[_i]>=300 and aMth[_i+1]>300 then 
          do;
            if abs(aPct[_i])<=25 and abs(aPct[_i+1])<=25 then 
              do;
                new_derived=2;
                /* case identified, do not check further */
                goto done;
              end;
          end;
      end;
    end;

  /* baseline is <200 and two consecutive follow-ups were also within 25% of the baseline */
  if baseline<200 then
    do;
      do _i=1 to dim(aPct)-1;
        if 0<=abs(aPct[_i])<=25 and 0<=abs(aPct[_i+1])<=25 then 
          do;
            new_derived=3;
            /* case identified, do not check further */
            goto done;
          end;
      end;
    end;

  /*4th level did not meet any of these criteria */
  if missing(new_derived) then
    do;
      new_derived=4;
    end;

  /* target of goto statement */
  done:;

run;

View solution in original post

7 REPLIES 7
Reeza
Super User
Please show the expected output. What should the new variable show for each record in your data set?
UcheOkoro
Lapis Lazuli | Level 10

Thank you for your response. It would level1=1, level2=2 etc.

 

 

PaigeMiller
Diamond | Level 26

@UcheOkoro wrote:

It would level1=1, level2=2 etc.


 

I still do not understand what you want. And by the way, this is not what @Reeza asked for, which was to SHOW US the expected output and what the value of the new variable would be on each row of the data set.

--
Paige Miller
UcheOkoro
Lapis Lazuli | Level 10

My apologies for the confusion. The following is a table of the output I expect to see. 

 

ID

Baseline

Month_01

Month_02

Month_03

Month_04

Month_05

PCHNG_01

PCHNG_02

PCHNG_03

PCHNG_04

PCHNG_05

New_variable

2

400

300

350

380

300

300

-25

-12.5

-5

-25

-25

1

3

600

450

500

450

500

500

-33.3

-16.7

-33.3

-16.7

-16.7

1

5

350

320

330

200

200

250

-8.6

-5.7

-42.9

-42.9

-28.6

2

4

160

150

140

150

140

130

-6.3

-12.5

-5.3

-12.5

-18.8

3

1

250

.

200

100

100

150

.

-20

-60

-60

-40

4

Patrick
Opal | Level 21

Providing sample data AND the expected result really helped to clarify things. See if below also works for your real data.

data have;
  input ID Baseline $ Month_01-Month_05 PCHNG_01-PCHNG_05 new_expected;
  datalines;
2 400 300 350 380 300 300 -25 -12.5 -5 -25 -25 1
3 600 450 500 450 500 500 -33.3 -16.7 -33.3 -16.7 -16.7 1
5 350 320 330 200 200 250 -8.6 -5.7 -42.9 -42.9 -28.6 2
4 160 150 140 150 140 130 -6.3 -12.5 -5.3 -12.5 -18.8 3
1 250 . 200 100 100 150 . -20 -60 -60 -40 4
;

data want(drop=_:);
  set have;

  array aMth {*} month_:;
  array aPct {*} pchng_:;

  /*baseline is =>400 and two consecutive follow-up visits are =>350 */
  if baseline>=400 then
    do;
      do _i=1 to dim(aMth)-1;
        if aMth[_i]>=350 and aMth[_i+1]>350 then 
          do;
            new_derived=1;
            leave;
          end;
      end;
    end;

  /* baseline is =>300 but <400 and two consecutive follow-up visits are =>300 and were also within 25% of the baseline */
  if missing(new_derived) and baseline>=300 and baseline<400 then
    do;
      do _i=1 to dim(aMth);
        if aMth[_i]>=300 and aMth[_i+1]>300 then 
          do;
            if abs(aPct[_i])<=25 and abs(aPct[_i+1])<=25 then 
              do;
                new_derived=2;
                leave;
              end;
          end;
      end;
    end;

  /* baseline is <200 and two consecutive follow-ups were also within 25% of the baseline */
  if missing(new_derived) and baseline<200 then
    do;
      do _i=1 to dim(aPct)-1;
        if 0<=abs(aPct[_i])<=25 and 0<=abs(aPct[_i+1])<=25 then 
          do;
            new_derived=3;
            leave;
          end;
      end;
    end;

  /*4th level did not meet any of these criteria */
  if missing(new_derived) then
    do;
      new_derived=4;
    end;

run;

proc print data=want;
run;

Patrick_0-1694854366812.png

 

As a variation to above code here a version that uses a GOTO. I consider this one of the rather rare cases where using GOTO is still legitimate because it avoids execution of logical steps once a case has been identified. 

data want(drop=_:);
  set have;

  array aMth {*} month_:;
  array aPct {*} pchng_:;

  /*baseline is =>400 and two consecutive follow-up visits are =>350 */
  if baseline>=400 then
    do;
      do _i=1 to dim(aMth)-1;
        if aMth[_i]>=350 and aMth[_i+1]>350 then 
          do;
            new_derived=1;
            /* case identified, do not check further */
            goto done;
          end;
      end;
    end;

  /* baseline is =>300 but <400 and two consecutive follow-up visits are =>300 and were also within 25% of the baseline */
  if baseline>=300 and baseline<400 then
    do;
      do _i=1 to dim(aMth);
        if aMth[_i]>=300 and aMth[_i+1]>300 then 
          do;
            if abs(aPct[_i])<=25 and abs(aPct[_i+1])<=25 then 
              do;
                new_derived=2;
                /* case identified, do not check further */
                goto done;
              end;
          end;
      end;
    end;

  /* baseline is <200 and two consecutive follow-ups were also within 25% of the baseline */
  if baseline<200 then
    do;
      do _i=1 to dim(aPct)-1;
        if 0<=abs(aPct[_i])<=25 and 0<=abs(aPct[_i+1])<=25 then 
          do;
            new_derived=3;
            /* case identified, do not check further */
            goto done;
          end;
      end;
    end;

  /*4th level did not meet any of these criteria */
  if missing(new_derived) then
    do;
      new_derived=4;
    end;

  /* target of goto statement */
  done:;

run;
UcheOkoro
Lapis Lazuli | Level 10

Thank you so much, Patrick for your help!

mkeintz
PROC Star

This need to compare successive values in arrays is a good situation to use the UNTIL condition.

 

Edit: Let me restate that:  the need to stop a loop once a condition has been met is ready-made for the UNTIL clause.  And if the until condition is related to the new variable in this case, you can avoid a good deal of extra code.

 

 

data have;
  input ID $ Baseline  Month_01-Month_05 PCHNG_01-PCHNG_05 new_expected;
  datalines;
2 400 300 350 380 300 300 -25    -12.5   -5   -25   -25    1
3 600 450 500 450 500 500 -33.3  -16.7  -33.3 -16.7 -16.7  1
5 350 320 330 200 200 250  -8.6   -5.7  -42.9 -42.9 -28.6  2
4 160 150 140 150 140 130  -6.3  -12.5   -5.3 -12.5 -18.8  3
1 250 .   200 100 100 150    .   -20    -60   -60   -40    4
;

data want;
  set have;
  array f {*} month_01-month_05 ;
  array p {*} pchng_01-pchng_05 ;

  new=4;
  if baseline >= 400    then do i=2 to dim(f) until (new=1);
    if f{i}>=350 and f{i-1}>=350 then new=1;
  end;
  else if baseline>=300 then do i=2 to dim(f) until (new=2);
    if n(f{i},f{i-1})=2 and  min(f{i},f{i-1})>=300 and   max(abs(p{i}),abs(p{i-1})) <=25 then new=2;
  end;
  else if baseline<200 then do i=2 to dim(f) until (new=3);
    if n(p{i},p{i-1})=2 and max(abs(p{i}),abs(p{i-1}))<=25 then new=3;
  end;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1458 views
  • 3 likes
  • 5 in conversation