BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Junyong
Pyrite | Level 9

I have a data set with two indicator variables b and c as follows.

data _;
	input a b c;
cards;
1 0 0
2 1 0
3 1 1
4 1 1
5 0 1
6 0 0
7 0 0
8 1 1
;
run;

So only the variable b changes when a is either 2 or 5, only the variable c changes when a is either 3 or 6, and both the variables b and c change when a is 8. For each of those observations, I am trying to attach one additional line right before each with the variables remaining unchanged—long story short, to make steps in PROC GPLOT. For example,

a b c

1 0 0

2 0 0

2 1 0

3 1 0

3 1 1

4 1 1

5 1 1

5 0 1

6 0 1

6 0 0

7 0 0

8 0 0

8 1 1

as there were changes in either b or c or both. To duplicate the observations, I tried the following code first.

data _;
	set _;
	if abs(b-lag(b))=1 or abs(c-lag(c))=1 then output;
	output;
	if abs(b-lag(b))=1 then b=1-b;
	if abs(c-lag(c))=1 then c=1-c;
run;

It seems the two OUTPUTs duplicate the intended observations, but the later two IF-THENs don't change the observations as they cannot affect the already output observations.

a b c

1 0 0

2 1 0

2 1 0

3 1 1

3 1 1

4 1 1

5 0 1

5 0 1

6 0 0

6 0 0

7 0 0

8 1 1

8 1 1

However, I cannot change their orders as the change also affects the OUTPUTs. Currently, I am using the following two steps.

data _;
	set _;
	if abs(b-lag(b))=1 or abs(c-lag(c))=1 then output;
	output;
run;
data _;
	set _;
	if abs(b-lag(b))=1 then b=1-b;
	if abs(c-lag(c))=1 then c=1-c;
run;

But is it impossible to do these two data steps together in just one data step? I welcome a totally new approach as well. Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

I don't understand why c = 0 in the third line of the expected output:

4 1 1
5 1 1
5 1 0

In the input you have

4 1 1
5 0 1

so i would expect the following lines to be right, but maybe i misunderstood your requirement:

4	1	1
5	1	1
5	0	1

The following step could be what you need:

data want;
   set have;

   last_b = lag(b);
   last_c = lag(c);

   if _n_ > 1 and (last_b ^= b or last_c ^= c) then do;
      /* backup values and set b and c to their last values */
      bak_b = b;
      bak_c = c;
      b = last_b;
      c = last_c;
      output;

      /* restore values */
      b = bak_b;
      c = bak_c;
   end;

   output;

   drop bak_: last_:;
run;

When using lag-function avoid to call it multiple times with the same variable and avoid using it conditionally.

View solution in original post

5 REPLIES 5
andreas_lds
Jade | Level 19

I don't understand why c = 0 in the third line of the expected output:

4 1 1
5 1 1
5 1 0

In the input you have

4 1 1
5 0 1

so i would expect the following lines to be right, but maybe i misunderstood your requirement:

4	1	1
5	1	1
5	0	1

The following step could be what you need:

data want;
   set have;

   last_b = lag(b);
   last_c = lag(c);

   if _n_ > 1 and (last_b ^= b or last_c ^= c) then do;
      /* backup values and set b and c to their last values */
      bak_b = b;
      bak_c = c;
      b = last_b;
      c = last_c;
      output;

      /* restore values */
      b = bak_b;
      c = bak_c;
   end;

   output;

   drop bak_: last_:;
run;

When using lag-function avoid to call it multiple times with the same variable and avoid using it conditionally.

Junyong
Pyrite | Level 9

Many thanks for help, and I checked this works—I corrected my mistake yesterday, but forgot to say thank you for that. It is really interesting as this code is similar to my very first approach I did not include here. I failed with the following code but could not figure out the reason, which is why I excluded the code.

data want;
	set have;
	if _n_>1 and (b^=lag(b) or c^=lag(c)) then do;
		if b^=lag(b) then b=1-b;
		if c^=lag(c) then c=1-c;
		output;
		if b^=lag(b) then b=1-b;
		if c^=lag(c) then c=1-c;
	end;
	output;
run;

I though I coded exactly the same thing—duplicated the changes first (even screening by _N_=1 was the same), changed the changes, output, and then returned back. I got a totally unintended outcome from the code as follows.

a b c

1 0 0

2 0 1

2 1 0

3 1 0

3 0 1

4 1 1

5 1 1

5 1 0

6 0 1

6 1 1

7 0 0

8 0 0

8 0 1

What was the problem? This code even changed the original data.

FreelanceReinh
Jade | Level 19

@Junyong wrote:

I failed with the following code but could not figure out the reason, which is why I excluded the code.

data want;
	set have;
	if _n_>1 and (b^=lag(b) or c^=lag(c)) then do;
		if b^=lag(b) then b=1-b;
		if c^=lag(c) then c=1-c;
		output;
		if b^=lag(b) then b=1-b;
		if c^=lag(c) then c=1-c;
	end;
	output;
run;

I though I coded exactly the same thing (...)

What was the problem?


 

Hi @Junyong,

 

Remember that the LAG function in general does not return a value from the previous observation, but it returns a value from a FIFO (first in, first out) queue (see our discussion last month). Therefore it's rarely a good idea to use the LAG function in the THEN or ELSE branch of an IF-THEN/ELSE statement, i.e., to call it conditionally (as andreas_lds pointed out already). It is no problem per se, however, to use it in an IF condition. (See this older discussion if you have more time.)

 

Moreover, each occurrence of the LAG function in the DATA step corresponds to a separate queue so that it makes a difference whether you call the function once or twice even if the program logic seems to be "exactly the same" in both cases (see another striking example here). That's why people often save the result of a LAG function in a variable and use this subsequently (as andreas_lds did) rather than call LAG again (as you did).

 

More specifically, your first IF condition (if _n_>1 and ...) using the LAG function is correct, but since the LAG functions in the DO-END block create separate queues and they are called conditionally, all four IF conditions in this block are met when it is executed for the first time: The four fresh queues return missing values, whereas b and c are non-missing. This is where the incorrect results begin to arise.

Junyong
Pyrite | Level 9

Thanks for your considerate details. I intuited that LAG inside IF unconditionally refers to the observations ahead, but just realized that it refers to the observations ahead conditionally on the IF outside. The four IFs inside the IF outside did all the jobs conditionally inside the subset of the observations created by the IF outside—Observation 3 based on Observation 2, Observation 5 based on Observation 3, and so forth. Understood.

 

Here I attach my code after your post.

data _;
	set _;
	if b^=lag(b) and c^=lag(c) then do;
		b=1-b;
		c=1-c;
		if _n_^=1 then output;
		b=1-b;
		c=1-c;
	end;
	if b^=lag(b) and c=lag(c) then do;
		b=1-b;
		if _n_^=1 then output;
		b=1-b;
	end;
	if b=lag(b) and c^=lag(c) then do;
		c=1-c;
		if _n_^=1 then output;
		c=1-c;
	end;
	output;
run;

Thanks.

Shmuel
Garnet | Level 18

Using LAG under IF may be unpredictable. 

In order not to change input use temporary output in between steps.

See changes and notes to your codes:

 

data _new;
	set _;
       retain prev_b prev_c;
       drop prev_b prev_c;

	if abs(b-prev_b) = 1 or abs(c-prev_c) = 1 then output;
	else output;  /* else added to prevent duplicates */
         /*** useless as reading next obs will override this assignment
	if abs(b-lag(b))=1 then b=1-b;
	if abs(c-lag(c))=1 then c=1-c;
        *******/
run;

Try next code:

data _tmp;
  set _;
        retain prev_b prev_c;
        drop prev_b prev_c;

	if abs(b-prev_b)=1 or abs(c-prev_c)=1 then do;
	   if abs(b-prev_b) =1 then b = 1-b;
	   if abs(c-prev_c) =1 then c = 1-c;
        end;
	output;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1163 views
  • 0 likes
  • 4 in conversation