Junyong
Pyrite | Level 9

Suppose that there are 5 groups and each group has 5 observations that share 1 identical random number. Then,

data panel;
  call streaminit(1);
  do a=1 to 5;
    afix=rand("normal");
    do b=1 to 5;
      output;
    end;
  end;
run;

The example generates 25 observations with a,b∈{1,…,5} and 5 observations in each a share an identical afix.

01.png

I need to twist this code a bit by looping over b first and then a, so I tried the following instead.

data panel;
  call streaminit(1);
  do b=1 to 5;
    do a=1 to 5;
      if b=1 then afix=rand("normal");
      else afix=lag5(afix);
      output;
    end;
  end;
run;

I thought that LAG5 would pull the value from five observations earlier and put it in the right place, but it did not work at all.

02.png

Because the data step does other things as well, I would like to solve this inside the data step rather than with PROC SORT. Is there a correct way to do this? Thanks in advance.

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @Junyong,

 

Does this produce the result that you want?

data panel(drop=t);
call streaminit(1);
do b=1 to 5;
  do a=1 to 5;
    if b=1 then afix=rand("normal");
    else afix=t;
    output;
    t=lag4(afix);
  end;
end;
run;

 

Edit: Here is an equivalent, but slightly shorter solution:

data panel;
call streaminit(1);
do b=1 to 5;
  do a=1 to 5;
    afix=ifn(b=1,rand("normal"),lag4(afix));
    output;
  end;
end;
run;

Junyong
Pyrite | Level 9
This works. Much appreciated, but can I ask why? What is the problem with my code?
FreelanceReinh
Jade | Level 19

@Junyong wrote:
This works. Much appreciated, but can I ask why? What is the problem with my code?

The problem with your code is that

  1. The LAGn function in general does not return a value from some previous observation (n obs. before the current obs.), but it returns values from a FIFO (first in, first out) queue (see documentation).
  2. You call the LAG function conditionally in an ELSE statement, but for your application the queue should operate in every iteration of the inner DO loop.
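Both points can be seen in isolation with a small sketch (the dataset name lag_demo and the variables x and prev are made up for this illustration and are not part of the original code). LAG is called only for even values of x, so its queue never sees the odd ones:

```sas
/* Minimal sketch: LAG is called only when x is even, so its queue
   only ever receives even values of x. */
data lag_demo;
  do x = 1 to 6;
    if mod(x, 2) = 0 then prev = lag(x); /* queue updated on even x only */
    else prev = .;
    output;
  end;
run;

proc print data=lag_demo noobs;
run;
/* prev is missing for x=1,2,3,5; prev=2 for x=4 and prev=4 for x=6,
   i.e. the value from the previous CALL of LAG, not from the previous row. */
```

If LAG really looked at the previous observation, prev would be 1, 3 and 5 for x = 2, 4 and 6. Instead it returns whatever was pushed into the queue by the preceding call.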

The table below shows the contents of variables b, a and afix and the internal queue (consisting of five positions, which I named q1-q5 just for demonstration) associated with the LAG5 function during the first couple of iterations of the DO loops.

 

b  a  afix  q5  q4  q3  q2  q1
1  1  r1    .   .   .   .   .
1  2  r2    .   .   .   .   .
1  3  r3    .   .   .   .   .
1  4  r4    .   .   .   .   .
1  5  r5    .   .   .   .   .
2  1  .     r5  .   .   .   .
2  2  .     .   r5  .   .   .
2  3  .     .   .   r5  .   .
2  4  .     .   .   .   r5  .
2  5  .     .   .   .   .   r5
3  1  r5    .   .   .   .   .
3  2  .     r5  .   .   .   .
3  3  .     .   r5  .   .   .

and so on.

 

In the first five iterations of the inner DO loop the ELSE branch is not executed because the IF condition (b=1) is met. Hence, the LAG5 queue remains in its default state: all five positions are empty (or rather: contain missing values). Variable afix is assigned one of five random values from the RAND function (which I denoted as r1, r2, ... for simplicity).

 

In the next iteration, i.e. when b=2 and a=1, the LAG5 function is executed for the first time, which (always) means:

  1. All elements of the queue are shifted by one position towards the top of the queue (in the above table: shifted to the right, i.e. from q5 to q4, from q4 to q3, from q3 to q2, from q2 to q1 and the element in q1 leaves the queue).
  2. The current value of afix (the argument of the LAG5 function) is pushed into the queue (at the "bottom", here denoted with q5, where an empty position has occurred due to the shift).
  3. The LAG5 function returns the value from the "top" of the queue (here denoted with q1), which has been "pushed out". This value is stored in variable afix by the assignment statement.

Thus, the last random number, r5, is pushed into the queue while the missing value returned from the LAG5 function (from q1) is assigned to variable afix. In the next four iterations (b=2, a=2, 3, 4, 5) the LAG5 function is executed, too. As a consequence, r5 moves through the queue while missing values from afix (received from the queue) are pushed "again" into the queue and the LAG5 function returns, one by one, the missing values the queue contained from its initialization. When b=3 and a=1, r5 is finally pushed out of the queue and assigned to afix. The missing value previously in afix is pushed into the queue. The next iteration (b=3, a=2) is similar to that with b=2 and a=1: The value r5 of afix is pushed into the queue and afix receives the missing value previously stored in position q1 of the queue. Similarly, the iteration "b=3, a=3" repeats what happened in the iteration "b=2, a=2" and so on.

 

This explains the unsatisfactory result in your output dataset.

 

In my first solution I call the LAG4 (!) function unconditionally and assign its value to a temporary variable t. This occurs after the OUTPUT statement and the value of t is used only in the next iteration of the DO loop where it is conditionally copied into variable afix. Below is the corresponding table showing the contents of b, a, afix, t and the four positions in the LAG4 queue at the end of each iteration of the inner DO loop:

 

b  a  afix  t   q4  q3  q2  q1
1  1  r1    .   r1  .   .   .
1  2  r2    .   r2  r1  .   .
1  3  r3    .   r3  r2  r1  .
1  4  r4    .   r4  r3  r2  r1
1  5  r5    r1  r5  r4  r3  r2
2  1  r1    r2  r1  r5  r4  r3
2  2  r2    r3  r2  r1  r5  r4
2  3  r3    r4  r3  r2  r1  r5
2  4  r4    r5  r4  r3  r2  r1
2  5  r5    r1  r5  r4  r3  r2
3  1  r1    r2  r1  r5  r4  r3

and so on.

 

Thanks to the unconditional execution of the LAG4 function, the queue starts operating in the very first iteration of the DO loops. By the end of the fourth iteration it has received the random numbers r1-r4 from variable afix while returning its initial missing values to variable t. In the fifth iteration the LAG4 function returns a non-missing value for the first time: the random number r1. This is kept ready in variable t for the assignment to afix in the sixth iteration (b=2, a=1). Still in this sixth iteration, r1 is pushed into the queue (from afix to q4) and r2 is pushed out into t. You see how the process goes on and yields the desired result in dataset PANEL.
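One way to confirm the result, sketched here as a hypothetical check that is not part of the solution itself (the column names n_distinct and n_missing are made up), is to count the distinct and missing values of afix within each group a of dataset PANEL:

```sas
/* Sanity check (illustrative only): each group a should carry exactly
   one distinct value of afix and no missing values. */
proc sql;
  select a,
         count(distinct afix) as n_distinct,
         nmiss(afix)          as n_missing
  from panel
  group by a;
quit;
```

For a correctly built PANEL, every row of this report should show n_distinct=1 and n_missing=0. Run against the output of the original LAG5 attempt, the same query would report missing values in every group.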

 

The shorter alternative solution, which I added earlier today, makes use of the fact that in the IFN function the LAG4 queue operates in each call to IFN regardless (!) of whether the "IF condition" (here: b=1) is met. This is an important difference from the traditional IF-THEN/ELSE statement. Below is the corresponding table:

 

b  a  afix  q4  q3  q2  q1
1  1  r1    .   .   .   .
1  2  r2    r1  .   .   .
1  3  r3    r2  r1  .   .
1  4  r4    r3  r2  r1  .
1  5  r5    r4  r3  r2  r1
2  1  r1    r5  r4  r3  r2
2  2  r2    r1  r5  r4  r3
2  3  r3    r2  r1  r5  r4

and so on.

 

In the first iteration variable afix is assigned random number r1 from the RAND function (via IFN, of course) because the "IF condition" b=1 is met. The LAG4 queue operates nevertheless. It receives a missing (!) value from variable afix because the assignment statement (leading to afix=r1) has not finished yet! The missing value pushed out from q1 is not stored anywhere because the "ELSE condition" is not met.

 

In the second iteration (b=1, a=2) r1 from afix is pushed into the queue (position q4), three missing values are shifted within the queue and one missing value (from q1) drops out, unused. Variable afix is assigned the second random number, r2, coming from RAND.

 

In the third iteration (b=1, a=3) r2 from afix is pushed into the queue (position q4), r1 is shifted to q3 as are two missing values (to q2 and q1, resp.). Again, a missing value (from q1) drops out, unused, and eventually afix receives value r3, created by RAND, in the assignment statement.

 

...

 

In the sixth iteration (b=2, a=1) the "IF condition" is not met so that the return value of the LAG4 function, i.e., random number r1 pushed out from position q1, is assigned to variable afix after the former value of afix, r5, has been fed into the queue, shifting r4, r3 and r2 "to the right" (in the table). Similar actions occur in the subsequent iterations because condition b=1 is never met again.

 

You see how the iterations continue and build the desired output dataset without the need of a temporary variable t.
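The difference between IFN and IF-THEN/ELSE can be isolated in a small sketch (the variables i, y and z are made up for this illustration). Each LAG call in the code has its own queue, so the two variants can be compared side by side in one step:

```sas
data _null_;
  do i = 1 to 4;
    /* IFN: the LAG argument is evaluated on every call, so this
       queue is fed with i in all four iterations. */
    y = ifn(i <= 2, -1, lag(i));

    /* IF-THEN/ELSE: this separate queue is fed only when i > 2. */
    if i <= 2 then z = -1;
    else z = lag(i);

    put i= y= z=;
  end;
run;
/* Log:
   i=1 y=-1 z=-1
   i=2 y=-1 z=-1
   i=3 y=2 z=.
   i=4 y=3 z=3   */
```

At i=3 the IFN variant already returns 2 because its queue was fed during the first two iterations, whereas the conditional LAG is executed for the first time and therefore returns a missing value.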

Junyong
Pyrite | Level 9

I appreciate these rigorous details again. It seems that (1) LAG, and hence the queue in my code, only starts in the ELSE branch (so nothing enters the queue while the IF condition holds), and (2) I misunderstood how LAG behaves: it does not pull past values from earlier observations but pushes values through a queue. The LAG4 inside IFN in your code is particularly interesting, as I thought it had to be LAG5. This helps a lot.

Astounding
PROC Star

Despite the elegant solution from @FreelanceReinh, I might have chosen a simpler approach. You can eliminate LAG (and its complications) entirely by adding a loop to the program. Here is an untested version:

 

data panel;
   call streaminit(1);
   array fixlist {5} _temporary_;
   do a=1 to 5;
      fixlist{a} = rand("normal");
   end;
   do b=1 to 5;
      do a=1 to 5;
         afix = fixlist{a};
         output;
      end;
   end;
run;

 

 

Junyong
Pyrite | Level 9
This bypasses LAG by using ARRAY and _TEMPORARY_. I use them frequently but had never thought of applying them this way. Thanks for your code.
