
data step execution

Occasional Contributor
Posts: 11

data step execution

Hi everybody

There are two pieces of code that behave differently. Can anybody explain why?

Hint: the LAG function pulls from a queue!

As far as I can tell, they should produce the same result!

data test;
INFILE datalines DLM=',' DSD;
input a b c;
datalines;
4272451,17878,17878
4272451,17878,17878
4272451,17887,17887
4272454,17878,17878
4272454,17881,17881
4272454,17893,17893
4272455,17878,17878
4272455,17878,18200
;
run;

DATA TEST1;
RETAIN F (1);
laga = lag(a); lagb = lag(b);
SET TEST;
IF A^=laga OR laga=. THEN do; f=1; end;
ELSE IF A=laga AND b>lagb THEN do; f=f+1; end;
RUN;

proc print data=test1;
run;

DATA TEST2;
RETAIN F (1);
SET TEST;
IF A^=LAG(A) OR LAG(A)=. THEN do; f=1; end;
ELSE IF A=LAG(A) AND b>LAG(B) THEN do; f=f+1; end;
RUN;

proc print data=test2;
run;



Accepted Solution
12-20-2011 05:03 PM
Super User
Posts: 6,500

Re: data step execution

The problem is the ELSE clause. It can keep the LAG(A) and LAG(B) calls in the second IF statement from running on every iteration. So instead of comparing the current value to the immediately preceding value, you are comparing it to a value from more than one observation earlier.
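To make Tom's point concrete, here is a minimal Python model (not SAS code) of a LAG call that only runs on some iterations. It assumes each LAG call site behaves like a one-slot queue that starts out missing (`None` stands in for a SAS missing value). The `Lag` class and the "only on even values" condition are hypothetical illustrations, not part of the original programs.

```python
# A Python model of SAS LAG semantics (not SAS code).
# Assumption: each LAG call site is a one-slot queue that starts out missing.
MISSING = None  # stands in for a SAS missing value (.)


class Lag:
    """One LAG(x) call site: returns what it stored on its previous execution."""

    def __init__(self):
        self.slot = MISSING

    def __call__(self, x):
        previous, self.slot = self.slot, x
        return previous


values = [1, 2, 3, 4, 5, 6, 7]
every_row = Lag()  # a LAG that runs unconditionally
only_even = Lag()  # a LAG that only runs on some iterations, like one inside an ELSE
unconditional, conditional = [], []

for v in values:
    unconditional.append(every_row(v))
    # The conditional call site is skipped entirely on odd values,
    # so its queue is neither read nor updated on those iterations.
    conditional.append(only_even(v) if v % 2 == 0 else "not run")

print(unconditional)  # [None, 1, 2, 3, 4, 5, 6]
print(conditional)    # ['not run', None, 'not run', 2, 'not run', 4, 'not run']
```

On value 4 the conditional call returns 2, the value from its own last execution, not 3 from the immediately preceding row. That is the stale comparison described above.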


Occasional Contributor
Posts: 11

data step execution

Thanks Tom, you are right.

Occasional Contributor
Posts: 11

data step execution

Actually, I have one more question. Why does the LAG function act differently in the IF part and the ELSE part? They are on the same input line.

PROC Star
Posts: 7,363

data step execution

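An IF-THEN/ELSE statement is evaluated from left to right and stops as soon as a condition is met: when the IF condition is true, the ELSE part (and any LAG calls in it) is never executed.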

Occasional Contributor
Posts: 11

data step execution

In the ELSE part of the IF statement, the LAG functions pull the previous value. To determine "previous", what does LAG check: the row (observation) number? _N_? Which one?

Thanks

PROC Star
Posts: 7,363

data step execution

It works with a queue, returning the last value stored. However, a value is only put into the queue when that LAG call is actually executed.

Occasional Contributor
Posts: 11

data step execution

Thanks for the answer, but I still don't get it. Do the IF and ELSE parts use the same queue or not? If they use the same queue, then the LAG function should produce the same result in both parts of the IF statement. If they don't use the same queue, then why not? Where is the queue for the ELSE part, and how can I reach it?

Super User
Super User
Posts: 6,500

Re: data step execution

You also have trouble with the location of your LAG function calls in the first case. You have placed them BEFORE the SET statement.

This pushes an extra missing value onto each LAG queue (on the first iteration the LAG runs before SET has read any observation), so you end up getting the value from two observations before the current one.

data test2;
  before=lag(a);
  set test;
  after =lag(a);
  put (a before after) (=);
run;

a=1 before=. after=.
a=2 before=. after=1
a=3 before=1 after=2
a=4 before=2 after=3
a=5 before=3 after=4
a=6 before=4 after=5
a=7 before=5 after=6
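The log above can be reproduced with a small Python model (not SAS), assuming two things: each LAG call site is a one-slot queue that starts out missing, and a variable read by SET starts out missing and keeps its value from the previous observation until SET runs again. The names `Lag` and `before_after` are hypothetical.

```python
# Python model (not SAS) of LAG called before vs. after the SET statement.
MISSING = None  # stands in for a SAS missing value (.)


class Lag:
    """One LAG(x) call site: a one-slot queue that starts out missing."""

    def __init__(self):
        self.slot = MISSING

    def __call__(self, x):
        previous, self.slot = self.slot, x
        return previous


def before_after(values):
    lag_before, lag_after = Lag(), Lag()
    a = MISSING  # a variable read by SET starts out missing and is retained between iterations
    rows = []
    for value in values:
        before = lag_before(a)  # runs while a still holds the PREVIOUS observation's value
        a = value               # SET TEST: read the next observation
        after = lag_after(a)    # runs on the current observation's value
        rows.append((a, before, after))
    return rows


for a, before, after in before_after(range(1, 8)):
    print(f"a={a} before={'.' if before is None else before} "
          f"after={'.' if after is None else after}")
```

On the first iteration `lag_before` stores a missing value because nothing has been read yet; that extra missing value is why BEFORE trails A by two observations.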

Occasional Contributor
Posts: 11

data step execution

This is very surprising to me! I couldn't understand why there is an extra missing step, but thanks, I will think about it.

Super User
Super User
Posts: 6,500

data step execution

Try putting PUT statements before and after the SET statement to see when the values of the variables from the input dataset change.

Valued Guide
Posts: 765

Re: data step execution

hi ... you can have a LAG function inside a conditional expression that still executes on every iteration of the DATA step if you use the IFN function instead of IF-THEN-ELSE statements, because IFN evaluates all of its arguments before choosing one ...

data test3;
retain f 1;
set test;
f = ifn(a ne lag(a) or missing(lag(a)), 1, ifn(a eq lag(a) and b gt lag(b), f+1, f));
run;
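Python function calls behave like IFN in the one respect that matters here: every argument is evaluated before the function body runs. The sketch below, assuming (as the post says) that SAS IFN evaluates all of its arguments on every call, contrasts a LAG inside an IFN-style call with one inside an if/else branch. `Lag` and `ifn` are hypothetical Python stand-ins, not SAS.

```python
MISSING = None  # stands in for a SAS missing value (.)


class Lag:
    """One LAG(x) call site: a one-slot queue that starts out missing."""

    def __init__(self):
        self.slot = MISSING

    def __call__(self, x):
        previous, self.slot = self.slot, x
        return previous


def ifn(condition, true_value, false_value):
    # Both value arguments were already evaluated by the caller before we got here.
    return true_value if condition else false_value


eager_lag, branch_lag = Lag(), Lag()
eager, branched = [], []
for v in [1, 2, 3, 4]:
    use_lag = v % 2 == 0
    # IFN style: eager_lag(v) runs on every iteration, even when its result is discarded.
    eager.append(ifn(use_lag, eager_lag(v), MISSING))
    # IF-THEN-ELSE style: branch_lag(v) only runs when the branch is taken.
    branched.append(branch_lag(v) if use_lag else MISSING)

print(eager)     # [None, 1, None, 3]
print(branched)  # [None, None, None, 2]
```

On the last row the IFN-style lag returns 3 (the immediately preceding value), while the branch-style lag returns 2 (the value from its last execution).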

The IFN step above gives the same result as your first DATA step (TEST1), modified slightly by moving the LAG statements after the SET statement ...

DATA TEST1;
RETAIN F (1);
SET TEST;
laga = lag(a); lagb = lag(b);
IF A^=laga OR laga=. THEN do; f=1; end;
ELSE IF A=laga AND b>lagb THEN do; f=f+1; end;
RUN;

Also, FYI ... you don't need the DO-END blocks or the parentheses in the RETAIN statement, and you can use a SUM statement (F+1) ...

DATA TEST1;
RETAIN F 1;
SET TEST;
laga = lag(a);
lagb = lag(b);
IF A^=laga OR laga=. THEN f=1;
ELSE IF A=laga AND b>lagb THEN f+1;
RUN;

And ... since LAGA is missing on the first pass through the DATA step, you can skip the RETAIN statement that gives F an initial value of 1: F is assigned 1 on the first pass, and the sum statement (F+1) is an "implied RETAIN" for F.
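To see what the LAG placement does to the data from the question, here is a Python model of the TEST1 logic. It is a sketch, assuming the same one-slot-queue model of LAG and that variables read by SET keep their previous values until the next SET. Only columns A and B are used, since C plays no part in the logic. `Lag`, `run_step` and the `lag_before_set` flag are hypothetical names: the flag switches between the original placement (LAG before SET) and the corrected one.

```python
MISSING = None  # stands in for a SAS missing value (.)


class Lag:
    """One LAG(x) call site: a one-slot queue that starts out missing."""

    def __init__(self):
        self.slot = MISSING

    def __call__(self, x):
        previous, self.slot = self.slot, x
        return previous


# (a, b) pairs from the TEST dataset in the question
TEST = [
    (4272451, 17878), (4272451, 17878), (4272451, 17887),
    (4272454, 17878), (4272454, 17881), (4272454, 17893),
    (4272455, 17878), (4272455, 17878),
]


def run_step(rows, lag_before_set):
    lag_a, lag_b = Lag(), Lag()
    a = b = MISSING  # SET variables start out missing and are retained between iterations
    f = 1            # RETAIN F 1
    result = []
    for next_a, next_b in rows:
        if lag_before_set:
            laga, lagb = lag_a(a), lag_b(b)  # sees the previous observation (missing on pass 1)
            a, b = next_a, next_b            # SET TEST
        else:
            a, b = next_a, next_b            # SET TEST
            laga, lagb = lag_a(a), lag_b(b)
        if laga is MISSING or a != laga:     # IF A^=laga OR laga=. THEN f=1
            f = 1
        elif b > lagb:                       # ELSE IF A=laga AND b>lagb THEN f+1
            f += 1
        result.append(f)
    return result


print(run_step(TEST, lag_before_set=True))   # [1, 1, 2, 1, 1, 2, 1, 1]
print(run_step(TEST, lag_before_set=False))  # [1, 1, 2, 1, 2, 3, 1, 1]
```

The two runs differ on observations 5 and 6, where the original placement compares each row with the row two positions back.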


PS: You can read more about LAG and IF in Howard Schreier's paper "Conditional Lags Don't Have to be Treacherous":

http://www.howles.com/saspapers/CC33.pdf


Occasional Contributor
Posts: 11

data step execution

Thanks for the answer, Mike, it was helpful.

But I am still trying to understand how the LAG function determines the "previous" value. If it gets it from a queue, why do the IF and ELSE parts use different queues?

Valued Guide
Posts: 765

data step execution

hi ... it's not that a different part of the queue is used

LAG gives you the value of a variable from the last time the LAG function was executed, NOT the value of the variable in the previous observation (that's what Tom told you)

so ... if you use IF-THEN-ELSE without first running these unconditional statements ...

LAGA = LAG(A);
LAGB = LAG(B);

then LAG(A) and/or LAG(B) may not be executed on every pass through the DATA step.

Occasional Contributor
Posts: 11

data step execution

OK, I understand the "last time it was executed" part. But then I have to ask: is the last execution of the IF part different from the last execution of the ELSE part, since according to Tom the ELSE part is not executed on every iteration? Then there must be a log or something similar that tells LAG where to continue from, right?

Let's create a scenario: there is some data and an IF-THEN-ELSE statement with a LAG function in each part:

Observation 1: the IF condition is true, so the LAG in the IF part is executed. The LAG in the ELSE part is not executed.

Observation 2: the IF condition is true, so the LAG in the IF part is executed. The LAG in the ELSE part is not executed.

Observation 3: the IF condition is false, so the LAG in the IF part is not executed. The LAG in the ELSE part is executed.

What is the value of the LAG in the third observation?
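The scenario above can be modelled in a few lines of Python (not SAS), assuming, as the SAS documentation for LAG states, that each occurrence of LAG in a program generates its own queue, and that a queue starts out missing. The input values 100, 200 and 300 are hypothetical. Under that model, the LAG in the ELSE part returns missing on observation 3 because it has never run before; the values seen by the IF-part LAG live only in the IF-part queue.

```python
MISSING = None  # stands in for a SAS missing value (.)


class Lag:
    """One occurrence of LAG(x): its own one-slot queue, starting out missing."""

    def __init__(self):
        self.slot = MISSING

    def __call__(self, x):
        previous, self.slot = self.slot, x
        return previous


lag_in_if = Lag()    # the LAG written in the IF part
lag_in_else = Lag()  # the LAG written in the ELSE part: a separate queue

# (observation, hypothetical value of x, is the IF condition true?)
scenario = [(1, 100, True), (2, 200, True), (3, 300, False)]
results = {}
for obs, x, if_part_true in scenario:
    if if_part_true:
        results[obs] = ("IF part", lag_in_if(x))
    else:
        results[obs] = ("ELSE part", lag_in_else(x))

for obs, (part, value) in results.items():
    print(f"Observation {obs}: LAG in the {part} returns {'.' if value is None else value}")
```

Nothing needs to be "looked up": each LAG occurrence simply remembers what it stored the last time it ran.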

☑ This topic is SOLVED.
