BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hellohere
Pyrite | Level 9

Code below. The DIFn return the dif value by GRP. But what I want is simple dif value by _N_.

 

How to do it?! 

 

data temp;
do i=1 to 200;
grp=floor((i-1)/10);
output;
end;
run;quit;

data temp2;
set temp;
if i-floor(i/10)*10=5 then do;
_d=dif4(i);
end;
run;quit;

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

The DIF function, like the LAG function is a queue update function.   In your IF statement

 

  if    i-floor(i/10)*10=5 then _d=dif4(i);

The queue is only updated at obs 5,15,15,25, 35,45,55,65,75,85, 95,.....,195.   The first 4 times (5,15,...,35) the queue is updated the DIF4 function will return a missing value, because the queue needs 5 instances to fully populate with non-missing values.  Note that the queue is NOT updated for all the other observations.

 

What you really want to do is update the queue at every observation, but keep the dif4 result only for obs 5,15,25,...195.  This is what I think you want.

 

data temp;
  do i=1 to 200;
    grp=floor((i-1)/10);
    output;
  end;
run;
quit;

data temp2;
  set temp;
  _d=ifn(i-floor(i/10)*10=55,dif4(i),.);
run;

This will return a value of 4 for obs 5,15,,25, etc., and a missing value otherwise.  That is because, unlike the IF statement, the IFN function always executes both the 2nd and 3rd arguments, regardless of which one it returns when assessing whether the first argument is true.  (It returns the 2nd arg when the first arg is true, otherwise it returns the 3rd arg).  

 

Because the IFN function will update the DIF4 queue for every obs (not just every 10th obs as in the IF statement) you will be subtracting the 4th previous obs (not the 40th preceding obs) from the current obs but only keeping the result for every 10th obs.

 

There are times when it is a good idea to put a DIF or LAG function in an IF statement.  But this is not such a time.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
hellohere
Pyrite | Level 9

_d shows up only on _N_=45/55/65... with value of 40.

 

What expected is to show up at 5/15/25... with value of 4

pink_poodle
Barite | Level 11

Could you please provide more details?:

*** USES i TO CALCULATE GROUPS ***;
DATA TEMP; 
      DO i = 1 TO 200;
              GROUP =  FLOOR( (i-1)/10 ); * max(GROUP) is 19;
              OUTPUT;
       END:
RUN:

*** WHAT DOES THIS STEP DO? ***;
* not using the GROUP variable;
DATA TEMP2;
      SET TEMP;
      IF i - FLOOR (i/10) * 10= 5 THEN DO;  
      * i - FLOOR (i/10) = 0.5? ;
               _D = DIF4( i ); * what is DIF4( )? ; 
      END;
RUN;
hellohere
Pyrite | Level 9

create a column _d, based on condition [here mod(i,10)], keep DIFn on i[n changes on mod(i,10)].

Say mod(i,10)=5 to get DIF4(i), mod(i,10)=8 to get DIF3(i)... otherwise _D=.;

mkeintz
PROC Star

The DIF function, like the LAG function is a queue update function.   In your IF statement

 

  if    i-floor(i/10)*10=5 then _d=dif4(i);

The queue is only updated at obs 5,15,15,25, 35,45,55,65,75,85, 95,.....,195.   The first 4 times (5,15,...,35) the queue is updated the DIF4 function will return a missing value, because the queue needs 5 instances to fully populate with non-missing values.  Note that the queue is NOT updated for all the other observations.

 

What you really want to do is update the queue at every observation, but keep the dif4 result only for obs 5,15,25,...195.  This is what I think you want.

 

data temp;
  do i=1 to 200;
    grp=floor((i-1)/10);
    output;
  end;
run;
quit;

data temp2;
  set temp;
  _d=ifn(i-floor(i/10)*10=55,dif4(i),.);
run;

This will return a value of 4 for obs 5,15,,25, etc., and a missing value otherwise.  That is because, unlike the IF statement, the IFN function always executes both the 2nd and 3rd arguments, regardless of which one it returns when assessing whether the first argument is true.  (It returns the 2nd arg when the first arg is true, otherwise it returns the 3rd arg).  

 

Because the IFN function will update the DIF4 queue for every obs (not just every 10th obs as in the IF statement) you will be subtracting the 4th previous obs (not the 40th preceding obs) from the current obs but only keeping the result for every 10th obs.

 

There are times when it is a good idea to put a DIF or LAG function in an IF statement.  But this is not such a time.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 579 views
  • 0 likes
  • 3 in conversation