BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hellohere
Pyrite | Level 9

Code below. The DIFn return the dif value by GRP. But what I want is simple dif value by _N_.

 

How to do it?! 

 

data temp;
do i=1 to 200;
grp=floor((i-1)/10);
output;
end;
run;quit;

data temp2;
set temp;
if i-floor(i/10)*10=5 then do;
_d=dif4(i);
end;
run;quit;

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

The DIF function, like the LAG function is a queue update function.   In your IF statement

 

  if    i-floor(i/10)*10=5 then _d=dif4(i);

The queue is only updated at obs 5,15,15,25, 35,45,55,65,75,85, 95,.....,195.   The first 4 times (5,15,...,35) the queue is updated the DIF4 function will return a missing value, because the queue needs 5 instances to fully populate with non-missing values.  Note that the queue is NOT updated for all the other observations.

 

What you really want to do is update the queue at every observation, but keep the dif4 result only for obs 5,15,25,...195.  This is what I think you want.

 

data temp;
  do i=1 to 200;
    grp=floor((i-1)/10);
    output;
  end;
run;
quit;

data temp2;
  set temp;
  _d=ifn(i-floor(i/10)*10=55,dif4(i),.);
run;

This will return a value of 4 for obs 5,15,,25, etc., and a missing value otherwise.  That is because, unlike the IF statement, the IFN function always executes both the 2nd and 3rd arguments, regardless of which one it returns when assessing whether the first argument is true.  (It returns the 2nd arg when the first arg is true, otherwise it returns the 3rd arg).  

 

Because the IFN function will update the DIF4 queue for every obs (not just every 10th obs as in the IF statement) you will be subtracting the 4th previous obs (not the 40th preceding obs) from the current obs but only keeping the result for every 10th obs.

 

There are times when it is a good idea to put a DIF or LAG function in an IF statement.  But this is not such a time.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
hellohere
Pyrite | Level 9

_d shows up only on _N_=45/55/65... with value of 40.

 

What expected is to show up at 5/15/25... with value of 4

pink_poodle
Barite | Level 11

Could you please provide more details?:

*** USES i TO CALCULATE GROUPS ***;
DATA TEMP; 
      DO i = 1 TO 200;
              GROUP =  FLOOR( (i-1)/10 ); * max(GROUP) is 19;
              OUTPUT;
       END:
RUN:

*** WHAT DOES THIS STEP DO? ***;
* not using the GROUP variable;
DATA TEMP2;
      SET TEMP;
      IF i - FLOOR (i/10) * 10= 5 THEN DO;  
      * i - FLOOR (i/10) = 0.5? ;
               _D = DIF4( i ); * what is DIF4( )? ; 
      END;
RUN;
hellohere
Pyrite | Level 9

create a column _d, based on condition [here mod(i,10)], keep DIFn on i[n changes on mod(i,10)].

Say mod(i,10)=5 to get DIF4(i), mod(i,10)=8 to get DIF3(i)... otherwise _D=.;

mkeintz
PROC Star

The DIF function, like the LAG function is a queue update function.   In your IF statement

 

  if    i-floor(i/10)*10=5 then _d=dif4(i);

The queue is only updated at obs 5,15,15,25, 35,45,55,65,75,85, 95,.....,195.   The first 4 times (5,15,...,35) the queue is updated the DIF4 function will return a missing value, because the queue needs 5 instances to fully populate with non-missing values.  Note that the queue is NOT updated for all the other observations.

 

What you really want to do is update the queue at every observation, but keep the dif4 result only for obs 5,15,25,...195.  This is what I think you want.

 

data temp;
  do i=1 to 200;
    grp=floor((i-1)/10);
    output;
  end;
run;
quit;

data temp2;
  set temp;
  _d=ifn(i-floor(i/10)*10=55,dif4(i),.);
run;

This will return a value of 4 for obs 5,15,,25, etc., and a missing value otherwise.  That is because, unlike the IF statement, the IFN function always executes both the 2nd and 3rd arguments, regardless of which one it returns when assessing whether the first argument is true.  (It returns the 2nd arg when the first arg is true, otherwise it returns the 3rd arg).  

 

Because the IFN function will update the DIF4 queue for every obs (not just every 10th obs as in the IF statement) you will be subtracting the 4th previous obs (not the 40th preceding obs) from the current obs but only keeping the result for every 10th obs.

 

There are times when it is a good idea to put a DIF or LAG function in an IF statement.  But this is not such a time.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 398 views
  • 0 likes
  • 3 in conversation