HeatherNewton
Quartz | Level 8

I came across the following code:

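data pd4;
  set pb3;
  by region product sub_product;
  /* reset the running totals at the start of each product group */
  if first.product then do;
    cum_active=0;
    cum_default=0;
  end;
  /* sum statements: add this row's value to the retained running total */
  cum_active+active;
  cum_default+Default;
  /* running total as a fraction of the group total */
  cum_active_percent = cum_active / total_active;
  cum_default_percent = cum_default / total_default;
run;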


Hi, I read the documentation but couldn't understand the significance of the first. and last. variables being set to 1 or 0 in the program above. Could you kindly explain?

Also, could you explain these statements:

 


cum_active+active; 
cum_default+Default;

Are they assigning values to some variable? Are they Boolean? What are they doing?

 

japelin
Rhodochrosite | Level 12
if first.product then do;
  cum_active=0;
  cum_default=0;
end;

This initializes the variables used by the sum statements, resetting them to zero at the start of each product group.

 

cum_active+active; 
cum_default+Default;

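These are sum statements. On each observation they add the value on the right (active, Default) to the variable on the left, which SAS retains from one observation to the next, so the result is a running total within the group.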
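
To make the sum statement concrete, here is a minimal sketch that computes the same running total two ways: with the sum statement, and with the longer RETAIN plus SUM() form it is shorthand for. The dataset names demo and running, the variable cum_long, and the data values are made up for illustration; they are not from the original program.

```sas
/* Hypothetical input: names and values are invented for this sketch. */
data demo;
  input product $ active;
  datalines;
A 10
A 5
B 7
B 3
;
run;

data running;
  set demo;                          /* demo is already sorted by product */
  by product;
  retain cum_long 0;                 /* long form needs an explicit RETAIN */
  if first.product then do;
    cum_active = 0;                  /* reset both totals for a new group */
    cum_long   = 0;
  end;
  cum_active + active;               /* sum statement: implicitly retained */
  cum_long = sum(cum_long, active);  /* equivalent long form */
run;

proc print data=running;
run;
```

For these four rows both cum_active and cum_long come out as 10, 15, 7, 10. So the statement is neither a Boolean test nor a plain assignment: it is an accumulation, and because it behaves like SUM(), a missing value of active is skipped instead of turning the total into missing.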

 

For more information on the sum statement, see:

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/n1dfiqj146yi2cn1maeju9wo7ijs.htm

 

The FIRST. and LAST. DATA step variables are described here:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lrcon/n01a08zkzy5igbn173zjz82zsi1s.htm

 

Also, I have a few requests.
Please do not mix your question text into the code block, as it makes the post difficult to read.
And where applicable, please mark the questions you have already posted as resolved.

 

Rick_SAS
SAS Super FREQ

For information on using the FIRST and LAST indicator variables in a BY-group analysis, see "How to use FIRST.variable and LAST.variable in a BY-group analysis in SAS."

mkeintz
PROC Star

The first. and last. automatic variables generated via the BY statement are just indicators of how the current observation relates to the preceding and following observations. When first.x=1, the current obs has a different value for x than the previous obs; otherwise first.x=0. Similarly, last.x=1 means the current value of x differs from that of the next obs (i.e. SAS does a look-ahead for you). It is possible for a record to have both first.x=1 and last.x=1 (the current obs is a singleton) or both first.x=0 and last.x=0 (the current obs is in the middle of a stream of constant x values).
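
A small sketch of the single-variable case, assuming a made-up dataset demo with one BY variable x. Because first.x and last.x are not written to the output dataset, they are copied into ordinary variables (first_x and last_x, names chosen here for illustration) so they can be printed.

```sas
/* Hypothetical data: a run of three A's, a singleton B, and two C's. */
data demo;
  input x $;
  datalines;
A
A
A
B
C
C
;
run;

data flags;
  set demo;          /* already sorted by x */
  by x;
  first_x = first.x; /* copy the automatic variables so they are kept */
  last_x  = last.x;
run;

proc print data=flags;
run;
```

The (first_x, last_x) pairs for the six rows are (1,0), (0,0), (0,1) for the A's, (1,1) for the singleton B, and (1,0), (0,1) for the C's.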

 

Now when you have BY a b c d; there will be multiple first. and multiple last. automatic variables. The (slightly) non-intuitive behavior of these is that if, in the middle of a series of constant B values, you have a change in A, both first.A and first.B are set to 1, i.e. whenever a first. variable becomes 1, all the first.'s for variables to its right also become 1. The same holds for a set of last. variables.
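
The cascading behavior can be seen in a sketch with two BY variables. The dataset demo2 and its values are invented for illustration: b stays at X while a changes from 1 to 2.

```sas
/* Hypothetical data, sorted by a then b. */
data demo2;
  input a b $;
  datalines;
1 X
1 X
2 X
2 X
2 Y
;
run;

data flags2;
  set demo2;
  by a b;
  first_a = first.a;  /* copies of the automatic variables, so they are kept */
  first_b = first.b;
  last_a  = last.a;
  last_b  = last.b;
run;

proc print data=flags2;
run;
```

On the third row (a=2, b=X) both first_a and first_b are 1 even though b has not changed, and on the second row both last_a and last_b are 1 for the same reason. This is why `if first.product` in the original program, with BY region product sub_product, also resets the totals whenever region changes.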

 

So let me revise my first statement: whenever first.x=1, it means that the current x differs from the preceding x or the current value of some earlier variable in a by list differs from its predecessor.

 

Edited addition: the above refers to the use of the BY statement in a DATA step.
