SAIKDE
Calcite | Level 5

data a;
input clm clmv;
cards;
0 10
100 10
200 10
300 10
600 10
400 20
500 20
1000 20
2000 20
3000 20
4000 30
5000 30
;
run;

data b;
value = first.clmv;
set a;
by clmv;
value1 = first.clmv;
run;

My question is: why do I get different results when I reference first.clmv before the SET statement versus after it? Can anyone explain that?

Rick_SAS
SAS Super FREQ

The correct usage is to put any statements that refer to variables AFTER the set statement. Otherwise, they are undefined (missing) when first encountered.

 

In your program, the VALUE1 variable is correct. The VALUE variable sees first.clmv as it stood after the PREVIOUS observation was read, except on the first iteration, when no observation has been read yet.

 

If this is not clear, look at the following simpler example. Do you see how the VALUE variable is missing the first time and then has the value from the PREVIOUS observation? In symbols, VALUE=LAG(VALUE1).

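data c;
value = clm + clmv;    /* before SET: uses clm and clmv left over from the previous iteration */
set a;                 /* reads the current observation */
by clmv;
value1 = clm + clmv;   /* after SET: uses the current observation */
run;

proc print data=c;
run;
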
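
One way to check the VALUE=LAG(VALUE1) relationship directly is sketched below. It assumes dataset a from the question is still in the WORK library; the names c_check, lag_value1 and same are made up for this illustration.

```sas
/* Sketch only: assumes WORK.A from the question already exists. */
data c_check;
  value = clm + clmv;            /* computed before the read */
  set a;
  value1 = clm + clmv;           /* computed after the read */
  lag_value1 = lag(value1);      /* value1 from the previous iteration */
  same = (value = lag_value1);   /* 1 when the two agree, 0 otherwise */
run;

proc print data=c_check;
run;
```

If the relationship holds, same should be 1 on every row. SAS treats two missing values as equal in a comparison, which covers the first observation.
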
Kurt_Bremser
Super User

The SET statement has a double nature:

  • it is declarative; it directs the data step compiler to read the dataset's metadata and set up the PDV
  • it is imperative, as it will perform the "read" during data step execution

And it is the "read" which populates dependent variables like your FIRST.clmv.
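
A small sketch that may make the two roles visible in the log, assuming dataset a from the question exists. The PUT statements and their label text are additions for illustration only.

```sas
/* Sketch only: assumes WORK.A from the question already exists. */
data _null_;
  /* The variables already exist in the PDV here (compile-time role of SET),
     but they still hold whatever the previous iteration left behind. */
  put 'Before SET: ' _n_= clm= clmv= first.clmv=;
  set a;      /* execution-time role: the actual read */
  by clmv;
  put 'After SET:  ' _n_= clm= clmv= first.clmv=;
run;
```

On the first iteration the "Before SET" line shows clm and clmv as missing, because nothing has been read yet. With the 12 observations in a, the log gets 13 "Before SET" lines but only 12 "After SET" lines: the 13th iteration stops at the SET statement when it reaches the end of the data.
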

FreelanceReinh
Jade | Level 19

Hello @SAIKDE,

 

The values of variable value1 are explained in section "How SAS Identifies the Beginning and End of a BY Group" of the documentation of the BY statement:

SAS sets the value of FIRST.variable to 1 when it reads the first observation in a BY group

and in subsection "How SAS Determines FIRST.variable and LAST.variable" of the section "FIRST. and LAST. DATA Step Variables" in the documentation "BY-Group Processing in the DATA Step":

  • For all other observations in the BY group, the value of FIRST.variable is 0.

It is the SET statement which reads the observations, so variable first.clmv is updated for the current observation only after the SET statement has executed.
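
For comparison, here is a minimal sketch of the usual pattern, where first.clmv is tested after the SET statement. It assumes dataset a from the question exists and is sorted by clmv; the dataset name numbered and the variable seq are hypothetical.

```sas
/* Sketch only: assumes WORK.A from the question, sorted by clmv. */
data numbered;
  set a;
  by clmv;
  if first.clmv then seq = 0;   /* reset at the start of each BY group */
  seq + 1;                      /* sum statement: seq is retained automatically */
run;

proc print data=numbered;
run;
```

With the data from the question, seq runs 1 to 5 for clmv=10, 1 to 5 for clmv=20, and 1 to 2 for clmv=30.
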

 

The values of variable value are explained by two facts: the variables first.variable and last.variable are

  • initialized to 1 at the beginning of the DATA step
  • automatically retained (i.e., not reset to missing when the DATA step iterates)

(neither of which seems to be clearly stated in the documentation linked above).
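
The two facts above can be mimicked with an ordinary retained variable. The sketch below is an emulation, not a description of SAS internals; it assumes dataset a from the question exists, and prev_first and b_emulated are hypothetical names.

```sas
/* Sketch only: assumes WORK.A from the question, sorted by clmv. */
data b_emulated;
  retain prev_first 1;       /* initialized to 1 and kept across iterations */
  value = prev_first;        /* plays the role of first.clmv before SET */
  set a;
  by clmv;
  value1 = first.clmv;       /* first.clmv for the current observation */
  prev_first = first.clmv;   /* remember it for the next iteration */
  drop prev_first;
run;

proc print data=b_emulated;
run;
```

In b_emulated, value comes out as 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1 and value1 as 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0. If the two facts hold, dataset b from the question should show the same pattern.
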
