Calcite | Level 5

data a;
input clm clmv;
datalines;
0 10
100 10
200 10
300 10
600 10
400 20
500 20
1000 20
2000 20
3000 20
4000 30
5000 30
;
run;

data b;
value = first.clmv;

set a;
by clmv;
value1 = first.clmv;
run;



My question is: why do I get different results when I reference first.clmv before the SET statement and after it? Can anyone explain that?


The correct usage is to put any statements that refer to variables AFTER the set statement. Otherwise, they are undefined (missing) when first encountered.


In your program, the VALUE1 variable is correct. The VALUE variable is looking at the PREVIOUS observation to determine the value of first.clmv, except on the first iteration, when no observation has been read yet.


If this is not clear, look at the following simpler example. Do you see how the VALUE variable is missing the first time and then has the value from the PREVIOUS observation? In symbols, VALUE=LAG(VALUE1).

data c;
value = clm + clmv;
set a;
by clmv;
value1 = clm + clmv;
run;

proc print;
run;
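
To check the LAG relationship directly, a minimal sketch along the following lines could be used. The dataset names DEMO and C_CHECK and the variable CHECK are hypothetical, introduced only for illustration; DEMO is a three-row stand-in for dataset A.

```sas
/* Small stand-in for dataset A (hypothetical name DEMO), sorted by clmv. */
data demo;
  input clm clmv;
  datalines;
0 10
100 10
400 20
;
run;

data c_check;
  value = clm + clmv;    /* runs before SET: uses values left over from the previous iteration */
  set demo;
  by clmv;
  value1 = clm + clmv;   /* runs after SET: uses the observation just read */
  check = lag(value1);   /* value1 of the previous observation (CHECK is a hypothetical name) */
run;

/* VALUE and CHECK should agree on every row: missing, 10, 110. */
proc print data=c_check;
  var clm clmv value value1 check;
run;
```

Because the LAG call is executed unconditionally on every iteration, it returns VALUE1 from the immediately preceding observation, which is what VALUE picks up from the retained values of clm and clmv.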
Super User

The SET statement has a double nature:

  • it is declarative; it directs the data step compiler to read the dataset's metadata and set up the PDV
  • it is imperative, as it will perform the "read" during data step execution

And it is the "read" that populates dependent variables such as your FIRST.clmv.
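
The two roles can be separated in a small sketch, assuming a hypothetical stand-in dataset named DEMO. The SET statement is compiled, so its variables exist in the PDV, but it is never executed, so they are never populated.

```sas
/* Small stand-in for dataset A (hypothetical name DEMO). */
data demo;
  input clm clmv;
  datalines;
0 10
100 10
400 20
;
run;

data _null_;
  /* Compile time: SET is seen, so clm and clmv are added to the PDV.      */
  /* Execution time: the condition is never true, so nothing is ever read. */
  if 0 then set demo;
  put clm= clmv=;   /* both variables exist but are still missing */
  stop;             /* SET never reaches end-of-file here, so end the step explicitly */
run;
```

The log shows clm=. clmv=. without any "uninitialized variable" note, since the variables were defined from the dataset's metadata even though no read took place.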

Jade | Level 19

Hello @SAIKDE,


The values of variable value1 are explained in section "How SAS Identifies the Beginning and End of a BY Group" of the documentation of the BY statement:

SAS sets the value of FIRST.variable to 1 when it reads the first observation in a BY group

and in subsection "How SAS Determines FIRST.variable and LAST.variable" of the section "FIRST. and LAST. DATA Step Variables" in the documentation "BY-Group Processing in the DATA Step":

  • For all other observations in the BY group, the value of FIRST.variable is 0.

It is the SET statement that reads the observations, so first.clmv is updated for the current observation only after the SET statement has executed.


The values of variable value are explained by the facts that variables first.variable and last.variable are

  • initialized to 1 at the beginning of the DATA step
  • automatically retained (i.e., not set to missing when the DATA step iterates)

(neither of which seems to be clearly stated in the documentation linked above).
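
Both points can be observed by capturing first.clmv on either side of the SET statement. This is only a sketch: the dataset name DEMO and the variables BEFORE and AFTER are hypothetical names chosen for illustration.

```sas
/* Small stand-in for dataset A (hypothetical name DEMO), sorted by clmv. */
data demo;
  input clm clmv;
  datalines;
0 10
100 10
400 20
;
run;

data _null_;
  before = first.clmv;   /* initial value on iteration 1, retained value afterwards */
  set demo;
  by clmv;
  after = first.clmv;    /* value for the observation just read */
  put _n_= clmv= before= after=;
run;
```

If the two points above hold, the log would be expected to show before=1 after=1 for the first observation, before=1 after=0 for the second, and before=0 after=1 for the third: BEFORE always trails AFTER by one observation, and starts at 1 rather than missing.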


