data a;
input clm clmv;
cards;
0 10
100 10
200 10
300 10
600 10
400 20
500 20
1000 20
2000 20
3000 20
4000 30
5000 30
;
run;
data b;
value = first.clmv;
set a;
by clmv;
value1 = first.clmv;
run;
My question is: why do I get different results when I reference first.clmv before the SET statement rather than after it? Can anyone explain that?
The correct usage is to put any statements that refer to variables AFTER the set statement. Otherwise, they are undefined (missing) when first encountered.
In your program, the VALUE1 variable is correct. The VALUE variable is looking at the PREVIOUS observation to determine the value of first.clmv, except for the first time, which sees missing values.
If this is not clear, look at the following simpler example. Do you see how the VALUE variable is missing the first time and then has the value from the PREVIOUS observation? In symbols, VALUE=LAG(VALUE1).
data c;
value = clm + clmv;
set a;
by clmv;
value1 = clm + clmv;
run;
proc print data=c;
run;
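One way to check the VALUE=LAG(VALUE1) claim is to compare the two columns directly. This is only a sketch: it assumes data set C from the step above already exists, and the names CHECK, LAGGED and SAME are made up for illustration.

```sas
/* Sketch: compare VALUE with the lagged VALUE1 in data set C. */
/* CHECK, LAGGED and SAME are hypothetical names.              */
data check;
  set c;
  lagged = lag(value1);      /* VALUE1 from the previous observation        */
  same   = (value = lagged); /* 1 if equal (two missing values count as equal) */
run;

proc print data=check;
  var clm clmv value value1 lagged same;
run;
```

If the explanation above holds, SAME should be 1 on every row.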
The SET statement has a double nature:
And it is the "read" which populates dependent variables like your FIRST.
Hello @SAIKDE,
The values of variable value1 are explained in section "How SAS Identifies the Beginning and End of a BY Group" of the documentation of the BY statement:
SAS sets the value of FIRST.variable to 1 when it reads the first observation in a BY group
and in subsection "How SAS Determines FIRST.variable and LAST.variable" of the section "FIRST. and LAST. DATA Step Variables" in the documentation "BY-Group Processing in the DATA Step":
- For all other observations in the BY group, the value of FIRST.variable is 0.
It is the SET statement that reads the observations, so first.clmv is updated for the current observation only after the SET statement has executed.
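To see this for yourself, you could write both values to the log in each iteration. The following is a minimal sketch, assuming data set A from the original post; BEFORE_SET and AFTER_SET are hypothetical variable names that play the same roles as VALUE and VALUE1.

```sas
/* Sketch: capture FIRST.clmv before and after SET in every iteration. */
/* BEFORE_SET and AFTER_SET are hypothetical names.                    */
data _null_;
  before_set = first.clmv;  /* not yet updated for the observation about to be read */
  set a;
  by clmv;
  after_set = first.clmv;   /* set by the read of the current observation */
  put _n_= clmv= before_set= after_set=;
run;
```

Comparing the two columns in the log line by line shows which one follows the BY groups of the observation that was actually read.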
The values of variable value are explained by the facts that variables first.variable and last.variable are
(which seem to be not clearly stated in the documentation linked above).