SAS_inquisitive
Lapis Lazuli | Level 10

Hello, I wonder what first.var1 represents in this code?

 

proc sort data = have;
	by var1 var2 var3;
run;

data want;
	set have;
	by var1 var2 var3;

	if first.var1;
run;

 

1 ACCEPTED SOLUTION

Cynthia_sas
SAS Super FREQ

Hi:

  You have to read about BY group processing in the DATA step. FIRST. and LAST. variables are automatic variables that are "turned on" when you use BY group processing in the DATA step. These variables are not automatically written to the output dataset. So you have to capture the values, if you want to examine them.

 

  Basically, when FIRST.byvar is 1, the current row is the first of its BY group; when FIRST.byvar is 0, the current row is NOT the first of the group. Similarly, when LAST.byvar=1 the current row is the last of the group, and when LAST.byvar=0 the current row is NOT the last of the group.

 

  This is easier to see with some actual data, if you capture the values as shown in the program below. I modified your program to create 2 datasets: one that shows ALL the rows in the original input data, and another that is output after the subsetting IF statement:

data have;
  infile datalines;
  input var1 $ var2 $ var3 $;
datalines;
aaa 11a  111
aaa 11a  222
aaa 11a  222
bbb 11b  333
bbb 11b  444
ccc 22a  111
ccc 22a  222
ccc 22b  333
;
run;
  
proc sort data = have;
   by var1 var2 var3;
run;

data wantall wantonlyfirstvar1 ;
  set have;
  by var1 var2 var3;
  first_var1 = first.var1;
  last_var1 = last.var1;
  first_var2 = first.var2;
  last_var2 = last.var2;
  first_var3 = first.var3;
  last_var3 = last.var3;
  output wantall;
  if first.var1;
  output wantonlyfirstvar1;
run;
  
proc print data=wantall;
  title 'show all values for first. and last. automatic variables for all rows in original HAVE dataset';
run;
   
proc print data=wantonlyfirstvar1;
  title 'show result of using first.var1 subsetting if to see why only get 3 obs in this output dataset';
run;
title;

  Of course, the second half of your question is implied and has to do with this statement:

   if first.var1;

 

  This is a subsetting IF statement that acts like a gate to let observations pass or not pass to the rest of the logic in the program. In this instance the subsetting IF is controlling what will be output. Based on the sample data above, there are only 3 rows where FIRST.VAR1=1, so the "gate" allows only those 3 rows to pass to the end of the program, where they are output to the final dataset. (My program has EXPLICIT OUTPUT statements so I can create 2 datasets; your program has an IMPLICIT output at the end of the DATA step, so only the rows that pass the subsetting IF are written to your dataset WANT.)
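
  As a minimal sketch of the "gate" idea described above, the three DATA steps below should all keep the same rows. The dataset names WANT_A, WANT_B and WANT_C are hypothetical, and the steps assume HAVE has already been sorted by var1 var2 var3 as in the program above.

```sas
/* 1. Subsetting IF: the step continues (and reaches the implicit */
/*    output) only for rows where first.var1 is 1.                */
data want_a;
  set have;
  by var1 var2 var3;
  if first.var1;
run;

/* 2. Same gate written with an explicit DELETE. */
data want_b;
  set have;
  by var1 var2 var3;
  if not first.var1 then delete;
run;

/* 3. Same result with an explicit OUTPUT statement. */
data want_c;
  set have;
  by var1 var2 var3;
  if first.var1 then output;
run;

/* Optional check: PROC COMPARE should report no differences. */
proc compare base=want_a compare=want_b;
run;
proc compare base=want_a compare=want_c;
run;
```

  The subsetting IF is the shortest form; the DELETE and OUTPUT forms only spell out what it does.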

 

  Here's the output from the above program.

 

cynthia

 

** Output;

first_byvar.png

Astounding
PROC Star

This is a "must have" tool if you're going to program using SAS.

 

first.var is created by the BY statement in the DATA step.  It is automatically 1 or 0.  As the data step progresses through the incoming data, whenever VAR1 takes on a new value, first.var1 is 1.  Otherwise, first.var1 is 0.

 

The IF statement considers 1 to be true, and 0 to be false.  So the DATA step is selecting the first observation for each value of VAR1.

 

The final basic:  To be allowed to use a BY statement in a DATA step, your observations have to be in order.  Usually that means running PROC SORT first, but if the observations are already in order for any reason, you don't have to use PROC SORT on top of that.

 

These are the basics only, but definitely enough to get started.
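
To illustrate the point about ordering, here is a small sketch in which the rows are typed in already ordered by VAR1, so the BY statement works without a PROC SORT. The dataset names ORDERED and FIRSTS are made up for this example.

```sas
/* Rows are entered already in VAR1 order, so no PROC SORT is needed. */
data ordered;
  input var1 $ var2 $;
datalines;
aaa 11a
aaa 11a
bbb 11b
ccc 22a
;
run;

/* BY works because the incoming rows are in order. */
/* The subsetting IF keeps one row per VAR1 value.  */
data firsts;
  set ordered;
  by var1;
  if first.var1;
run;

proc print data=firsts;
run;
```

If the rows were not in order (for example, a bbb row entered before the aaa rows), the second DATA step would stop with an error saying the BY variables are not properly sorted, and that is when PROC SORT is needed first.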

SAS_inquisitive
Lapis Lazuli | Level 10

Thank you. Does the order of the BY variables matter, as below? Here, can we say we want the first occurrence of var1 within each value of var2?

 

proc sort data = have;
	by var2 var1 var3;
run;

data want;
	set have;
	by var2 var1 var3;

	if first.var1;
run;

 

Reeza
Super User

Try it.

 

Create a variable that holds the value of first.var1 and explore how it changes as you change your BY groupings.

 

proc sort data = have;
   by var2 var1 var3;
run;
 
data want;
  set have;
  by var2 var1 var3;
  first_var1=first.var1;
run;

proc print data=want;
var var2 var1 first_var1;
run;
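
As a sketch of that experiment, the program below reuses the sample data from the accepted solution and captures both first.var2 and first.var1 with var2 listed first in the BY statement. The dataset names BY_V2 and FIRST_V1_WITHIN_V2 are hypothetical.

```sas
/* Sample data copied from the accepted solution. */
data have;
  input var1 $ var2 $ var3 $;
datalines;
aaa 11a  111
aaa 11a  222
aaa 11a  222
bbb 11b  333
bbb 11b  444
ccc 22a  111
ccc 22a  222
ccc 22b  333
;
run;

/* Sort with VAR2 as the outermost BY variable. */
proc sort data=have out=by_v2;
  by var2 var1 var3;
run;

/* Capture the automatic variables, then keep rows where first.var1 is 1. */
data first_v1_within_v2;
  set by_v2;
  by var2 var1 var3;
  first_var2 = first.var2;
  first_var1 = first.var1;
  if first.var1;
run;

proc print data=first_v1_within_v2;
  var var2 var1 var3 first_var2 first_var1;
run;
```

With this data the step should keep 4 rows instead of the 3 you get with BY var1 var2 var3. Whenever an earlier BY variable changes, FIRST. is also set to 1 for every BY variable listed after it, so the ccc row with var2=22b passes the gate even though var1 itself did not change. So yes, the order matters: here first.var1 marks the first row of each var1 value within each var2 value.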
