Hi,
I am trying to use two BY groups in a data step, and the result I find is baffling.
Here's the code:
data want;
input fed_num prov;
cards;
10001 10
10001 10
10001 10
10002 10
10002 10
10003 10
11001 11
11001 11
11002 11
;
run;
proc sort data=want;
by fed_num prov;
run;
data bloppo;
set want;
by fed_num prov;
ff = first.fed_num;
fp = first.prov;
lf = last.fed_num;
lp = last.prov;
run;
proc print data=bloppo;
run;
Now, I expect the variable lp (which is equal to last.prov) to be equal to 1 exactly twice: when prov changes from 10 to 11, and at the end of the data.
This is what I get instead:
fed_num | prov | ff | fp | lf | lp |
10001 | 10 | 1 | 1 | 0 | 0 |
10001 | 10 | 0 | 0 | 0 | 0 |
10001 | 10 | 0 | 0 | 1 | 1 |
10002 | 10 | 1 | 1 | 0 | 0 |
10002 | 10 | 0 | 0 | 1 | 1 |
10003 | 10 | 1 | 1 | 1 | 1 |
11001 | 11 | 1 | 1 | 0 | 0 |
11001 | 11 | 0 | 0 | 1 | 1 |
11002 | 11 | 1 | 1 | 1 | 1 |
lp is identical to lf (which should not be), and it's definitively NOT pointing to the last instance of prov. What is happening here?
If your BY variables are fed_num and prov then FIRST.FED_NUM and LAST.FED_NUM are for group of observations that have the same value of FED_NUM. But FIRST.PROV and LAST.PROV are for the smaller group of observations that have the same value of FED_NUM AND PROV. Finer and finer partitions of your data.
Any change in your first by variable implies a change in subsequent by variables. So first.fed_num will always also set first.prov.
@gabonzo wrote:
Hi,
my problem is not with FIRST, it's with LAST. Why does the last.prov flag get set when the value of prov is not changing?
Because you also have a last.fednum, and since the group "higher in the hierarchy" ends, all others "below" it will also end.
This is functioning exactly as it should.
So for the first observation, this is the first time fed_num=10001 and the first time prov=10 so ff=1 and fp=1
On the second observation the values are identical to the first so that would mean that lf and lp would both be 0 for the first obs as well.
Obs fed_num prov ff fp lf lp1
10001 | 10 | 1 | 1 | 0 | 0 |
Then we look at the second observation... since the values are both identical to the observation above it would get ff=0 and fp=0 and we can see that the third observation is also identical values, so neither fields values are appearing for the first or last time in the table and lf and lp would get 0..
2
10001 | 10 | 0 | 0 | 0 | 0 |
And then row 3 identical to 1 & 2, so its ff and fp both equal 0. And since both fed_num and nov change in obs 4, we know this is the last row that will contain these specific values so lf and lp will both get 1's.
The output expectation you have where lf and lp do not match every time, would require there to be variance in the prov values while the fed_num remained the same.
Example:
data want2;
input fed_num prov;
cards;
10001 10
10001 11
10001 12
10002 10
10002 11
10003 10
11001 11
11001 12
11002 11
;
run;
proc sort data=want2; by fed_num prov; run;
data bloppo2;
set want2;
by fed_num prov;
ff = first.fed_num;
fp = first.prov;
lf = last.fed_num;
lp = last.prov;
run;
proc print data=bloppo2; run;
Which would result in this:
Obs fed_num prov ff fp lf lp123456789
10001 | 10 | 1 | 1 | 0 | 1 |
10001 | 11 | 0 | 1 | 0 | 1 |
10001 | 12 | 0 | 1 | 1 | 1 |
10002 | 10 | 1 | 1 | 0 | 1 |
10002 | 11 | 0 | 1 | 1 | 1 |
10003 | 10 | 1 | 1 | 1 | 1 |
11001 | 11 | 1 | 1 | 0 | 1 |
11001 | 12 | 0 | 1 | 1 | 1 |
11002 | 11 | 1 | 1 | 1 | 1 |
Which I think is more along the lines of what you were expecting.
Hopefully that helped.
@gabonzo wrote:
Hi,
you wrote:
"And since both fed_num and nov change in obs 4, we know this is the last row that will contain these specific values so lf and lp will both get 1's."
Please, look again.
On observation 4, fed_num changes, but prov does not change. So why does last.prov equal 1?
Just so there is no confusion, my observation by observation explanation was regarding your data that was presented in the initial post. Having said that, your comment does not make sense, as observation 4 does not have a last.prov value of 1.
fed_num | prov | ff | fp | lf | lp |
10001 | 10 | 1 | 1 | 0 | 0 |
10001 | 10 | 0 | 0 | 0 | 0 |
10001 | 10 | 0 | 0 | 1 | 1 |
10002 | 10 | 1 | 1 | 0 | 0 |
10002 | 10 | 0 | 0 | 1 | 1 |
10003 | 10 | 1 | 1 | 1 | 1 |
11001 | 11 | 1 | 1 | 0 | 0 |
11001 | 11 | 0 | 0 | 1 | 1 |
11002 | 11 | 1 | 1 | 1 | 1 |
Now if you were looking at the output I created using the alternate dummy data, the first/last values would be set like this:
Is this the first time fed_num=10002? Yes.... THEN FF=1
Is this the first time prov=10 ( when fed_num=10002)? Yes.... THEN FP=1
Is this the last time fed_num=10002? No..... THEN LF=0
Is this the last time prov=10 ( when fed_num=10002)? Yes... THEN LP=1
Since you are using the 2 'BY Groups' the second 'BY Group' variable's first/last values are evaluated within each unique value of the first 'BY Group'.
Obs | fed_num | prov | ff | fp | lf | lp |
1 | 10001 | 10 | 1 | 1 | 0 | 1 |
2 | 10001 | 11 | 0 | 1 | 0 | 1 |
3 | 10001 | 12 | 0 | 1 | 1 | 1 |
4 | 10002 | 10 | 1 | 1 | 0 | 1 |
5 | 10002 | 11 | 0 | 1 | 1 | 1 |
6 | 10003 | 10 | 1 | 1 | 1 | 1 |
7 | 11001 | 11 | 1 | 1 | 0 | 1 |
8 | 11001 | 12 | 0 | 1 | 1 | 1 |
9 | 11002 | 11 | 1 | 1 | 1 | 1 |
If your BY variables are fed_num and prov then FIRST.FED_NUM and LAST.FED_NUM are for group of observations that have the same value of FED_NUM. But FIRST.PROV and LAST.PROV are for the smaller group of observations that have the same value of FED_NUM AND PROV. Finer and finer partitions of your data.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.