Quartz | Level 8

## Error using multiple BY groups in DATA step.

Hi,

I am trying to use two BY groups in a data step, and the result I find is baffling.

Here's the code:

``````data want;
input fed_num prov;
cards;
10001 10
10001 10
10001 10
10002 10
10002 10
10003 10
11001 11
11001 11
11002 11
;
run;

proc sort data=want;
by fed_num prov;
run;

data bloppo;
set want;
by fed_num prov;
ff = first.fed_num;
fp = first.prov;
lf = last.fed_num;
lp = last.prov;
run;

proc print data=bloppo;
run;``````

Now, I expect the variable lp (which is equal to last.prov) to be equal to 1 exactly twice: when prov changes from 10 to 11, and at the end of the data.

This is what I get instead:

 fed_num prov ff fp lf lp 10001 10 1 1 0 0 10001 10 0 0 0 0 10001 10 0 0 1 1 10002 10 1 1 0 0 10002 10 0 0 1 1 10003 10 1 1 1 1 11001 11 1 1 0 0 11001 11 0 0 1 1 11002 11 1 1 1 1

lp is identical to lf (which should not be), and it's definitively NOT pointing to the last instance of prov. What is happening here?

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Error using multiple BY groups in DATA step.

If your BY variables are fed_num  and prov then  FIRST.FED_NUM and LAST.FED_NUM are for group of observations that have the same value of FED_NUM.  But FIRST.PROV and LAST.PROV are for the smaller group of observations that have the same value of FED_NUM AND PROV.  Finer and finer partitions of your data.

9 REPLIES 9
Super User

## Re: Error using multiple BY groups in DATA step.

Any change in your first by variable implies a change in subsequent by variables. So first.fed_num will always also set first.prov.

Quartz | Level 8

## Re: Error using multiple BY groups in DATA step.

Hi,

my problem is not with FIRST, it's with LAST. Why does the last.prov flag get set when the value of prov is not changing?
Super User

## Re: Error using multiple BY groups in DATA step.

@gabonzo wrote:
Hi,

my problem is not with FIRST, it's with LAST. Why does the last.prov flag get set when the value of prov is not changing?

Because you also have a last.fednum, and since the group "higher in the hierarchy" ends, all others "below" it will also end.

Super User

## Re: Error using multiple BY groups in DATA step.

BY groups are not independent, they're hierarchical.

If you have two BY groups its the equivalent of creating a unique key by those two variables.
Pyrite | Level 9

## Re: Error using multiple BY groups in DATA step.

This is functioning exactly as it should.

So for the first observation, this is the first time fed_num=10001 and the first time prov=10 so ff=1 and fp=1

On the second observation the values are identical to the first so that would mean that lf and lp would both be 0 for the first obs as well.

Obs fed_num prov ff fp lf lp1

 10001 10 1 1 0 0

Then we look at the second observation... since the values are both identical to the observation above it would get ff=0 and fp=0 and we can see that the third observation is also identical values, so neither fields values are appearing for the first or last time in the table and lf and lp would get 0..

2

 10001 10 0 0 0 0

And then row 3 identical to 1 & 2, so its ff and fp both equal 0. And since both fed_num and nov change in obs 4, we know this is the last row that will contain these specific values so lf and lp will both get 1's.

The output expectation you have where lf and lp do not match every time, would require there to be variance in the prov values while the fed_num remained the same.

Example:

``````data want2;
input fed_num prov;
cards;
10001 10
10001 11
10001 12
10002 10
10002 11
10003 10
11001 11
11001 12
11002 11
;
run;

proc sort data=want2;  by fed_num prov; run;

data bloppo2;
set want2;
by fed_num prov;
ff = first.fed_num;
fp = first.prov;
lf = last.fed_num;
lp = last.prov;
run;
proc print data=bloppo2; run;``````

Which would result in this:

Obs fed_num prov ff fp lf lp123456789

 10001 10 1 1 0 1 10001 11 0 1 0 1 10001 12 0 1 1 1 10002 10 1 1 0 1 10002 11 0 1 1 1 10003 10 1 1 1 1 11001 11 1 1 0 1 11001 12 0 1 1 1 11002 11 1 1 1 1

Which I think is more along the lines of what you were expecting.

Hopefully that helped.

Quartz | Level 8

## Re: Error using multiple BY groups in DATA step.

Hi,

you wrote:

"And since both fed_num and nov change in obs 4, we know this is the last row that will contain these specific values so lf and lp will both get 1's."

On observation 4, fed_num changes, but prov does not change. So why does last.prov equal 1?
Pyrite | Level 9

## Re: Error using multiple BY groups in DATA step.

@gabonzo wrote:
Hi,

you wrote:

"And since both fed_num and nov change in obs 4, we know this is the last row that will contain these specific values so lf and lp will both get 1's."

On observation 4, fed_num changes, but prov does not change. So why does last.prov equal 1?

Just so there is no confusion, my observation by observation explanation was regarding your data that was presented in the initial post. Having said that, your comment does not make sense, as observation 4 does not have a last.prov value of 1.

 fed_num prov ff fp lf lp 10001 10 1 1 0 0 10001 10 0 0 0 0 10001 10 0 0 1 1 10002 10 1 1 0 0 10002 10 0 0 1 1 10003 10 1 1 1 1 11001 11 1 1 0 0 11001 11 0 0 1 1 11002 11 1 1 1 1

Now if you were looking at the output I created using the alternate dummy data, the first/last values would be set like this:

Is this the first time fed_num=10002? Yes....                              THEN FF=1

Is this the first time prov=10 ( when fed_num=10002)? Yes....   THEN FP=1

Is this the last time fed_num=10002? No.....                               THEN LF=0

Is this the last time prov=10 ( when fed_num=10002)? Yes...     THEN LP=1

Since you are using the 2 'BY Groups' the second 'BY Group' variable's first/last values are evaluated within each unique value of the first 'BY Group'.

 Obs fed_num prov ff fp lf lp 1 10001 10 1 1 0 1 2 10001 11 0 1 0 1 3 10001 12 0 1 1 1 4 10002 10 1 1 0 1 5 10002 11 0 1 1 1 6 10003 10 1 1 1 1 7 11001 11 1 1 0 1 8 11001 12 0 1 1 1 9 11002 11 1 1 1 1
Super User

## Re: Error using multiple BY groups in DATA step.

If your BY variables are fed_num  and prov then  FIRST.FED_NUM and LAST.FED_NUM are for group of observations that have the same value of FED_NUM.  But FIRST.PROV and LAST.PROV are for the smaller group of observations that have the same value of FED_NUM AND PROV.  Finer and finer partitions of your data.

Quartz | Level 8

## Re: Error using multiple BY groups in DATA step.

Ok, it means they are nested. Good to know.

So I should change the order and use

BY PROV FED_NUM;

Thank you.
Discussion stats
• 9 replies
• 777 views
• 7 likes
• 5 in conversation