BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gabonzo
Quartz | Level 8

Hi,

 

I am trying to use two BY groups in a data step, and the result I find is baffling.

Here's the code:

 

data want;
	input fed_num prov;
	cards;
	10001 10
	10001 10
	10001 10
	10002 10
	10002 10
	10003 10
	11001 11
	11001 11
	11002 11
	;
run;

proc sort data=want;
	by fed_num prov;
run;

data bloppo;
	set want;
	by fed_num prov;
	ff = first.fed_num;
	fp = first.prov;
	lf = last.fed_num;
	lp = last.prov;
run;

proc print data=bloppo;
run;

 

Now, I expect the variable lp (which is equal to last.prov) to be equal to 1 exactly twice: when prov changes from 10 to 11, and at the end of the data.

 

This is what I get instead:

 


fed_numprovfffplflp
10001101100
10001100000
10001100011
10002101100
10002100011
10003101111
11001111100
11001110011
11002111111

 

lp is identical to lf (which should not be), and it's definitively NOT pointing to the last instance of prov. What is happening here?

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If your BY variables are fed_num  and prov then  FIRST.FED_NUM and LAST.FED_NUM are for group of observations that have the same value of FED_NUM.  But FIRST.PROV and LAST.PROV are for the smaller group of observations that have the same value of FED_NUM AND PROV.  Finer and finer partitions of your data.

View solution in original post

9 REPLIES 9
gabonzo
Quartz | Level 8
Hi,

my problem is not with FIRST, it's with LAST. Why does the last.prov flag get set when the value of prov is not changing?
Kurt_Bremser
Super User

@gabonzo wrote:
Hi,

my problem is not with FIRST, it's with LAST. Why does the last.prov flag get set when the value of prov is not changing?

Because you also have a last.fednum, and since the group "higher in the hierarchy" ends, all others "below" it will also end.

Reeza
Super User
BY groups are not independent, they're hierarchical.

If you have two BY groups its the equivalent of creating a unique key by those two variables.
tsap
Pyrite | Level 9

This is functioning exactly as it should.

 

So for the first observation, this is the first time fed_num=10001 and the first time prov=10 so ff=1 and fp=1

 

On the second observation the values are identical to the first so that would mean that lf and lp would both be 0 for the first obs as well.

Obs fed_num prov ff fp lf lp1

10001101100

 

Then we look at the second observation... since the values are both identical to the observation above it would get ff=0 and fp=0 and we can see that the third observation is also identical values, so neither fields values are appearing for the first or last time in the table and lf and lp would get 0..

2

10001100000

 

And then row 3 identical to 1 & 2, so its ff and fp both equal 0. And since both fed_num and nov change in obs 4, we know this is the last row that will contain these specific values so lf and lp will both get 1's.

 

The output expectation you have where lf and lp do not match every time, would require there to be variance in the prov values while the fed_num remained the same.

 

Example:

data want2;
	input fed_num prov;
	cards;
	10001 10
	10001 11
	10001 12
	10002 10
	10002 11
	10003 10
	11001 11
	11001 12
	11002 11
	;
run;

proc sort data=want2;  by fed_num prov; run;

data bloppo2;
	set want2;
	by fed_num prov;
	ff = first.fed_num;
	fp = first.prov;
	lf = last.fed_num;
	lp = last.prov;
run;
proc print data=bloppo2; run;

 

Which would result in this:

Obs fed_num prov ff fp lf lp123456789

10001101101
10001110101
10001120111
10002101101
10002110111
10003101111
11001111101
11001120111
11002111111

 

Which I think is more along the lines of what you were expecting.

 

Hopefully that helped.

gabonzo
Quartz | Level 8
Hi,

you wrote:

"And since both fed_num and nov change in obs 4, we know this is the last row that will contain these specific values so lf and lp will both get 1's."

Please, look again.

On observation 4, fed_num changes, but prov does not change. So why does last.prov equal 1?
tsap
Pyrite | Level 9

@gabonzo wrote:
Hi,

you wrote:

"And since both fed_num and nov change in obs 4, we know this is the last row that will contain these specific values so lf and lp will both get 1's."

Please, look again.

On observation 4, fed_num changes, but prov does not change. So why does last.prov equal 1?

 

Just so there is no confusion, my observation by observation explanation was regarding your data that was presented in the initial post. Having said that, your comment does not make sense, as observation 4 does not have a last.prov value of 1.

 

fed_num
provfffplflp
10001101100
10001100000
10001100011
10002101100
10002100011
10003101111
11001111100
11001110011
11002111111

 

 

Now if you were looking at the output I created using the alternate dummy data, the first/last values would be set like this:

 

Is this the first time fed_num=10002? Yes....                              THEN FF=1

Is this the first time prov=10 ( when fed_num=10002)? Yes....   THEN FP=1

Is this the last time fed_num=10002? No.....                               THEN LF=0

Is this the last time prov=10 ( when fed_num=10002)? Yes...     THEN LP=1

Since you are using the 2 'BY Groups' the second 'BY Group' variable's first/last values are evaluated within each unique value of the first 'BY Group'.

Obsfed_numprovfffplflp
110001101101
210001110101
310001120111
410002101101
510002110111
610003101111
711001111101
811001120111
911002111111
Tom
Super User Tom
Super User

If your BY variables are fed_num  and prov then  FIRST.FED_NUM and LAST.FED_NUM are for group of observations that have the same value of FED_NUM.  But FIRST.PROV and LAST.PROV are for the smaller group of observations that have the same value of FED_NUM AND PROV.  Finer and finer partitions of your data.

gabonzo
Quartz | Level 8
Ok, it means they are nested. Good to know.

So I should change the order and use

BY PROV FED_NUM;

Thank you.

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 777 views
  • 7 likes
  • 5 in conversation