turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Combine multiple variables into a LAST.ab variable

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-28-2009 10:08 AM

If I have a dataset with variables a and b and I sort by a then b how do I filter for the last entry of b for each a. My assumption is that I would have some sort statement like

IF LAST.ab;

where ab is some sort of cartesian product of a and b. But I don't know how to do this. Any thoughts?

IF LAST.ab;

where ab is some sort of cartesian product of a and b. But I don't know how to do this. Any thoughts?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-28-2009 10:18 AM

A DATA step technique using a BY statement combined with IF LAST. AND LAST. would be the technique to use.

Search the SAS support http://support.sas.com/ website - using this Google advanced search argument:

sas data step first last processing site:sas.com

Scott Barry

SBBWorks, Inc.

Search the SAS support http://support.sas.com/ website - using this Google advanced search argument:

sas data step first last processing site:sas.com

Scott Barry

SBBWorks, Inc.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-28-2009 10:38 AM

Since you have sorted by a b then the last.a is the record with the highest value of b within a. So all you need is if last.a

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-29-2009 11:20 AM

Reading the original post and "...filter for the last entry of b for each a", I would say that the BY statement needs to reflect BY A B; and thus the IF test would be IF LAST.B to generate the desired filter logic.

Scott Barry

SBBWorks, Inc.

Scott Barry

SBBWorks, Inc.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-29-2009 12:55 PM

Scott;

That is a common misconception as to how last. works. Note the code below.

data one;

do a = 1 to 5;

do b = 1 to 5;

output;

end;

end;

run;

data two;

set one;

by a b;

la = last.a;

lb = last.b;

run;

proc print;

run;

The SAS System 12:42 Saturday, August 29, 2009 3

Obs a b la lb

1 1 1 0 1

2 1 2 0 1

3 1 3 0 1

4 1 4 0 1

5 1 5 1 1

6 2 1 0 1

7 2 2 0 1

8 2 3 0 1

9 2 4 0 1

10 2 5 1 1

11 3 1 0 1

12 3 2 0 1

13 3 3 0 1

14 3 4 0 1

15 3 5 1 1

16 4 1 0 1

17 4 2 0 1

18 4 3 0 1

19 4 4 0 1

20 4 5 1 1

21 5 1 0 1

22 5 2 0 1

23 5 3 0 1

24 5 4 0 1

25 5 5 1 1

Now note that since b is changing, last.b is always true. In fact last. is only meaningfull if there is a sort order below it. So since each group of A is sorted by B, the way to find the highest value of B within A is to take last.A.

Flip

That is a common misconception as to how last. works. Note the code below.

data one;

do a = 1 to 5;

do b = 1 to 5;

output;

end;

end;

run;

data two;

set one;

by a b;

la = last.a;

lb = last.b;

run;

proc print;

run;

The SAS System 12:42 Saturday, August 29, 2009 3

Obs a b la lb

1 1 1 0 1

2 1 2 0 1

3 1 3 0 1

4 1 4 0 1

5 1 5 1 1

6 2 1 0 1

7 2 2 0 1

8 2 3 0 1

9 2 4 0 1

10 2 5 1 1

11 3 1 0 1

12 3 2 0 1

13 3 3 0 1

14 3 4 0 1

15 3 5 1 1

16 4 1 0 1

17 4 2 0 1

18 4 3 0 1

19 4 4 0 1

20 4 5 1 1

21 5 1 0 1

22 5 2 0 1

23 5 3 0 1

24 5 4 0 1

25 5 5 1 1

Now note that since b is changing, last.b is always true. In fact last. is only meaningfull if there is a sort order below it. So since each group of A is sorted by B, the way to find the highest value of B within A is to take last.A.

Flip

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-29-2009 02:53 PM

As is the case with your data example, I made no assumption that there is only one unique B value observation within a give A "set", even after sorting. He wanted the last occurrence of B for each unique A value. So, there could be more than one observation with a given B value within A -- a condition that would need to be addressed with the IF LAST.B test.

Also, there may be usefulness with the additional NOSORTED parameter while using BY and FIRST. and/or LAST..

For some circumstances, I also find consecutive PROC SORT invocations, with the last using EQUALS NODUPKEY -- may or may not apply - just mentioning it for consideration (now or later).

Scott Barry

SBBWorks, Inc.

Also, there may be usefulness with the additional NOSORTED parameter while using BY and FIRST.

For some circumstances, I also find consecutive PROC SORT invocations, with the last using EQUALS NODUPKEY -- may or may not apply - just mentioning it for consideration (now or later).

Scott Barry

SBBWorks, Inc.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-29-2009 05:37 PM

Again not so;

data one;

do a = 1 to 2;

do b = 1 to 2;

do c = 1 to 2;

output;

end;

end;

end;

run;

data two;

set one;

by a b;

la = last.a;

lb = last.b;

run;

proc print;

run;

Obs a b c la lb

1 1 1 1 0 0

2 1 1 2 0 1

3 1 2 1 0 0

4 1 2 2 1 1

5 2 1 1 0 0

6 2 1 2 0 1

7 2 2 1 0 0

8 2 2 2 1 1

Note now Obs 3 and 4 have the same value of b. Again lats.a gives the correct result. Last.b gives the highest value of c within a given b value.

Last.x is the observation of group of x sorted on whatever is next in the by value.

Yes using NODUPKEY solves this, but it is still without question last.A which is correct.

data one;

do a = 1 to 2;

do b = 1 to 2;

do c = 1 to 2;

output;

end;

end;

end;

run;

data two;

set one;

by a b;

la = last.a;

lb = last.b;

run;

proc print;

run;

Obs a b c la lb

1 1 1 1 0 0

2 1 1 2 0 1

3 1 2 1 0 0

4 1 2 2 1 1

5 2 1 1 0 0

6 2 1 2 0 1

7 2 2 1 0 0

8 2 2 2 1 1

Note now Obs 3 and 4 have the same value of b. Again lats.a gives the correct result. Last.b gives the highest value of c within a given b value.

Last.x is the observation of group of x sorted on whatever is next in the by value.

Yes using NODUPKEY solves this, but it is still without question last.A which is correct.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-30-2009 12:35 PM

Let's take the DATA step below (using your "modified" WORK.ONE file, having multiple-value B variable instances) -- and given Flip's premise that only filtering on the condition is sufficient:

IF LAST.A;

which will generate two observations (each unique A value) as compared to the OP request for each unique B within each A condition.

Observe the SAS log below with the resulting SAS variables while executing the DATA step with the SET and BY A B; -- the LAST.B condition is clearly "TRUE" more frequently than the LAST.A condition is "TRUE".

So, I submit that if the objective is to yield each unique occurrence of both A and B, the code below will generate the combination values in WORK.TWO:

DATA TWO;

SET ONE;

BY A B;

IF LAST.B;

RUN;

From this example data representation, the accurate observation count in WORK.TWO should be 4, whereas having the IF statement:

IF LAST.A;

will only generate two (2) observations in SAS file WORK.TWO.

Scott Barry

SBBWorks, Inc.

11 data two;

12 set one;

13 by a b;

14 putlog _all_;

15 if last.b;

16 run;

a=1 b=1 c=1 FIRST.a=1 LAST.a=0 FIRST.b=1 LAST.b=0 _ERROR_=0 _N_=1

a=1 b=1 c=2 FIRST.a=0 LAST.a=0 FIRST.b=0 LAST.b=1 _ERROR_=0 _N_=2

a=1 b=2 c=1 FIRST.a=0 LAST.a=0 FIRST.b=1 LAST.b=0 _ERROR_=0 _N_=3

a=1 b=2 c=2 FIRST.a=0 LAST.a=1 FIRST.b=0 LAST.b=1 _ERROR_=0 _N_=4

a=2 b=1 c=1 FIRST.a=1 LAST.a=0 FIRST.b=1 LAST.b=0 _ERROR_=0 _N_=5

a=2 b=1 c=2 FIRST.a=0 LAST.a=0 FIRST.b=0 LAST.b=1 _ERROR_=0 _N_=6

a=2 b=2 c=1 FIRST.a=0 LAST.a=0 FIRST.b=1 LAST.b=0 _ERROR_=0 _N_=7

a=2 b=2 c=2 FIRST.a=0 LAST.a=1 FIRST.b=0 LAST.b=1 _ERROR_=0 _N_=8

NOTE: There were 8 observations read from the data set WORK.ONE.

NOTE: The data set WORK.TWO has 4 observations and 3 variables.

NOTE: DATA statement used (Total process time):

IF LAST.A;

which will generate two observations (each unique A value) as compared to the OP request for each unique B within each A condition.

Observe the SAS log below with the resulting SAS variables while executing the DATA step with the SET and BY A B; -- the LAST.B condition is clearly "TRUE" more frequently than the LAST.A condition is "TRUE".

So, I submit that if the objective is to yield each unique occurrence of both A and B, the code below will generate the combination values in WORK.TWO:

DATA TWO;

SET ONE;

BY A B;

IF LAST.B;

RUN;

From this example data representation, the accurate observation count in WORK.TWO should be 4, whereas having the IF statement:

IF LAST.A;

will only generate two (2) observations in SAS file WORK.TWO.

Scott Barry

SBBWorks, Inc.

11 data two;

12 set one;

13 by a b;

14 putlog _all_;

15 if last.b;

16 run;

a=1 b=1 c=1 FIRST.a=1 LAST.a=0 FIRST.b=1 LAST.b=0 _ERROR_=0 _N_=1

a=1 b=1 c=2 FIRST.a=0 LAST.a=0 FIRST.b=0 LAST.b=1 _ERROR_=0 _N_=2

a=1 b=2 c=1 FIRST.a=0 LAST.a=0 FIRST.b=1 LAST.b=0 _ERROR_=0 _N_=3

a=1 b=2 c=2 FIRST.a=0 LAST.a=1 FIRST.b=0 LAST.b=1 _ERROR_=0 _N_=4

a=2 b=1 c=1 FIRST.a=1 LAST.a=0 FIRST.b=1 LAST.b=0 _ERROR_=0 _N_=5

a=2 b=1 c=2 FIRST.a=0 LAST.a=0 FIRST.b=0 LAST.b=1 _ERROR_=0 _N_=6

a=2 b=2 c=1 FIRST.a=0 LAST.a=0 FIRST.b=1 LAST.b=0 _ERROR_=0 _N_=7

a=2 b=2 c=2 FIRST.a=0 LAST.a=1 FIRST.b=0 LAST.b=1 _ERROR_=0 _N_=8

NOTE: There were 8 observations read from the data set WORK.ONE.

NOTE: The data set WORK.TWO has 4 observations and 3 variables.

NOTE: DATA statement used (Total process time):

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-30-2009 04:12 PM

I think that you and I are reading the OP differently. He states "how do I filter for the last entry of b for each a" where I am taking this to mean the highest value of B within A, you are reading it as The last value of a given B within A based on some ordering which is not mentioned.

The point I am trying to make is demonstrated below. I purposefully here do not sort on C. As you see the observation getting last.b is not related to any sort order, and thus can be the 'wrong' observation.

As you often state, clarity in a request is of great importance in determining te correct responce.

data one;

do a = 1 to 2;

do b=1 to 2;

do c=1 to 2;

output;

end;

end;

end;

run;

proc sort data = one out = two;

by a b descending c;

run;

data one;

set one two;

run;

proc sort data = one out = two;

by a b ;

run;

data one;

set two;

by a b;

la = last.a;

lb = last.b;

run;

proc print;

run;

The SAS System 16:00 Sunday, August 30, 2009 1

Obs a b c la lb

1 1 1 1 0 0

2 1 1 2 0 0

3 1 1 2 0 0

4 1 1 1 0 1

5 1 2 1 0 0

6 1 2 2 0 0

7 1 2 2 0 0

8 1 2 1 1 1

9 2 1 1 0 0

10 2 1 2 0 0

11 2 1 2 0 0

12 2 1 1 0 1

13 2 2 1 0 0

14 2 2 2 0 0

15 2 2 2 0 0

16 2 2 1 1 1

The point I am trying to make is demonstrated below. I purposefully here do not sort on C. As you see the observation getting last.b is not related to any sort order, and thus can be the 'wrong' observation.

As you often state, clarity in a request is of great importance in determining te correct responce.

data one;

do a = 1 to 2;

do b=1 to 2;

do c=1 to 2;

output;

end;

end;

end;

run;

proc sort data = one out = two;

by a b descending c;

run;

data one;

set one two;

run;

proc sort data = one out = two;

by a b ;

run;

data one;

set two;

by a b;

la = last.a;

lb = last.b;

run;

proc print;

run;

The SAS System 16:00 Sunday, August 30, 2009 1

Obs a b c la lb

1 1 1 1 0 0

2 1 1 2 0 0

3 1 1 2 0 0

4 1 1 1 0 1

5 1 2 1 0 0

6 1 2 2 0 0

7 1 2 2 0 0

8 1 2 1 1 1

9 2 1 1 0 0

10 2 1 2 0 0

11 2 1 2 0 0

12 2 1 1 0 1

13 2 2 1 0 0

14 2 2 2 0 0

15 2 2 2 0 0

16 2 2 1 1 1

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-30-2009 04:59 PM

Humbly, let's observe the initial thread question. Here's the first sentence from the OP:

"If I have a dataset with variables a and b and I sort by a then b how do I filter for the last entry of b for each a. "

Of course, I presume that he wants the "last entry" of each B variable value captured for output (which would be the default, since he didn't mention DESCENDING B as a sort criteria specification), with each A variable value (also a contributing "break" variable by definition). The SAS variable B is mentioned as a contributing data-variable to the sort order.

Again, with your second data-set as a decent sample to demonstrate the importance of testing IF LAST.B; , you will either get:

- four output observations: one obs for each unique A variable value and unique B variable value.

- two observations: one for each unique A variable value - oh, by the way, the value of B will be whatever occurred on the last/highest observation of A.

Clearly, data dependent, but the lack of testing: IF LAST.B; could yield incomplete output data.

If you believe that there should only be 2 observations - but the important point here is the need to have a "break" test (for output/filter) selection that includes variable B as well as variable A. That is where we have a difference of technical opinion, I believe.

Scott Barry

SBBWorks, Inc.

"If I have a dataset with variables a and b and I sort by a then b how do I filter for the last entry of b for each a. "

Of course, I presume that he wants the "last entry" of each B variable value captured for output (which would be the default, since he didn't mention DESCENDING B as a sort criteria specification), with each A variable value (also a contributing "break" variable by definition). The SAS variable B is mentioned as a contributing data-variable to the sort order.

Again, with your second data-set as a decent sample to demonstrate the importance of testing IF LAST.B; , you will either get:

- four output observations: one obs for each unique A variable value and unique B variable value.

- two observations: one for each unique A variable value - oh, by the way, the value of B will be whatever occurred on the last/highest observation of A.

Clearly, data dependent, but the lack of testing: IF LAST.B; could yield incomplete output data.

If you believe that there should only be 2 observations - but the important point here is the need to have a "break" test (for output/filter) selection that includes variable B as well as variable A. That is where we have a difference of technical opinion, I believe.

Scott Barry

SBBWorks, Inc.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-31-2009 09:35 AM

Oh, just for the fun of discussion, a couple of points.

SBB: "- two observations: one for each unique A variable value - oh, by the way, the value of B will be whatever occurred on the last/highest observation of A."

Actually the value of B would be the maximum value of B within a given A. Last.A order is driven by the sort order of B within A. It will return any old value of C, which is my point.

OP: "If I have a dataset with variables a and b and I sort by a then b"

with only A and B in the data set. sorted by A and B with multiple equivalent values of B within a given A, the statements first.B, last.B, anyold.B would give identical records from A-B pairs.

The reason I stress this is that I have seen several programmers write code like this when trying to get the record with the maximum value of B within A and got the result you are talking about instead. And yes, I can see that the way you are reading the OP could be what he intended, I did not read it that way.

Thanks for the discussion.

SBB: "- two observations: one for each unique A variable value - oh, by the way, the value of B will be whatever occurred on the last/highest observation of A."

Actually the value of B would be the maximum value of B within a given A. Last.A order is driven by the sort order of B within A. It will return any old value of C, which is my point.

OP: "If I have a dataset with variables a and b and I sort by a then b"

with only A and B in the data set. sorted by A and B with multiple equivalent values of B within a given A, the statements first.B, last.B, anyold.B would give identical records from A-B pairs.

The reason I stress this is that I have seen several programmers write code like this when trying to get the record with the maximum value of B within A and got the result you are talking about instead. And yes, I can see that the way you are reading the OP could be what he intended, I did not read it that way.

Thanks for the discussion.