BookmarkSubscribeRSS Feed
Loko
Barite | Level 11

Dear all,

For the following sample dataset I need to retrieve the last value of c within each group (variable a) from the latest b.

data have;

input a $1-4 b c;

datalines;

M31 1 2

M31 3 9

M31 2 15

M98 5 4

M98 3 6

M98 4 2

;

run;

Therefore the output shall be M31 3 9 and M98 5 4.

I know the classic way of sort and last but I am curious if there is any other method whivh may run faster (my dataset is big...)

10x

7 REPLIES 7
stat_sas
Ammonite | Level 13

proc sql;

select * from have

group by a

having b=max(b);

quit;

Steelers_In_DC
Barite | Level 11

That's the only other way I would think to do it.  I'm curious to hear how the performance changed from sort / last.

Reeza
Super User

I think there's a proc summary solution here but I've never been good at optimization problems.

Reeza
Super User

proc summary data=have noprint;

class a;

id c;

var b;

output out=want max=;

run;

proc print data=want;

run;

Jagadishkatam
Amethyst | Level 16

I agree with you Mark, sort followed by last. could also be a good choice.

data have;

input a $1-4 b c;

datalines;

M31 1 2

M31 3 9

M31 2 15

M98 5 4

M98 3 6

M98 4 2

;

run;

proc sort data=have;

by a b;

run;

data want;

set have;

by a b;

if last.a;

run;

Thanks,

Jag

Thanks,
Jag
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, you don't need to sort the data first to get last.  Proc SQL obviously is quite good, however you may run into memory limits depending on the size of your data.  I would suggest adding in a last flag e.g:

data your_data;

     set your_data;

     by a;

     retain max_num max_num_res;

     if first.a then do;

          max_num=b;

          max_res=c;

     end;

     else do;

          if b > max_num then do;

               max_num=b;

               max_num_res=c;

          end;

     end;

     if last.a then output;

run;

Then you should have one row per group, with max_num and max_num_res holding the data you want.  You will have to test performance of course against SQL and Hash.

Haikuo
Onyx | Level 15

Some level of SORT will always be involved in this case, including SQL,Hash (implicit) or even the one offered by (explicit) Here is one example using Hash:

data have;

     input a $1-4 b c;

     datalines;

M31 1 2

M31 3 9

M31 2 15

M98 5 4

M98 3 6

M98 4 2

;

run;

data want;

     declare hash h(ordered:'a');

     h.definekey('b');

     h.definedata('b','c');

     h.definedone();

     declare hiter hi('h');

     do until (last.a);

           set have;

           by a notsorted;

           rc=h.add();

     end;

     rc=hi.last();

     output;

     drop rc;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 20148 views
  • 9 likes
  • 7 in conversation