Try this:
proc sort
  data=mydataset (
    keep=id var1 var2 var3 var4 time_flag
  )
  out=sorted
;
by id var1 var2 time_flag;
run;
data spine;
set sorted;
by id var1 var2;
if last.var2;
run;SQL will also sort, and it is usually less efficient than PROC SORT.
Try this:
proc sort
  data=mydataset (
    keep=id var1 var2 var3 var4 time_flag
  )
  out=sorted
;
by id var1 var2 time_flag;
run;
data spine;
set sorted;
by id var1 var2;
if last.var2;
run;SQL will also sort, and it is usually less efficient than PROC SORT.
@Callam1 wrote:
So a simple sort with sas is actually more efficient when using a big dataset?
See Maxim 4. Try It.
Did you try PROC SUMMARY?
Use your primary keys as the CLASS (or if already sorted BY) variables.
Use the IDGROUP and the MAX() type to select the observation with the maximum value of your time/ordering variable. And list the other variables you want copied from that maximum observation.
proc summary nway data=mydataset ;
  class id var1 var2 ;
  output out=spine idgroup (max(time_flag) out (var3 var4 time_flag)=);
run;You might want to use the DROP= dataset option on the output dataset reference to remove the _TYPE_ _FREQ_ variables that PROC SUMMARY adds.
  output out=spine(drop=_type_ _freq_) idgroup (max(time_flag) out (var3 var4 time_flag)=);Example:
data have;
  input id var1 var2 flag ;
cards;
1 1 1 1
1 1 2 2
1 1 3 3
1 2 1 1
1 3 1 1
;
proc print;
run;
proc summary data=have nway;
 class id var1 ;
 output out=want idgroup (max(var2) out (var2 flag)=) ;
run;
proc print;
run;
What works faster will depend on your data and you can only prove which is faster by testing each method. Take a subset of your data and try it for yourself.
@Callam1 wrote:
Thank you. Is the proc summary quicker than proc sort?
A likely issue with Proc Summary is getting the values for "other columns" as you said. Every variable would have to be referenced somewhere in the Proc Summary and might be a headache.
@ballardw wrote:
@Callam1 wrote:
Thank you. Is the proc summary quicker than proc sort?A likely issue with Proc Summary is getting the values for "other columns" as you said. Every variable would have to be referenced somewhere in the Proc Summary and might be a headache.
You can use _ALL_ if you don't mind the warning messages.
751 752 proc summary data=have nway; 753 class id var1 ; 754 output out=want idgroup (max(var2) out (_all_)=) ; 755 run; WARNING: Variable id already exists on file WORK.WANT. WARNING: Variable var1 already exists on file WORK.WANT. WARNING: The duplicate variables will not be included in the output data set of the output statement number 1.
The PROC SUMMARY offers a neat compact single-pass solution. It will certainly use a lot less disk input/output resources than the PROC SORT solution, and will probably be a lot faster - assuming there is no memory constraint.
BUT ... does your data have the possibility of tied maximum time_flag values for a given ID/VAR1/VAR2?
If not, then ignore the rest of this comment.
But if it does, then the PROC SUMMARY might not likely give the same result as the PROC SORT ... if LAST.VAR2 solution. It will choose different records (with possibly different VAR3/VAR4 values) among the tied records.
This is because the default behavior of PROC SORT is to preserve the original order (from the unsorted dataset) of tied records. So that solution would always choose the latest of the tied records.
A quick test of PROC SUMMARY with ties suggests it would always choose the first of such ties. At least it did so in the test below:
data ties;
  set sashelp.class (keep=name sex age weight);
  order='First'; output;
  order='Last' ; output;
run;
proc summary nway data=ties;
  class sex age;
  output out=summ_want (drop=_:) idgroup (max(weight) out (name weight order)=);
run;
proc sort data=ties;
  by sex age weight;
run;
data sort_want;
  set ties;
  by sex age;
  if last.age;
run;Dataset summ_want has order='First' in every output record, but the PROC SORT approach always has ORDER='Last'.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
