Hi I have a large dataset consisting of millions of rows, I am trying to bring back the Ref minimum of post_date and minimum of from_date. what is the best way this can be done? See example in table below.
| Table_Test | |||
| Ref | Post_Date | From_Date | To_Date |
| 1 | 01/01/2017 | 27/12/2016 | 06/01/2017 |
| 1 | 01/01/2017 | 07/01/2017 | 17/01/2017 |
| 1 | 02/02/2017 | 18/01/2017 | 28/01/2017 |
| 2 | 01/02/2016 | 27/01/2016 | 01/02/2016 |
| 2 | 01/02/2016 | 02/02/2016 | 07/02/2016 |
| 2 | 03/04/2016 | 08/02/2016 | 13/02/2016 |
| 2 | 03/05/2016 | 14/02/2016 | 19/02/2016 |
| 3 | 10/04/2015 | 05/04/2015 | 15/04/2015 |
| 3 | 11/09/2015 | 06/09/2015 | 16/09/2015 |
| 4 | 04/02/2014 | 30/01/2014 | 09/02/2014 |
| 4 | 04/02/2016 | 10/02/2014 | 20/02/2014 |
| 4 | 01/01/2017 | 27/12/2016 | 06/01/2017 |
Method 1: SQL
proc sql;
create table want as
select ref, min(post_date) as min_post_date, min(from_date) as min_from_date
from have
group by ref
;
quit;
Method 2: data step
proc sort data=have;
by ref;
run;
/* omit this step if have is already sorted, as in your example */
data want (keep=ref min_post_date min_from_date);
set have;
by ref;
retain min_post_date min_from_date;
format min_post_date min_from_date ddmmyy10.;
if first.ref
then do;
min_post_date = '31dec9999'd;
min_form_date = '31dec9999'd;
end;
min_post_date = min(min_post_date,post_date);
min_from_date = min(min_from_date,from_date);
if last.ref then output;
run;
Method 3: proc means
proc means data=have noprint;
by ref;
var post_date from_date;
output
out=want (drop=_type_ _freq_)
min(post_date)=min_post_date
min(from_date)=min_from_date
;
run;
/* also needs sorting */
If the cardinality of ref is sufficiently low, you can use class instead of by in the proc means, and avoid sorting.
Method 1: SQL
proc sql;
create table want as
select ref, min(post_date) as min_post_date, min(from_date) as min_from_date
from have
group by ref
;
quit;
Method 2: data step
proc sort data=have;
by ref;
run;
/* omit this step if have is already sorted, as in your example */
data want (keep=ref min_post_date min_from_date);
set have;
by ref;
retain min_post_date min_from_date;
format min_post_date min_from_date ddmmyy10.;
if first.ref
then do;
min_post_date = '31dec9999'd;
min_form_date = '31dec9999'd;
end;
min_post_date = min(min_post_date,post_date);
min_from_date = min(min_from_date,from_date);
if last.ref then output;
run;
Method 3: proc means
proc means data=have noprint;
by ref;
var post_date from_date;
output
out=want (drop=_type_ _freq_)
min(post_date)=min_post_date
min(from_date)=min_from_date
;
run;
/* also needs sorting */
If the cardinality of ref is sufficiently low, you can use class instead of by in the proc means, and avoid sorting.
Thanks for your help Kurt, much appreciated
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.