Hi I have a large dataset consisting of millions of rows, I am trying to bring back the Ref minimum of post_date and minimum of from_date. what is the best way this can be done? See example in table below.
Table_Test | |||
Ref | Post_Date | From_Date | To_Date |
1 | 01/01/2017 | 27/12/2016 | 06/01/2017 |
1 | 01/01/2017 | 07/01/2017 | 17/01/2017 |
1 | 02/02/2017 | 18/01/2017 | 28/01/2017 |
2 | 01/02/2016 | 27/01/2016 | 01/02/2016 |
2 | 01/02/2016 | 02/02/2016 | 07/02/2016 |
2 | 03/04/2016 | 08/02/2016 | 13/02/2016 |
2 | 03/05/2016 | 14/02/2016 | 19/02/2016 |
3 | 10/04/2015 | 05/04/2015 | 15/04/2015 |
3 | 11/09/2015 | 06/09/2015 | 16/09/2015 |
4 | 04/02/2014 | 30/01/2014 | 09/02/2014 |
4 | 04/02/2016 | 10/02/2014 | 20/02/2014 |
4 | 01/01/2017 | 27/12/2016 | 06/01/2017 |
Method 1: SQL
proc sql;
create table want as
select ref, min(post_date) as min_post_date, min(from_date) as min_from_date
from have
group by ref
;
quit;
Method 2: data step
proc sort data=have;
by ref;
run;
/* omit this step if have is already sorted, as in your example */
data want (keep=ref min_post_date min_from_date);
set have;
by ref;
retain min_post_date min_from_date;
format min_post_date min_from_date ddmmyy10.;
if first.ref
then do;
min_post_date = '31dec9999'd;
min_form_date = '31dec9999'd;
end;
min_post_date = min(min_post_date,post_date);
min_from_date = min(min_from_date,from_date);
if last.ref then output;
run;
Method 3: proc means
proc means data=have noprint;
by ref;
var post_date from_date;
output
out=want (drop=_type_ _freq_)
min(post_date)=min_post_date
min(from_date)=min_from_date
;
run;
/* also needs sorting */
If the cardinality of ref is sufficiently low, you can use class instead of by in the proc means, and avoid sorting.
Method 1: SQL
proc sql;
create table want as
select ref, min(post_date) as min_post_date, min(from_date) as min_from_date
from have
group by ref
;
quit;
Method 2: data step
proc sort data=have;
by ref;
run;
/* omit this step if have is already sorted, as in your example */
data want (keep=ref min_post_date min_from_date);
set have;
by ref;
retain min_post_date min_from_date;
format min_post_date min_from_date ddmmyy10.;
if first.ref
then do;
min_post_date = '31dec9999'd;
min_form_date = '31dec9999'd;
end;
min_post_date = min(min_post_date,post_date);
min_from_date = min(min_from_date,from_date);
if last.ref then output;
run;
Method 3: proc means
proc means data=have noprint;
by ref;
var post_date from_date;
output
out=want (drop=_type_ _freq_)
min(post_date)=min_post_date
min(from_date)=min_from_date
;
run;
/* also needs sorting */
If the cardinality of ref is sufficiently low, you can use class instead of by in the proc means, and avoid sorting.
Thanks for your help Kurt, much appreciated
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.