Hi, I have a large dataset consisting of millions of rows. For each Ref, I am trying to bring back the minimum Post_Date and the minimum From_Date. What is the best way to do this? See the example table below.
Table_Test
Ref | Post_Date | From_Date | To_Date |
1 | 01/01/2017 | 27/12/2016 | 06/01/2017 |
1 | 01/01/2017 | 07/01/2017 | 17/01/2017 |
1 | 02/02/2017 | 18/01/2017 | 28/01/2017 |
2 | 01/02/2016 | 27/01/2016 | 01/02/2016 |
2 | 01/02/2016 | 02/02/2016 | 07/02/2016 |
2 | 03/04/2016 | 08/02/2016 | 13/02/2016 |
2 | 03/05/2016 | 14/02/2016 | 19/02/2016 |
3 | 10/04/2015 | 05/04/2015 | 15/04/2015 |
3 | 11/09/2015 | 06/09/2015 | 16/09/2015 |
4 | 04/02/2014 | 30/01/2014 | 09/02/2014 |
4 | 04/02/2016 | 10/02/2014 | 20/02/2014 |
4 | 01/01/2017 | 27/12/2016 | 06/01/2017 |
Method 1: SQL
proc sql;
create table want as
select ref, min(post_date) as min_post_date, min(from_date) as min_from_date
from have
group by ref
;
quit;
Method 2: data step
proc sort data=have;
by ref;
run;
/* omit this step if have is already sorted, as in your example */
data want (keep=ref min_post_date min_from_date);
set have;
by ref;
retain min_post_date min_from_date;
format min_post_date min_from_date ddmmyy10.;
if first.ref
then do;
min_post_date = '31dec9999'd;
min_from_date = '31dec9999'd;
end;
min_post_date = min(min_post_date,post_date);
min_from_date = min(min_from_date,from_date);
if last.ref then output;
run;
Method 3: proc means
proc means data=have noprint;
by ref;
var post_date from_date;
output
out=want (drop=_type_ _freq_)
min(post_date)=min_post_date
min(from_date)=min_from_date
;
run;
/* also needs sorting */
If the cardinality of ref is sufficiently low (so that all class levels fit in memory), you can use a class statement instead of by in the proc means, and avoid sorting.
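A minimal sketch of that class variant, assuming the same have and want dataset names as in the methods above. The nway option is my addition: without it, proc means with class also writes an overall summary row (_type_ = 0) to the output dataset, which you probably don't want here.

```sas
/* No proc sort needed: class groups the data in memory */
proc means data=have noprint nway;
class ref;
var post_date from_date;
output
  out=want (drop=_type_ _freq_)
  min(post_date)=min_post_date
  min(from_date)=min_from_date
;
run;
```

With millions of rows but only a modest number of distinct ref values, this avoids the sort entirely; if ref has a very large number of distinct values, the by version on sorted data is likely the safer choice.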
Thanks for your help Kurt, much appreciated