Hello,
I am working with a dataset that has a few million records, and instead of sorting by id and visit date, then using a data step to take the first. and last. visit dates for each id I would be interested in a more efficient way to get the data. The proc sort on the dataset takes forever. Any help would be appreciated. Thanks!
-Steve
I would be tempted to use PROC SUMMARY and CLASS statements.
Something like:
proc summary data =have ;
class ID;
var VisitDate;
output out=want max min /autoname;
run;
or
Proc sql;
create table want as
select id, min(visitdate) as firstdate, max(visitdate) as lastdate
from have
group by id;
quit;
I would be tempted to use PROC SUMMARY and CLASS statements.
Something like:
proc summary data =have ;
class ID;
var VisitDate;
output out=want max min /autoname;
run;
or
Proc sql;
create table want as
select id, min(visitdate) as firstdate, max(visitdate) as lastdate
from have
group by id;
quit;
If your source data is something other than SAS table, say Oracle, SQL server, then you will have an option doing it using pass-thru. Other than that, you are stuck with Proc Sort. I doubt if Hash table could help, but first you need to make sure your whole table can be fitted into your RAM, and even if it can, I suspect that the Hash sorting would be more efficient than Proc sort.
my 2cents,
Haikuo
Thanks so much ballardw, both of those are much, much quicker. I really appreciate it!
-Steve
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.