Hello,
I am working with a dataset that has a few million records. Currently I sort by id and visit date, then use a data step with first. and last. to take the first and last visit dates for each id. The proc sort on the dataset takes forever, so I would be interested in a more efficient way to get the data. Any help would be appreciated. Thanks!
-Steve
I would be tempted to use PROC SUMMARY and CLASS statements.
Something like:
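proc summary data=have nway;  /* nway: keep only the per-ID rows, not the overall total */
  class ID;
  var VisitDate;
  /* autoname creates VisitDate_Max and VisitDate_Min */
  output out=want max= min= / autoname;
run;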
or
Proc sql;
create table want as
select id, min(visitdate) as firstdate, max(visitdate) as lastdate
from have
group by id;
quit;
If your source data is something other than a SAS table, say Oracle or SQL Server, then you have the option of doing it with pass-through SQL. Other than that, you are stuck with Proc Sort. I doubt that a hash table would help: first you need to make sure your whole table fits into RAM, and even if it does, I doubt that hash sorting would be more efficient than Proc Sort.
my 2cents,
Haikuo
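For comparison, here is a rough sketch of what a hash-based approach could look like if the hash is used to aggregate rather than to sort. It assumes the same HAVE table with ID and a numeric VisitDate; the output name want_hash and the variable names FirstDate and LastDate are placeholders chosen to match the SQL example. Because the hash holds one entry per distinct ID rather than every record, memory use should scale with the number of IDs. Whether it is actually faster than PROC SUMMARY or PROC SQL would need to be timed on the real data.

```sas
/* Sketch only: one pass over HAVE, no sort, min/max VisitDate kept per ID. */
/* want_hash, FirstDate and LastDate are illustrative names.               */
data _null_;
  if 0 then set have(keep=ID VisitDate);   /* define ID and VisitDate in the PDV */
  length FirstDate LastDate 8;

  declare hash h(ordered:'a');             /* ordered by ID on output */
  h.defineKey('ID');
  h.defineData('ID', 'FirstDate', 'LastDate');
  h.defineDone();

  do until (eof);
    set have(keep=ID VisitDate) end=eof;
    if missing(VisitDate) then continue;   /* skip records with no date */
    if h.find() ne 0 then do;              /* first time this ID is seen */
      FirstDate = VisitDate;
      LastDate  = VisitDate;
      h.add();
    end;
    else do;                               /* ID already stored: update range */
      FirstDate = min(FirstDate, VisitDate);
      LastDate  = max(LastDate, VisitDate);
      h.replace();
    end;
  end;

  h.output(dataset:'want_hash');           /* one row per ID */
  stop;
run;
```

FirstDate and LastDate come out as unformatted numbers; a FORMAT statement matching VisitDate's format can be added if the dates should print readably.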
Thanks so much ballardw, both of those are much, much quicker. I really appreciate it!
-Steve