This is a dataset with 8 million records and I am trying to understand if this code is fine or if there is another way of achieving it (creating a max timestamp in a previous step and using a where clause) because it is taking ages to execute this (2 hours) and I wanted to check if it can be made to run quicker.
proc sql;
create table xyz as
select *
from abc
group by aa_id,ab_id
having date_timestamp = max(date_timestamp);
quit;
Thank you.
@Ramakanthkrovi - to get rid of the re-merging note your query would have to look like this:
proc sql;
create table xyz as
select aa_id
,ab_id
,max(date_timestamp) as date_timestamp_max
from abc
group by aa_id,ab_id
having date_timestamp = max(date_timestamp);
quit;
I have seen here that SAS experts recommend to use PROC SQL for smaller datasets. Is it possible for you to achieve the same goal using DATA step, intelligently? If it is a good idea to use PROC SQL for 8 million records then one of the experts will tell you. So if possible wait for their response or try with DATA step.
Best wishes
Please post the log of your SQL query. I suspect it contains a SAS note about re-merging data given you haven't included any sum-type variables in your SELECT. If it is re-merging this will definitely slow your query.
@SASKiwi yes, the log contains that note. how can I overcome it?
@Ramakanthkrovi - all columns in your SELECT must also be repeated in your GROUP BY except for summary-type calculations like the MAX function you are using.
I am grouping by two columns already mentioned in the group by statement and I am selecting * from the dataset.
@Ramakanthkrovi wrote:
I am grouping by two columns already mentioned in the group by statement and I am selecting * from the dataset.
Have you tried running it in stages or without having to see how long it takes? If the parts take about 2 hours individually, the fact that the main query takes 2 hours won't be surprising. If the parts complete in minutes then clearly there's something else that's the issue.
@Ramakanthkrovi - to get rid of the re-merging note your query would have to look like this:
proc sql;
create table xyz as
select aa_id
,ab_id
,max(date_timestamp) as date_timestamp_max
from abc
group by aa_id,ab_id
having date_timestamp = max(date_timestamp);
quit;
You don't need proc sql at all:
proc sort data=sashelp.class out=work.class;
by Sex descending Age;
run;
data work.want;
set work.class;
by Sex descending Age;
length _Age 8;
retain _Age;
drop _Age;
if first.Sex then do;
_Age = Age;
end;
if Age = _Age;
run;
How wide is that table? You must have another issue or forgot to submit quit.
It used to take me 20 minutes to process 30 million rows with a lot of calculations on a desktop with 8GB of RAM.
@Ramakanthkrovi wrote:
This is a dataset with 8 million records and I am trying to understand if this code is fine or if there is another way of achieving it (creating a max timestamp in a previous step and using a where clause) because it is taking ages to execute this (2 hours) and I wanted to check if it can be made to run quicker.
proc sql;
create table xyz as
select *
from abc
group by aa_id,ab_id
having date_timestamp = max(date_timestamp);
quit;
Thank you.
Hardware could be an issue.
I am running on SAS EG on the cloud so I cannot pinpoint the exact problem.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.