BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ramakanthkrovi
Obsidian | Level 7

This is a dataset with 8 million records and I am trying to understand if this code is fine or if there is another way of achieving it (creating a max timestamp in a previous step and using a where clause) because it is taking ages to execute this (2 hours) and I wanted to check if it can be made to run quicker.

 

proc sql;
create table xyz as
select *
from abc
group by aa_id,ab_id
having date_timestamp = max(date_timestamp);
quit;

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

@Ramakanthkrovi - to get rid of the re-merging note your query would have to look like this:

 

proc sql;
create table xyz as
select aa_id
      ,ab_id
      ,max(date_timestamp) as date_timestamp_max
from abc
group by aa_id,ab_id
having date_timestamp = max(date_timestamp);
quit;

View solution in original post

11 REPLIES 11
koyelghosh
Lapis Lazuli | Level 10

I have seen here that SAS experts recommend to use PROC SQL for smaller datasets. Is it possible for you to achieve the same goal using DATA step, intelligently? If it is a good idea to use PROC SQL for 8 million records then one of the experts will tell you. So if possible wait for their response or try with DATA step.

Best wishes

SASKiwi
PROC Star

Please post the log of your SQL query. I suspect it contains a SAS note about re-merging data given you haven't included any sum-type variables in your SELECT. If it is re-merging this will definitely slow your query. 

Ramakanthkrovi
Obsidian | Level 7

@SASKiwi yes, the log contains that note. how can I overcome it?

SASKiwi
PROC Star

@Ramakanthkrovi - all columns in your SELECT must also be repeated in your GROUP BY except for summary-type calculations like the MAX function you are using.

Ramakanthkrovi
Obsidian | Level 7

I am grouping by two columns already mentioned in the group by statement and I am selecting * from the dataset.

Reeza
Super User

@Ramakanthkrovi wrote:

I am grouping by two columns already mentioned in the group by statement and I am selecting * from the dataset.


Have you tried running it in stages or without having to see how long it takes? If the parts take about 2 hours individually, the fact that the main query takes 2 hours won't be surprising. If the parts complete in minutes then clearly there's something else that's the issue. 

 

SASKiwi
PROC Star

@Ramakanthkrovi - to get rid of the re-merging note your query would have to look like this:

 

proc sql;
create table xyz as
select aa_id
      ,ab_id
      ,max(date_timestamp) as date_timestamp_max
from abc
group by aa_id,ab_id
having date_timestamp = max(date_timestamp);
quit;
andreas_lds
Jade | Level 19

You don't need proc sql at all:

proc sort data=sashelp.class out=work.class;
   by Sex descending Age;
run;

data work.want;
   set work.class;
   by Sex descending Age;

   length _Age 8;
   retain _Age;
   drop _Age;

   if first.Sex then do;
      _Age = Age;
   end;

   if Age = _Age;
run;
Reeza
Super User

How wide is that table? You must have another issue or forgot to submit quit. 

It used to take me 20 minutes to process 30 million rows with a lot of calculations on a desktop with 8GB of RAM. 

 


@Ramakanthkrovi wrote:

This is a dataset with 8 million records and I am trying to understand if this code is fine or if there is another way of achieving it (creating a max timestamp in a previous step and using a where clause) because it is taking ages to execute this (2 hours) and I wanted to check if it can be made to run quicker.

 

proc sql;
create table xyz as
select *
from abc
group by aa_id,ab_id
having date_timestamp = max(date_timestamp);
quit;

 

Thank you.


 

Ramakanthkrovi
Obsidian | Level 7

Hardware could be an issue. 

 

I am running on SAS EG on the cloud so I cannot pinpoint the exact problem. 

Reeza
Super User
The cloud isn't that different than on the server, so not sure how that affects anything.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3943 views
  • 3 likes
  • 5 in conversation