Hello Experts,
My data is very huge, it takes 5 hours to run the code. Do you know some tricks to avoid this please ?
My code is :
proc sql;
create table event2020 as
select *
from pkt.event
where year(datepart(d_compta)) >= 2019
and year(datepart(d_compta)) <= 2020;
quit;
Thank you very muche for your help !
Bugger! It uses the Meta engine so you would need to use SAS SMC and look in SAS metadata how the library is actually defined and where it points to.
You could execute below. That shouldn't take too much time. The log will then tell you if the actual table is in a database. If it's in a database you'll see in the log some trials where SAS tells you which SQL syntax it sent to the database to test if it can execute it, and then also a message telling you which SQL has been chosen to be sent in the end.
...and if it's a SAS table then you won't have any of these "strange" messages in your SAS log.
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
proc sql outobs=100;
create table test_2 as
select d_compta
from pkt.event
where '01jan2019 00:00:00'dt <= d_compta < '01jan2020 00:00:00'dt
;
quit;
Furthermore, it might not be the filtering that is causing execution time. Performance is about the whole chain of data on disk, movement to memory and CPU ussage.
Is your data in a Base SAS libname? What kind of storage?
Add these options to better understand your bottlenecks:
options fullstimer msglevel=i;
First thing that jumps to my mind: Is your source table a SAS table or in a data base? Where does libref pkt point to?
If it's a database then most likely the where condition doesn't get pushed to the database (because of the use of the datepart() function) but all the data gets first transferred over the network to SAS and then only filtered on the SAS side.
You could run below two queries to test the theory.
The 2nd query will push the where condition to the data base for execution (if your source table is in a database) so if the 2nd query is much faster then the required cause of action is clear.
IF the source table is in a database then the options I've added will show you in the SAS log which portion of the query gets pushed to the db for execution.
options fullstimer msglevel=i;
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
libname pkt list;
proc sql;
create table test_1 as
select d_compta
from pkt.event
where year(datepart(d_compta)) = 2019
;
quit;
proc sql;
create table test_2 as
select d_compta
from pkt.event
where '01jan2019 00:00:00'dt <= d_compta < '01jan2020 00:00:00'dt
;
quit;
Thank you, Ksharp ! The bufno=100 bufsize=128k is it maximum size please ?
Thank you @Ksharp ! 🙂
> From his test, it looks like "bufno=100 bufsize=128k " is the best configure for running speed. But you could ask @ChrisNZ
This result only works for the test data used in the book, though it gives a good starting point. Each data is different and if performance is important on a recurring basis for a given query, the best values for options engine, compression, buffer size, buffer number and OS cache should be benchmarked.
So you're 100% sure that your source table is already a SAS table? If you execute...
libname pkt list;
...does that show you a pathname as the location in the SAS log or something else?
Bugger! It uses the Meta engine so you would need to use SAS SMC and look in SAS metadata how the library is actually defined and where it points to.
You could execute below. That shouldn't take too much time. The log will then tell you if the actual table is in a database. If it's in a database you'll see in the log some trials where SAS tells you which SQL syntax it sent to the database to test if it can execute it, and then also a message telling you which SQL has been chosen to be sent in the end.
...and if it's a SAS table then you won't have any of these "strange" messages in your SAS log.
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
proc sql outobs=100;
create table test_2 as
select d_compta
from pkt.event
where '01jan2019 00:00:00'dt <= d_compta < '01jan2020 00:00:00'dt
;
quit;
@Ksharp That's exactly where I was heading to. The query @SASdevAnneMarie just executed together with the options would a) have pushed the where clause to the db and b) the SAS log would have told us that this is happening. But from what Marie tells us now it really looks like a SAS table.
@SASdevAnneMarie Is this SAS 9.4 or SAS Viya? And how many rows and columns does your source SAS table have? Like: Can you share the result of a Proc Contents? 5+ hours for such a simple query that doesn't require any sorting feels excessive.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.