You could try something like done in below sample code that pulls data from the DB in yearly or monthly slices.
The code assumes that the SQL Server variable ADD_DT is of type DATE. If it's datetime then you would need to amend the code accordingly.
If your SQL is as simple as shared then I'd be using Proc Append and try if increasing the readbuff value increases performance. If using a SQL then potentially just using the default could perform best.
Below code creates monthly or yearly SAS tables because this allows to easily add more data (dates) at a later time or if something falls over to only re-run for the failed date periods.
%macro slices(source_tbl,start_dt, end_dt,interval=year,target_tbl=want,replace=YES);
%local start stop end lib tbl;
%let end =%sysfunc(putn("&end_dt"d,f16. -l));
%let start=&start_dt;
%let lib=%scan(work.&target_tbl,-2);
%do i=0 %to 1000;
%let stop =%sysfunc(intnx(&interval,"&start_dt"d,&i,e));
%let stop =%sysfunc(min(&stop,&end),date9.);
%let tbl=%scan(work.&target_tbl,-1);
%if %upcase(&interval)=YEAR %then %let tbl=&tbl._&interval._%sysfunc(putn("&start"d,year4. -l));
%else
%if %upcase(&interval)=MONTH %then %let tbl=&tbl._&interval._%sysfunc(putn("&start"d,yymmn6. -l));
%if not %sysfunc(exist(&lib..&tbl)) or %upcase(&replace)=YES %then
%do;
proc datasets lib=&lib nolist nowarn;
delete &tbl;
run;
append data=&source_tbl base=&lib..&tbl;
where ADD_DT between "&start"d and "&stop"d;
run;
quit;
%end;
%if %sysfunc(putn("&stop"d,f16. -l))=&end %then %goto exit;
%let start=%sysfunc(intnx(&interval,"&start_dt"d,%eval(&i+1),b),date9.);
%end;
%exit:;
%mend;
Hello
Your query, if it is as posted, would be pulling all the data earlier than the specified date.
Add more filter conditions and consider using the between operator if you want to pull data between two dates.
Have a look at the following link. It is about pulling last seven years data
How many rows and columns of data are you currently reading? To test if your program slowness is caused by a slow network connection or not then you could try this query which just does a row count:
PROC SQL;
CONNECT TO SQLSVR(DATABASE=DB, USER=MYID, PASSWORD=PASS READBUFF=1000);
CREATE TABLE RECORDPULL AS
SELECT * FROM CONNECTION TO SQLSVR
(SELECT count(*) as RowCount
FROM MYDATABASE
WHERE ADD_DT <= '2023-12-31');
QUIT;
If this query is way faster then you have a slow network connection. If not then the SQL Server database is poorly performing, and you should ask your database administrator why. Maybe there is no index on ADD_DT and if there was it could solve your performance issues without having to reinvent your query.
You could try something like done in below sample code that pulls data from the DB in yearly or monthly slices.
The code assumes that the SQL Server variable ADD_DT is of type DATE. If it's datetime then you would need to amend the code accordingly.
If your SQL is as simple as shared then I'd be using Proc Append and try if increasing the readbuff value increases performance. If using a SQL then potentially just using the default could perform best.
Below code creates monthly or yearly SAS tables because this allows to easily add more data (dates) at a later time or if something falls over to only re-run for the failed date periods.
%macro slices(source_tbl,start_dt, end_dt,interval=year,target_tbl=want,replace=YES);
%local start stop end lib tbl;
%let end =%sysfunc(putn("&end_dt"d,f16. -l));
%let start=&start_dt;
%let lib=%scan(work.&target_tbl,-2);
%do i=0 %to 1000;
%let stop =%sysfunc(intnx(&interval,"&start_dt"d,&i,e));
%let stop =%sysfunc(min(&stop,&end),date9.);
%let tbl=%scan(work.&target_tbl,-1);
%if %upcase(&interval)=YEAR %then %let tbl=&tbl._&interval._%sysfunc(putn("&start"d,year4. -l));
%else
%if %upcase(&interval)=MONTH %then %let tbl=&tbl._&interval._%sysfunc(putn("&start"d,yymmn6. -l));
%if not %sysfunc(exist(&lib..&tbl)) or %upcase(&replace)=YES %then
%do;
proc datasets lib=&lib nolist nowarn;
delete &tbl;
run;
append data=&source_tbl base=&lib..&tbl;
where ADD_DT between "&start"d and "&stop"d;
run;
quit;
%end;
%if %sysfunc(putn("&stop"d,f16. -l))=&end %then %goto exit;
%let start=%sysfunc(intnx(&interval,"&start_dt"d,%eval(&i+1),b),date9.);
%end;
%exit:;
%mend;
Pulling big chunks of data works best if the table is partitioned:
But if this is a recurring query, why not keep a local copy of the history data, and just pull the recent dates/year?
seems like a simple %do-loop job:
%macro loopOver(start,end,by=200);
%local s e;
%let s=%sysevalf("&start."d+0);
%let e=%sysevalf("&end."d+0);
%put _local_;
PROC SQL noexec; /* <----- Remember to remove NOEXEC option */
CONNECT TO SQLSVR(DATABASE=DB, USER=MYID, PASSWORD=PASS READBUFF=1000);
%do i=&s. %to &e. %by &by.;
CREATE TABLE RECORDPULL_%sysfunc(int(&i.),yymmddn8.) AS
SELECT * FROM CONNECTION TO SQLSVR
(SELECT *
FROM MYDATABASE
WHERE ADD_DT
between %str(%')%sysfunc(int(&i.),yymmdd10.)%str(%')
abd %str(%')%sysfunc(int(%sysevalf(&i.+&by.-1)),yymmdd10.)%str(%')
);
%end;
QUIT;
%mend loopOver;
options mprint;
%loopOver(1jan2016,31dec2023)
Remember to adjust the code. Both, the BY= parameter and NOEXEC option.
BTW. I consider writing something like:
SELECT * FROM CONNECTION TO SQLSVR
(SELECT *
a very very bad programming practice... (those "*" to be clear)
Bart
%macro loopOver(start,end,by=200);
%local s e;
%let s=%sysevalf("&start."d+0);
%let e=%sysevalf("&end."d+0);
%put _local_;
PROC SQL noexec; /* <----- Remember to remove NOEXEC option */
CONNECT TO SQLSVR(DATABASE=DB, USER=MYID, PASSWORD=PASS READBUFF=1000);
%do i=&e. %to &s. %by -&by.;
CREATE TABLE RECORDPULL_%sysfunc(int(&i.),yymmddn8.) AS
SELECT * FROM CONNECTION TO SQLSVR
(SELECT *
FROM MYDATABASE
WHERE ADD_DT
between %str(%')%sysfunc(int(&i.),yymmdd10.)%str(%')
abd %str(%')%sysfunc(max(%sysevalf(&i.-&by.+1),&s.),yymmdd10.)%str(%')
);
%end;
QUIT;
%mend loopOver;
options mprint;
%loopOver(1jan2016,31dec2023)
@cgates wrote:
Thanks! This works except the by 200 element goes past the end date for the last date. In your example I would need it to stop at 31dec2023, but it goes to 18mar2024, since that is 200 days after 01sep2023.
The %slices() macro I've posted earlier will stop at the selected end date.
data have;
format add_dt date9.;
do add_dt='15Feb2020'd to '10Jun2023'd;
output;
end;
run;
%slices(work.have,01Jan2020, 20May2023,target_tbl=want);
proc sql;
select max(ADD_DT) format=date9.
from want_year_2023
;
quit;
Hi @cgates
I'm surprised none of the replies commented on the READBUFF=1000 clause!
When I look at improving Database extraction queries, these are the things I look at:
Typically, increasing this value causes faster data transfer, but it will depend on how much memory you have allocated to your SAS session.
%put %sysfunc(getoption(memsize));
and whether or not you can increase it!?
I hope this provides you with preliminary pre-requisites that you can incorporate with the rest of the solutions and suggestions you have already received,
Ahmed
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.