Hello,
I am working with SAS Enterprise Guide and trying to extract a large amount of data through an ODBC connection. I'm looking for all claims above a certain threshold level. As such, my idea was to split the query into multiple steps to increase efficiency. During the first step, I accessed a claims table in the database, and pulled only members who had claims above the threshold level. Here's the code I wrote:
proc sql;
connect to odbc(datasrc=xxxxx user=xxxx password=xxxxxxxxxxx);
create table C.TableExample as
select * from connection to odbc
(
Select
Member,
Account,
Sum(Amount) as Paid
FROM DATABASE.TABLE D
WHERE D.Date between 'aaaa-bb-cc' and 'aaaa-bb-cc'
and Account between ' 0' and '99999999999999'
and Sub_Type between 'X ' and 'YYYYY '
GROUP BY,
Member,
Account
having sum(Amount) between 00000 and 9999999999
);
quit;
I was able to find a macro that a colleague created that creates one or more "in lists" for a text column. For example, if you had a table of 50 state codes, this could create 5 lists of 10 states each, etc..
Here's the macro code:
%macro inlist(table, column, variable, blocksize);
/* creates as many "in" lists as needed, for looping.
For example, if the table X.ACCTS looks like:
ACCT_NUM
'ABCDEFG'
'BCDEFGA'
'CDEFGAB'
'DEFGABC'
'EFGABCD'
'FGABCDE'
'GABCDEF'
Calling %inlist(X.ACCTS,ACCT_NUM,ACCT,2) would create the following global variables:
ACCT1: "('ABCDEFG','BCDEFGA')
ACCT2: "('CDEFGAB','DEFGABC')
ACCT3: "('EFGABCD','FGABCDE')
ACCT4: "('GABCDEF')
ACCT_OBS: 4
So you could loop through a query four times to get all the accounts in your data set - obviously more useful when the
number is large.
*/
PROC SQL;
SELECT int(count(*)/ &blocksize )+1 INTO :varcount
FROM &table
;
DATA &TABLE;
SET &TABLE;
MACROW=_N_;
RUN;
%global &variable._obs;
%DO I = 1 %TO &VARCOUNT;
%global &variable._&I;
PROC SQL;
SELECT &column INTO :var SEPARATED BY ''','''
FROM &table
WHERE MACROW BETWEEN &BLOCKSIZE*&I-(&BLOCKSIZE-1) AND &BLOCKSIZE*&I
;
DATA _NULL;
CALL SYMPUT ("&variable._&I","('"||"&var"||"')");
CALL SYMPUT ("&variable._obs",&varcount);
RUN;
%END;
%mend;
I can then run my macro on C.TableExample as such:
%inlist(C.TableExample, Member, pid, 10000);
Finally, I use these lists to cycle through and pull claims information from the same table as I did the member information, through a proc sql statement once again. Here's the code:
%macro queryloop;
%do m = 1 %to &pid_obs;
Proc SQL;
connect to odbc(datasrc=xxxxx user=xxxx password=xxxxxxxxx);
%if &m = 1 %then create table C.Claims as
%else insert into C.Claims;
select * from connection to odbc
(
select
Member,
Account,
Sub_Type,
Sum(Amount) as P,
A,
B,
C,
D
FROM DATABASE.TABLE D
WHERE D.Year_Month between 'xxxxyy' and 'xxxxyy'
and Account between ' 0' and '99999999999999'
and Sub_Type between 'X ' and 'YYYYY '
group by
Member,
Account,
Sub_Type,
A,
B,
C,
D
);
quit;
%end;
%mend;
%queryloop;
Apologize for the really long post. But, even with using a multi-step process, I will have to loop through hundreds of lists, and each list takes a few hours to complete the data pull. I am trying to understand the pass through SQL statement and see if there is a more efficient way of pulling a few million rows of data through an odbc connection. Would appreciate any help! Thank you.
Srikar
Since no one else has suggestions, here's a couple of things.
Breaking things up by a list makes sense sometimes, I'm not sure it does in this instance, beside separating the accounts.
Somethings, can you partition the job on the server, ie can you write to the server rather than bring things back to SAS?
Do you have any indexes on the data set?
Are you really specifying character ranges for account and sub_type? Does it work any faster if you leave those out?
Since no one else has suggestions, here's a couple of things.
Breaking things up by a list makes sense sometimes, I'm not sure it does in this instance, beside separating the accounts.
Somethings, can you partition the job on the server, ie can you write to the server rather than bring things back to SAS?
Do you have any indexes on the data set?
Are you really specifying character ranges for account and sub_type? Does it work any faster if you leave those out?
Thanks Reeza!
I agree -- this is the first time I've had to pull this large of a data set, and I typically follow breaking things up into a list method, which works fine for decent sized lists.
Unfortunately, I am not able to partition the job on the server. Would need some approvals to do that from IT and it would be a big hassle.
The Member field was part of an index on the data set, but it seems to not help in this case that much due to the sheer size of the member list. However, I just found out that there is a clustered index on the data set that I can run off of. My next attempt is to loop through the variables in the clustered index and pull all claims for every member. Once I do that, I can limit to the list that I need, before having SAS store the result. I'm not sure if this is entirely going to help but I think it should work better than the previous attempt.
Again, I've gotten into the habit of specifying the character ranges, but I have no idea if that slows it down that much. I will take that away and try that as well!
Thanks again for your response.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.