Hello, I am working with SAS Enterprise Guide and trying to extract a large amount of data through an ODBC connection. I'm looking for all claims above a certain threshold level. As such, my idea was to split the query into multiple steps to increase efficiency. During the first step, I accessed a claims table in the database, and pulled only members who had claims above the threshold level. Here's the code I wrote:
proc sql;
connect to odbc(datasrc=xxxxx user=xxxx password=xxxxxxxxxxx);
create table C.TableExample as
select * from connection to odbc
(
Select
Member,
Account,
Sum(Amount) as Paid
FROM DATABASE.TABLE D
WHERE D.Date between 'aaaa-bb-cc' and 'aaaa-bb-cc'
and Account between ' 0' and '99999999999999'
and Sub_Type between 'X ' and 'YYYYY '
GROUP BY,
Member,
Account
having sum(Amount) between 00000 and 9999999999
);
quit;
I was able to find a macro that a colleague created that creates one or more "in lists" for a text column. For example, if you had a table of 50 state codes, this could create 5 lists of 10 states each, etc.. Here's the macro code:
%macro inlist(table, column, variable, blocksize);
/* creates as many "in" lists as needed, for looping.
For example, if the table X.ACCTS looks like:
ACCT_NUM
'ABCDEFG'
'BCDEFGA'
'CDEFGAB'
'DEFGABC'
'EFGABCD'
'FGABCDE'
'GABCDEF'
Calling %inlist(X.ACCTS,ACCT_NUM,ACCT,2) would create the following global variables:
ACCT1: "('ABCDEFG','BCDEFGA')
ACCT2: "('CDEFGAB','DEFGABC')
ACCT3: "('EFGABCD','FGABCDE')
ACCT4: "('GABCDEF')
ACCT_OBS: 4
So you could loop through a query four times to get all the accounts in your data set - obviously more useful when the
number is large.
*/
PROC SQL;
SELECT int(count(*)/ &blocksize )+1 INTO :varcount
FROM &table
;
DATA &TABLE;
SET &TABLE;
MACROW=_N_;
RUN;
%global &variable._obs;
%DO I = 1 %TO &VARCOUNT;
%global &variable._&I;
PROC SQL;
SELECT &column INTO :var SEPARATED BY ''','''
FROM &table
WHERE MACROW BETWEEN &BLOCKSIZE*&I-(&BLOCKSIZE-1) AND &BLOCKSIZE*&I
;
DATA _NULL;
CALL SYMPUT ("&variable._&I","('"||"&var"||"')");
CALL SYMPUT ("&variable._obs",&varcount);
RUN;
%END;
%mend;
I can then run my macro on C.TableExample as such:
%inlist(C.TableExample, Member, pid, 10000);
Finally, I use these lists to cycle through and pull claims information from the same table as I did the member information, through a proc sql statement once again. Here's the code:
%macro queryloop;
%do m = 1 %to &pid_obs;
Proc SQL;
connect to odbc(datasrc=xxxxx user=xxxx password=xxxxxxxxx);
%if &m = 1 %then create table C.Claims as
%else insert into C.Claims;
select * from connection to odbc
(
select
Member,
Account,
Sub_Type,
Sum(Amount) as P,
A,
B,
C,
D
FROM DATABASE.TABLE D
WHERE D.Year_Month between 'xxxxyy' and 'xxxxyy'
and Account between ' 0' and '99999999999999'
and Sub_Type between 'X ' and 'YYYYY '
group by
Member,
Account,
Sub_Type,
A,
B,
C,
D
);
quit;
%end;
%mend;
%queryloop;
Apologize for the really long post. But, even with using a multi-step process, I will have to loop through hundreds of lists, and each list takes a few hours to complete the data pull. I am trying to understand the pass through SQL statement and see if there is a more efficient way of pulling a few million rows of data through an odbc connection. Would appreciate any help! Thank you. Srikar
... View more