Desktop productivity for business analysts and programmers

PROC SQL - connection to ODBC

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

PROC SQL - connection to ODBC

Hello,

I am working with SAS Enterprise Guide and trying to extract a large amount of data through an ODBC connection. I'm looking for all claims above a certain threshold level. As such, my idea was to split the query into multiple steps to increase efficiency. During the first step, I accessed a claims table in the database, and pulled only members who had claims above the threshold level. Here's the code I wrote:

proc sql;

  connect to odbc(datasrc=xxxxx user=xxxx password=xxxxxxxxxxx);

  create table C.TableExample as

  select * from connection to odbc

  (

  Select

  Member,

  Account,

  Sum(Amount) as Paid

  FROM DATABASE.TABLE D

  WHERE D.Date between 'aaaa-bb-cc' and 'aaaa-bb-cc'

  and Account between '             0' and '99999999999999'

  and Sub_Type between 'X    ' and 'YYYYY '

  GROUP BY,

       Member,

       Account

  having sum(Amount) between 00000 and 9999999999

);

quit;

I was able to find a macro that a colleague created that creates one or more "in lists" for a text column. For example, if you had a table of 50 state codes, this could create 5 lists of 10 states each, etc..

Here's the macro code:

%macro inlist(table, column, variable, blocksize);

/* creates as many "in" lists as needed, for looping.

   For example, if the table X.ACCTS looks like:

  ACCT_NUM

  'ABCDEFG'

  'BCDEFGA'

  'CDEFGAB'

  'DEFGABC'

  'EFGABCD'

  'FGABCDE'

  'GABCDEF'

    Calling %inlist(X.ACCTS,ACCT_NUM,ACCT,2) would create the following global variables:

  ACCT1: "('ABCDEFG','BCDEFGA')

  ACCT2: "('CDEFGAB','DEFGABC')

  ACCT3: "('EFGABCD','FGABCDE')

  ACCT4: "('GABCDEF')

  ACCT_OBS: 4

So you could loop through a query four times to get all the accounts in your data set - obviously more useful when the

number is large.

*/

PROC SQL;

SELECT int(count(*)/ &blocksize )+1 INTO :varcount

FROM &table

;

DATA &TABLE;

SET &TABLE;

MACROW=_N_;

RUN;

%global &variable._obs;

%DO I = 1 %TO &VARCOUNT;

%global &variable._&I;

PROC SQL;

SELECT &column INTO :var SEPARATED BY ''','''

FROM &table

WHERE MACROW BETWEEN &BLOCKSIZE*&I-(&BLOCKSIZE-1) AND &BLOCKSIZE*&I

;

DATA _NULL;

    CALL SYMPUT ("&variable._&I","('"||"&var"||"')");

    CALL SYMPUT ("&variable._obs",&varcount);

RUN;

%END;

%mend;

I can then run my macro on C.TableExample as such:

%inlist(C.TableExample, Member, pid, 10000);

Finally, I use these lists to cycle through and pull claims information from the same table as I did the member information, through a proc sql statement once again. Here's the code:

%macro queryloop;

%do m = 1 %to &pid_obs;

Proc SQL;

connect to odbc(datasrc=xxxxx user=xxxx password=xxxxxxxxx);

%if &m = 1 %then create table C.Claims as

%else insert into C.Claims;

select * from connection to odbc

(

select

  Member,

  Account,

  Sub_Type,

  Sum(Amount) as P,

  A,

  B,

  C,

  D

FROM DATABASE.TABLE D

WHERE D.Year_Month between 'xxxxyy' and 'xxxxyy'

  and Account between '             0' and '99999999999999'

  and Sub_Type between 'X    ' and 'YYYYY '

group by

  Member,

  Account,

  Sub_Type,

  A,

  B,

  C,

  D

);

quit;

%end;

%mend;

%queryloop;

Apologize for the really long post. But, even with using a multi-step process, I will have to loop through hundreds of lists, and each list takes a few hours to complete the data pull. I am trying to understand the pass through SQL statement and see if there is a more efficient way of pulling a few million rows of data through an odbc connection. Would appreciate any help! Thank you.

Srikar


Accepted Solutions
Solution
‎12-27-2012 04:42 PM
Grand Advisor
Posts: 17,297

Re: PROC SQL - connection to ODBC

Since no one else has suggestions, here's a couple of things.

Breaking things up by a list makes sense sometimes, I'm not sure it does in this instance, beside separating the accounts.

Somethings, can you partition the job on the server, ie can you write to the server rather than bring things back to SAS?

Do you have any indexes on the data set?

Are you really specifying character ranges for account and sub_type? Does it work any faster if you leave those out?

View solution in original post


All Replies
Solution
‎12-27-2012 04:42 PM
Grand Advisor
Posts: 17,297

Re: PROC SQL - connection to ODBC

Since no one else has suggestions, here's a couple of things.

Breaking things up by a list makes sense sometimes, I'm not sure it does in this instance, beside separating the accounts.

Somethings, can you partition the job on the server, ie can you write to the server rather than bring things back to SAS?

Do you have any indexes on the data set?

Are you really specifying character ranges for account and sub_type? Does it work any faster if you leave those out?

Occasional Contributor
Posts: 9

Re: PROC SQL - connection to ODBC

Thanks Reeza!

I agree -- this is the first time I've had to pull this large of a data set, and I typically follow breaking things up into a list method, which works fine for decent sized lists.

Unfortunately, I am not able to partition the job on the server. Would need some approvals to do that from IT and it would be a big hassle.

The Member field was part of an index on the data set, but it seems to not help in this case that much due to the sheer size of the member list. However, I just found out that there is a clustered index on the data set that I can run off of. My next attempt is to loop through the variables in the clustered index and pull all claims for every member. Once I do that, I can limit to the list that I need, before having SAS store the result. I'm not sure if this is entirely going to help but I think it should work better than the previous attempt.

Again, I've gotten into the habit of specifying the character ranges, but I have no idea if that slows it down that much. I will take that away and try that as well!

Thanks again for your response.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 1613 views
  • 0 likes
  • 2 in conversation