BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SrikarA
Calcite | Level 5

Hello,

I am working with SAS Enterprise Guide and trying to extract a large amount of data through an ODBC connection. I'm looking for all claims above a certain threshold level. As such, my idea was to split the query into multiple steps to increase efficiency. During the first step, I accessed a claims table in the database, and pulled only members who had claims above the threshold level. Here's the code I wrote:

proc sql;

  connect to odbc(datasrc=xxxxx user=xxxx password=xxxxxxxxxxx);

  create table C.TableExample as

  select * from connection to odbc

  (

  Select

  Member,

  Account,

  Sum(Amount) as Paid

  FROM DATABASE.TABLE D

  WHERE D.Date between 'aaaa-bb-cc' and 'aaaa-bb-cc'

  and Account between '             0' and '99999999999999'

  and Sub_Type between 'X    ' and 'YYYYY '

  GROUP BY,

       Member,

       Account

  having sum(Amount) between 00000 and 9999999999

);

quit;

I was able to find a macro that a colleague created that creates one or more "in lists" for a text column. For example, if you had a table of 50 state codes, this could create 5 lists of 10 states each, etc..

Here's the macro code:

%macro inlist(table, column, variable, blocksize);

/* creates as many "in" lists as needed, for looping.

   For example, if the table X.ACCTS looks like:

  ACCT_NUM

  'ABCDEFG'

  'BCDEFGA'

  'CDEFGAB'

  'DEFGABC'

  'EFGABCD'

  'FGABCDE'

  'GABCDEF'

    Calling %inlist(X.ACCTS,ACCT_NUM,ACCT,2) would create the following global variables:

  ACCT1: "('ABCDEFG','BCDEFGA')

  ACCT2: "('CDEFGAB','DEFGABC')

  ACCT3: "('EFGABCD','FGABCDE')

  ACCT4: "('GABCDEF')

  ACCT_OBS: 4

So you could loop through a query four times to get all the accounts in your data set - obviously more useful when the

number is large.

*/

PROC SQL;

SELECT int(count(*)/ &blocksize )+1 INTO :varcount

FROM &table

;

DATA &TABLE;

SET &TABLE;

MACROW=_N_;

RUN;

%global &variable._obs;

%DO I = 1 %TO &VARCOUNT;

%global &variable._&I;

PROC SQL;

SELECT &column INTO :var SEPARATED BY ''','''

FROM &table

WHERE MACROW BETWEEN &BLOCKSIZE*&I-(&BLOCKSIZE-1) AND &BLOCKSIZE*&I

;

DATA _NULL;

    CALL SYMPUT ("&variable._&I","('"||"&var"||"')");

    CALL SYMPUT ("&variable._obs",&varcount);

RUN;

%END;

%mend;

I can then run my macro on C.TableExample as such:

%inlist(C.TableExample, Member, pid, 10000);

Finally, I use these lists to cycle through and pull claims information from the same table as I did the member information, through a proc sql statement once again. Here's the code:

%macro queryloop;

%do m = 1 %to &pid_obs;

Proc SQL;

connect to odbc(datasrc=xxxxx user=xxxx password=xxxxxxxxx);

%if &m = 1 %then create table C.Claims as

%else insert into C.Claims;

select * from connection to odbc

(

select

  Member,

  Account,

  Sub_Type,

  Sum(Amount) as P,

  A,

  B,

  C,

  D

FROM DATABASE.TABLE D

WHERE D.Year_Month between 'xxxxyy' and 'xxxxyy'

  and Account between '             0' and '99999999999999'

  and Sub_Type between 'X    ' and 'YYYYY '

group by

  Member,

  Account,

  Sub_Type,

  A,

  B,

  C,

  D

);

quit;

%end;

%mend;

%queryloop;

Apologize for the really long post. But, even with using a multi-step process, I will have to loop through hundreds of lists, and each list takes a few hours to complete the data pull. I am trying to understand the pass through SQL statement and see if there is a more efficient way of pulling a few million rows of data through an odbc connection. Would appreciate any help! Thank you.

Srikar

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Since no one else has suggestions, here's a couple of things.

Breaking things up by a list makes sense sometimes, I'm not sure it does in this instance, beside separating the accounts.

Somethings, can you partition the job on the server, ie can you write to the server rather than bring things back to SAS?

Do you have any indexes on the data set?

Are you really specifying character ranges for account and sub_type? Does it work any faster if you leave those out?

View solution in original post

2 REPLIES 2
Reeza
Super User

Since no one else has suggestions, here's a couple of things.

Breaking things up by a list makes sense sometimes, I'm not sure it does in this instance, beside separating the accounts.

Somethings, can you partition the job on the server, ie can you write to the server rather than bring things back to SAS?

Do you have any indexes on the data set?

Are you really specifying character ranges for account and sub_type? Does it work any faster if you leave those out?

SrikarA
Calcite | Level 5

Thanks Reeza!

I agree -- this is the first time I've had to pull this large of a data set, and I typically follow breaking things up into a list method, which works fine for decent sized lists.

Unfortunately, I am not able to partition the job on the server. Would need some approvals to do that from IT and it would be a big hassle.

The Member field was part of an index on the data set, but it seems to not help in this case that much due to the sheer size of the member list. However, I just found out that there is a clustered index on the data set that I can run off of. My next attempt is to loop through the variables in the clustered index and pull all claims for every member. Once I do that, I can limit to the list that I need, before having SAS store the result. I'm not sure if this is entirely going to help but I think it should work better than the previous attempt.

Again, I've gotten into the habit of specifying the character ranges, but I have no idea if that slows it down that much. I will take that away and try that as well!

Thanks again for your response.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 3797 views
  • 0 likes
  • 2 in conversation