BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lmtamina
Obsidian | Level 7

Good Morning,

 

I am working on pulling data for our vendor and it requires me to retrieve claims data  from a very huge dataset. I am pulling claims data from a dataset with 2.5B observations.  I am retrieving all claims records associated with a set of unique claims_case_numbers with approx. 11.5M observations. The resulting dataset is about 52M obs with 47 variables and runtime is 1.5 hrs. I would like to know if my coding approach  below  utilizing Hash object is already efficient for this task or if it can still be improved. Appreciate any suggestions.

Also, any recommendations on how to get the dataset results with  approx 13GB in size to our vendor . My initial thought is exporting in csv with zip. Thank you as always.

 

data vendor_claims;

length clm_case_key 8.;

 

if _N_ =1 then do;

DECLARE HASH H(dataset:'CLMS_CASES_LKUP');

H.DEFINEKEY ('clm_case_key');

 H.DEFINEDONE();

end;

 

SET CLAIMS_MASTER (keep=&ON_FILE_HEADER where=((serv_from_date ge '1Jan2016'd AND serv_from_Date le '31Dec2017'd)

and prod_cd not in ('PROD_COM','PROD_MDR')

and payment_cd = 'C'));

if H.find() = 0 then output;

run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

> If both your source tables are in Oracle then definitely use SQL as this will push all the processing to Oracle. 

 

Yes this. Keep all the subsetting in the source system and only transfer the result. Any other way is wasteful.

 

As for transferring, ask your vendor what they expect.

 

 

 

 

View solution in original post

6 REPLIES 6
AlanC
Barite | Level 11

A SAS dataset is ODBC compliant. If they are willing to install the reader, I would leave it in sas7bdat format. If not, CSV is probably as good as it gets but I would use tabs vs commas as delimiter.

 

Try doing it with SQL vs hash. SQL will optimize your code.

 

You would be best to send this to 1 of 2 other places:

 

- SAS consulting forum on Google (declining size but really good SAS people)

- SAS-L. Venerable, best coders including Paul Dorfman who is the expert on hash object. Roger D will also weigh in who is damn good on volume issues.

 

 

https://github.com/savian-net
lmtamina
Obsidian | Level 7
Spoiler
 

Thank you . I will try to  post to the other forums.

 

LinusH
Tourmaline | Level 20
Just for my understanding, where is your two original data sets stored (DBMS, local/remote) and where do you need to store/analyze the result?
Data never sleeps
lmtamina
Obsidian | Level 7
The results will be sent to our vendor via FTP site. And I am accessing the 2 database from DBMS(Oracle) and via client setup. We are using SAS enterprise guide. Hope this is helpful.
Patrick
Opal | Level 21

@lmtamina

If both your source tables are in Oracle then definitely use SQL as this will push all the processing to Oracle. If you want to further speed up the process then use explicit pass-through SQL (=formulated in the Oracle SQL flavour) with Oracle hints to parallelize processing.

...But: I guess the bottleneck is transferring the data from Oracle to SAS over the network so once you rewrite your code so that the data volume gets already reduced on the Oracle side end-to-end elapsed time should become quite a bit shorter.

 

Once you've got your data in SAS: If your vendor uses SAS as well then I'd first try and zip the SAS dataset. If that makes it small enough for an FTP transfer then good, else I'd split it up into chunks and zip these chunks in separate archives.

ChrisNZ
Tourmaline | Level 20

> If both your source tables are in Oracle then definitely use SQL as this will push all the processing to Oracle. 

 

Yes this. Keep all the subsetting in the source system and only transfer the result. Any other way is wasteful.

 

As for transferring, ask your vendor what they expect.

 

 

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 587 views
  • 1 like
  • 5 in conversation