rosegarden81
Obsidian | Level 7

Hi guys,

I am working on a simple PROC SQL step that creates distinct patient-level data for each category. I have to do this step because the next macro expects unique patient-level data as one of its inputs.

 

My end goal:

Produce means of "amount paid" for each category.

 

This is what the original data (sourcedata) looks like:

Patient   Claims_ID   Description
1         12          A
1         13          A
1         15          Y
2         16          D
2         19          B
2         26          D
 

The issue here is:

I have to work with an existing macro that accepts only unique patient-level data to create "amount paid" at the patient level. So I first have to create individual category data sets (unique at the patient level) from the master data. After this step, I can pass them on to the macro.

 

 

My approach:

%let cat = Diagnosis1 Diagnosis2 Diagnosis3.... Diagnosis200;

 

 

/* The macro below simply creates distinct patient-level data for each category - this is used to identify the patients in that category */

%macro create1();

%do i = 1 %to &countcat.;                       /* countcat contains the number of categories, say 200 */

    %let eachCat = %scan(&cat, &i);             /* cat is a macro variable containing the 200 category names separated by spaces */

    PROC SQL;
        create table uniquePatient_&eachCat. as /* creates a data set for each category */
        select distinct patientID
        from Sourcedata
        where prxmatch("m/&eachCat./oi", category) > 0;  /* filters the data to that category */
    quit;

%end;

%mend create1;

%create1(); /* invoke the macro */

 

 

Sourcedata has 50 million observations, which are at the patientID and claims ID level.

This simple macro takes about 3.5 minutes to create the data set for each category, so across 200 categories the full run will take well over 10 hours (3.5 minutes x 200 is roughly 11.7 hours)!

 

I need distinct patient ID data sets to feed into the other macro; otherwise I would have tried something other than creating so many data sets as done above. I have heard that you can create format libraries to speed this kind of processing up. Any idea on how to go about that, and whether it is applicable in this scenario, would be greatly appreciated!
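For illustration, here is a rough single-pass sketch (not the format-library approach; it assumes the column names patientID and category from the macro above, and that an exact match on the category value is acceptable in place of the regex match) that reads Sourcedata only once and then splits the much smaller deduplicated table:

/* Hypothetical single-pass alternative: read the 50M-row table once,
   keep one row per patient per category, then split the small
   deduplicated table into one data set per category. */
proc sql;
   create table uniquePatCat as
   select distinct category, patientID
   from Sourcedata;
quit;

%macro split_by_cat();
   %do i = 1 %to &countcat.;
      %let eachCat = %scan(&cat, &i);
      data uniquePatient_&eachCat.;          /* one data set per category */
         set uniquePatCat;
         where upcase(category) = upcase("&eachCat.");
         keep patientID;
      run;
   %end;
%mend split_by_cat;

%split_by_cat()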

 

Regards

3 REPLIES
sidpesar
Obsidian | Level 7
You can use RSUBMIT, which can process all 200 tables at the same time and cuts the run time.
To set up RSUBMIT, follow the link below:

https://communities.sas.com/t5/SAS-Programming/Base-sas-connect-to-remote-windows-server/m-p/420967#...

Once that is done, you can use RSUBMIT for parallel processing.
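For reference, a rough skeleton of that pattern once the connection from the link above is configured (this assumes SAS/CONNECT is licensed; the server name and port are placeholders, not details from this thread):

/* Hypothetical SAS/CONNECT skeleton - server name and port are placeholders */
%let remhost = yourserver 7551;      /* remote host and spawner port */
options comamid=tcp;

signon remhost user=_prompt_ password=_prompt_;

rsubmit remhost wait=no;             /* WAIT=NO returns control immediately,
                                        so several such blocks can run at once */
   /* run a batch of the per-category PROC SQL steps here */
endrsubmit;

waitfor remhost;                     /* wait for the remote job to finish */
signoff remhost;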
rosegarden81
Obsidian | Level 7
I work on a remote desktop which does not have internet access! Hence, I guess I wouldn't be able to use RSUBMIT.
sidpesar
Obsidian | Level 7
You can RSUBMIT on your own machine or on a remote machine. When you run it on your own machine, SAS opens additional instances and runs your job(s) in parallel, and the results can then be sent back to your current SAS WORK library.
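A rough sketch of that local-machine (MP CONNECT) setup, assuming SAS/CONNECT is licensed; the session names and the two-way split of the categories are just for illustration:

/* Hypothetical MP CONNECT sketch: spawn two child SAS sessions on this
   machine and give each half of the categories; no network access needed. */
options sascmd="!sascmd";                  /* launch children with the same SAS command */

signon task1 inheritlib=(work=locwork);    /* child 1 sees the local WORK as LOCWORK */
signon task2 inheritlib=(work=locwork);    /* child 2 sees the local WORK as LOCWORK */

rsubmit task1 wait=no;
   /* loop over categories 1-100 and write uniquePatient_... to LOCWORK */
endrsubmit;

rsubmit task2 wait=no;
   /* loop over categories 101-200 and write uniquePatient_... to LOCWORK */
endrsubmit;

waitfor _all_ task1 task2;                 /* wait for both children to finish */
signoff _all_;                             /* shut the child sessions down */

In practice you would also need %SYSLPUT to copy macro variables such as &cat and &countcat into each child session before running the loop there.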
