rosegarden81
Obsidian | Level 7

Hi guys,

I am working on a simple PROC SQL step that creates distinct patient-level data for each category. I have to do this step because the next macro expects unique patient-level data as one of its inputs.

 

My end goal:

Produce means of "amount paid" for each category.

 

This is what the original data (sourcedata) looks like:

Patient   Claims_ID   Description
1         12          A
1         13          A
1         15          Y
2         16          D
2         19          B
2         26          D
 

The issue here is:

I have to work with an existing macro that accepts only unique patient-level data to create "amount paid" at the patient level. So I first have to create individual category data sets (unique at the patient level) from the master data. After this step, I can pass them on to the macro.

 

 

My approach:

%let cat = Diagnosis1 Diagnosis2 Diagnosis3.... Diagnosis200;

 

 

/* The macro below simply creates distinct patient-level data for each category - this is used to identify the patients in that category */

%macro create1();

%do i = 1 %to &countcat.;                       /* countcat contains the number of categories, say 200 */

    %let eachCat = %scan(&cat, &i);             /* cat is a macro variable containing the 200 category names separated by spaces */

    PROC SQL;
        create table uniquePatient_&eachCat. as /* creates a data set for each category */
        select distinct patientID
        from Sourcedata
        where prxmatch("m/&eachCat./oi", category) > 0;  /* filters the data to that category */
    quit;

%end;

%mend create1;

%create1(); /* invoke the macro */

 

 

Sourcedata has 50 million observations, which are at the patientID and claims ID level.

This simple macro takes about 3.5 minutes to create the data set for each category, so across 200 categories the full run will take well over 10 hours (3.5 minutes x 200 is roughly 11.7 hours)!

 

I need distinct patient ID data sets to feed into the other macro; otherwise I would have tried something other than creating so many data sets as done above. I have heard that you can create format libraries to speed this kind of processing up. Any idea on how to go about that, and whether it is applicable in this scenario, would be greatly appreciated!
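For illustration, here is a rough single-pass sketch (not the format-library approach; it assumes the column names patientID and category from the macro above, and that an exact match on the category value is acceptable in place of the regex match) that reads Sourcedata only once and then splits the much smaller deduplicated table:

/* Hypothetical single-pass alternative: read the 50M-row table once,
   keep one row per patient per category, then split the small
   deduplicated table into one data set per category. */
proc sql;
   create table uniquePatCat as
   select distinct category, patientID
   from Sourcedata;
quit;

%macro split_by_cat();
   %do i = 1 %to &countcat.;
      %let eachCat = %scan(&cat, &i);
      data uniquePatient_&eachCat.;          /* one data set per category */
         set uniquePatCat;
         where upcase(category) = upcase("&eachCat.");
         keep patientID;
      run;
   %end;
%mend split_by_cat;

%split_by_cat()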

 

Regards

3 REPLIES
sidpesar
Obsidian | Level 7
You can use RSUBMIT, which can process all 200 tables at the same time and cuts the run time.
To set up RSUBMIT, follow the link below:

https://communities.sas.com/t5/SAS-Programming/Base-sas-connect-to-remote-windows-server/m-p/420967#...

Once that is done, you can use RSUBMIT for parallel processing.
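For reference, a rough skeleton of that pattern once the connection from the link above is configured (this assumes SAS/CONNECT is licensed; the server name and port are placeholders, not details from this thread):

/* Hypothetical SAS/CONNECT skeleton - server name and port are placeholders */
%let remhost = yourserver 7551;      /* remote host and spawner port */
options comamid=tcp;

signon remhost user=_prompt_ password=_prompt_;

rsubmit remhost wait=no;             /* WAIT=NO returns control immediately,
                                        so several such blocks can run at once */
   /* run a batch of the per-category PROC SQL steps here */
endrsubmit;

waitfor remhost;                     /* wait for the remote job to finish */
signoff remhost;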
rosegarden81
Obsidian | Level 7
I work on a remote desktop which does not have internet access! Hence, I guess I wouldn't be able to use RSUBMIT.
sidpesar
Obsidian | Level 7
You can RSUBMIT on your own machine or on a remote machine. When you run it on your own machine, SAS opens additional instances and runs your job(s) in parallel, and the results can then be sent back to your current SAS WORK library.
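A rough sketch of that local-machine (MP CONNECT) setup, assuming SAS/CONNECT is licensed; the session names and the two-way split of the categories are just for illustration:

/* Hypothetical MP CONNECT sketch: spawn two child SAS sessions on this
   machine and give each half of the categories; no network access needed. */
options sascmd="!sascmd";                  /* launch children with the same SAS command */

signon task1 inheritlib=(work=locwork);    /* child 1 sees the local WORK as LOCWORK */
signon task2 inheritlib=(work=locwork);    /* child 2 sees the local WORK as LOCWORK */

rsubmit task1 wait=no;
   /* loop over categories 1-100 and write uniquePatient_... to LOCWORK */
endrsubmit;

rsubmit task2 wait=no;
   /* loop over categories 101-200 and write uniquePatient_... to LOCWORK */
endrsubmit;

waitfor _all_ task1 task2;                 /* wait for both children to finish */
signoff _all_;                             /* shut the child sessions down */

In practice you would also need %SYSLPUT to copy macro variables such as &cat and &countcat into each child session before running the loop there.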
