bshoun228
Calcite | Level 5

I have a code base that synthesizes a ZIP code for each record in a claims dataset based on the record's county code. The ZIP data comes from one dataset, and the claims dataset has the FIPS/county code. I can provide samples of both.
We have been running this for several years now without major issues. However, the job now takes around 10-11 days to run, whereas in the past it was closer to 6. I believe the environment has changed, but unfortunately I cannot control those changes, so I have to work around them.
This is likely a big ask, but I was wondering whether the code could be optimized for a better production experience (shorter run time and easier QC).

I am attaching a zip file containing the code, data, and formats, which I hope includes all of the necessary information. I realize this is a big ask of anyone; I'm just stuck trying to figure out whether there is a better way.

ChrisNZ
Tourmaline | Level 20

Please provide a set of files that can be run as-is by just changing the path at the top of the program.

Do not expect us to run something that will take more than a few minutes. Someone may be able to, but help yourself by making it easier for us.

Also, provide the original log if needed, i.e. if the supplied data does not allow us to see where the bottlenecks (the steps that consume all the time) are.

Lastly, you provide a 600-line uncommented program.

You need to work on isolating the issue into a much smaller package.

ballardw
Super User

A brief reading of your code shows about 20 DATA steps and one call to PROC SURVEYSELECT, so the topic probably should not be "optimize surveyselect".

Your code also does not include:

1) the definitions of the (apparently multiple) formats referenced as $zip&popyr.fips

2) the definition of the %dsobs macro, which is used in multiple places

I would examine the log to see which steps take the most time and whether those can be addressed.
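By default SAS writes the real and CPU time of every step to the log (OPTIONS FULLSTIMER adds more detail), so with roughly 20 steps the quickest way to find the bottleneck is to rank them by real time. Below is a minimal Python sketch of that, assuming each step ends with a `NOTE: ... used (Total process time):` line followed by a `real time` line, with longer durations written as m:ss.ss or h:mm:ss.ss. The function names are illustrative only; adjust the regular expressions if your log looks different.

```python
import re

# Step-summary note SAS writes after each step, e.g.
# "NOTE: DATA statement used (Total process time):"
STEP_RE = re.compile(r"NOTE: (.+?) used \(Total process time\):")
# The real-time line that follows, e.g. "real time  12.34 seconds",
# "real time  1:05.23" (m:ss.ss) or "real time  2:01:05.23" (h:mm:ss.ss).
REAL_RE = re.compile(r"real time\s+([\d:.]+)")


def to_seconds(text):
    """Convert '12.34', '1:05.23' or '2:01:05.23' to seconds."""
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total


def rank_steps(log_text, top=10):
    """Return (seconds, step_name, log_line_no) tuples, slowest first."""
    results = []
    current = None  # (step_name, line_no) waiting for its real-time line
    for line_no, line in enumerate(log_text.splitlines(), start=1):
        step = STEP_RE.search(line)
        if step:
            current = (step.group(1), line_no)
            continue
        real = REAL_RE.search(line)
        if real and current is not None:
            results.append((to_seconds(real.group(1)), current[0], current[1]))
            current = None
    results.sort(reverse=True)
    return results[:top]
```

For example, `rank_steps(open("job.log").read())` lists the ten slowest steps with the log line where each one's timing note appears, which points straight at the part of the program worth isolating.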

I wonder whether your ZIP code and generated FIPS code values are really so volatile that those steps need to be rerun every time. Perhaps you should separate that part out and only recreate it when something in your data indicates it needs to be updated. You don't mention the size of the files, but I think there is a lot of redundant and likely inefficient code in those steps alone.
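One way to act on this, assuming the ZIP/FIPS inputs live in files, is to fingerprint them and rebuild the derived crosswalk only when a fingerprint changes or the output is missing. The idea is language-agnostic; here is a minimal Python sketch in which the file paths, the stamp file and the `build` callback are hypothetical stand-ins for whatever currently produces the crosswalk.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def rebuild_if_changed(inputs, output, stamp, build):
    """Call build(inputs, output) only if an input changed since the last
    successful run or the output is missing. Returns True if it rebuilt."""
    digests = {str(p): file_digest(p) for p in inputs}
    stamp = Path(stamp)  # hypothetical file holding the last input digests
    if Path(output).exists() and stamp.exists():
        if json.loads(stamp.read_text()) == digests:
            return False  # inputs unchanged: reuse the existing crosswalk
    build(inputs, output)
    # Written only after build() returns, so a failed build forces a retry.
    stamp.write_text(json.dumps(digests))
    return True
```

Hashing contents rather than comparing modification times avoids a rebuild when a file is merely copied or touched, at the cost of reading each input once per run.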
