irena_g
Calcite | Level 5

 

Hello,

We work with big files: more than 8,000,000 rows and more than 2,000 columns. Our program runs for 4 hours, and we want to reduce its running time.

  1. I changed all the merge sections to SQL code. It didn't help very much: the program now runs for 3 hours 45 minutes.
  2. I tried using simple DATA step statements with SPDE files and indexes, and I removed all PROC SORT steps. Now the program runs for 3 hours 25 minutes.
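For readers comparing the two approaches in the list above, here is a rough sketch of a DATA step merge and a PROC SQL left join that should give the same rows when the lookup table has at most one row per key. The table names `big` and `lookup`, the key `id`, and the column `region` are all hypothetical, not taken from the original program.

```sas
/* Hypothetical inputs: work.big (large) and work.lookup, keyed by id;  */
/* lookup is assumed to hold at most one row per id.                    */

/* DATA step merge: both inputs must be sorted (or indexed) by id */
data merged;
  merge big(in=in_big) lookup(keep=id region);
  by id;
  if in_big;   /* keep every row of big, as a left join would */
run;

/* PROC SQL left join: no PROC SORT step, but SQL may still sort internally */
proc sql;
  create table merged_sql as
  select a.*, b.region
  from big as a
  left join lookup as b
    on a.id = b.id;
quit;
```

Because PROC SQL can still sort or hash behind the scenes, switching from MERGE to SQL does not necessarily remove the sorting cost, which would fit the modest gain reported above.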

 

Maybe you have another idea for how I can improve the running time of the program?

Many thanks,

Irena

7 replies
PeterClemmensen
Tourmaline | Level 20

This is a huge topic, and the answer depends largely on what your code does.

Post your code and let us have a look.

irena_g
Calcite | Level 5

@PeterClemmensen  Thanks

Unfortunately, I can't post the program code. The program consists mainly of merges.

PeterClemmensen
Tourmaline | Level 20

Do you sort the data prior to merging?

irena_g
Calcite | Level 5

@PeterClemmensen  In the SPDE files I use the option BYSORT=YES, so I can skip PROC SORT.
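A minimal sketch of what this setup might look like, assuming a hypothetical SPDE library path and the same hypothetical `big`/`lookup` tables keyed by `id`. Note that BYSORT=YES does not make sorting free: when a BY statement is used, the SPD Engine sorts the input on the fly, so the sorting work still happens, just without a separate PROC SORT step.

```sas
/* Hypothetical directory; the SPD Engine keeps data, index, and metadata in separate files */
libname fast spde '/data/spde' bysort=yes;   /* BYSORT=YES is also the SPDE default */

data fast.merged;
  merge fast.big(in=in_big) fast.lookup(keep=id region);
  by id;       /* inputs are sorted by id on the fly, so no PROC SORT step is written */
  if in_big;   /* keep every row of big */
run;
```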

andreas_lds
Jade | Level 19

@irena_g wrote:

@PeterClemmensen  Thanks

Unfortunately, I can't post the program code. The program consists mainly of merges.


Without seeing the code and the log, it is hardly possible to suggest anything that would improve the runtime.

  • When calling PROC SORT on data that is already sorted, the PRESORTED option will reduce the time PROC SORT needs.
  • Maybe the high running time is caused by the hardware involved. Can you post some details on the hardware you have?
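For reference, PRESORTED goes on the PROC SORT statement itself. In this sketch the table `big` and key `id` are hypothetical. SAS first reads the data to check whether it is already in BY order and skips the sort if it is, so the check still costs one pass over the data.

```sas
/* Hypothetical table work.big that is expected to be in id order already */
proc sort data=big presorted;
  by id;
run;
/* Read the NOTE lines in the log to see whether the sort was actually skipped */
```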
ballardw
Super User

The TAGSORT option can reduce the temporary disk space used when sorting large data sets, which can improve performance.
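A sketch of TAGSORT on a hypothetical table `big` sorted by a hypothetical key `id`. TAGSORT writes only the BY variables and observation numbers to the temporary files and then reads the full rows back in sorted order. It tends to help most when the BY variables are a small part of a very wide record (as with 2,000 columns), but it can also take longer, so timing both versions is worthwhile.

```sas
options fullstimer;   /* write detailed time and memory statistics to the log */

/* Hypothetical table work.big: short key id, many other columns */
proc sort data=big out=big_sorted tagsort;
  by id;
run;

/* The same sort without TAGSORT, for comparison */
proc sort data=big out=big_sorted2;
  by id;
run;
```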

 

If you think your variable names give away too much information ("can't share code"), then your names contain information that belongs in variables rather than in the names, and that may be why you have so many variables. Also, having to process information stored in names suggests that you likely have other data structure issues that hurt performance.
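To illustrate moving information out of variable names and into values, here is a sketch with a hypothetical wide table in which the year is stored in the column names (`sales_2019` to `sales_2021`). PROC TRANSPOSE turns it into one row per `id` and year; all names and values are made up for the example.

```sas
/* Hypothetical wide table: one row per id, one column per year */
data wide;
  input id sales_2019 sales_2020 sales_2021;
  datalines;
1 10 12 15
2 7 9 11
;
run;

/* One output row per id and source column; COL1 holds the value */
proc transpose data=wide
               out=long(rename=(col1=sales))
               name=source_var;
  by id;                          /* wide is already in id order */
  var sales_2019-sales_2021;      /* numbered range list */
run;

data long;
  set long;
  year = input(scan(source_var, 2, '_'), 4.);  /* take the year out of the old column name */
  drop source_var;
run;
```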

Astounding
PROC Star

As others have noted, "can't see the program" + "can't see the data" = "can only guess"

So here's my guess.

 

You need to fix the data. For example, it takes longer to process 200 characters than to process one character. So if (for example) GENDER should be "M" or "F" but is actually defined as 200 characters long, you need to fix that first. There's a lot of work involved in doing this, starting with PROC CONTENTS to see the length each variable is defined with.
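A sketch of those first steps, using a hypothetical table `big` and the GENDER example: PROC CONTENTS shows each variable's defined length, a PROC SQL query shows the longest value actually stored, and a DATA step rebuilds the table with the shorter length.

```sas
/* Defined type and length of every variable, in column order (hypothetical table work.big) */
proc contents data=big varnum;
run;

/* Longest value actually stored in the character variable gender */
proc sql;
  select max(length(gender)) as max_len
  from big;
quit;

/* Rebuild with the shorter length: LENGTH must come before SET.         */
/* SAS logs a possible-truncation warning, so confirm max_len is 1 first. */
data big_small;
  length gender $1;
  set big;
run;
```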
