Updating big dataset - Page 2

Kurt_Bremser · Posted 08-13-2021 01:45 AM

First, get to know your data. If there are lots of long character variables, the physical dataset size can become overwhelming. Even if COMPRESS is used, sort utility files will be uncompressed and take a long time to write and read.

If your "child datasets" are in fact lookup tables, these can be handled much better by hash objects than by individual joins. I have often replaced multiple SORT/MERGE or SQL JOIN steps with a single data step that uses multiple hashes in a sequential read of the large dataset, resulting in a BIG performance gain.
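A minimal sketch of the hash approach, assuming a large table work.big and two small lookup tables work.region_lkp (key region_id, value region_name) and work.product_lkp (key product_id, value product_name). All dataset and variable names here are hypothetical, and the lookup tables must fit in memory:

```sas
/* Hypothetical names throughout: work.big is the large table,   */
/* work.region_lkp and work.product_lkp are small lookup tables. */
data work.big_updated;
  /* Define the lookup variables in the PDV without reading any rows */
  if 0 then set work.region_lkp work.product_lkp;

  /* Load both lookup tables into memory once, on the first iteration */
  if _n_ = 1 then do;
    declare hash h_region(dataset: "work.region_lkp");
    h_region.defineKey("region_id");
    h_region.defineData("region_name");
    h_region.defineDone();

    declare hash h_product(dataset: "work.product_lkp");
    h_product.defineKey("product_id");
    h_product.defineData("product_name");
    h_product.defineDone();
  end;

  /* Single sequential read of the large table, no sorting needed */
  set work.big;

  /* find() returns 0 on a match and fills the data variable;     */
  /* on a miss, clear the value carried over from the previous row */
  if h_region.find() ne 0 then call missing(region_name);
  if h_product.find() ne 0 then call missing(product_name);
run;
```

This replaces two sort/merge (or join) steps with one pass over work.big. Whether it helps in a given case depends on the lookup tables being small enough to hold in memory.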

Locate the longest-running step, run it with options fullstimer, and post the complete log. Also show the relevant part of the PROC CONTENTS output for the dataset at that point (number of observations and variables, observation size). Take care to edit out sensitive information, if that is required.
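As a rough sketch of collecting that information (work.big is a hypothetical name standing in for the dataset used by the slow step):

```sas
/* Write detailed timing and resource statistics to the log for every step */
options fullstimer;

/* ... rerun the longest-running step here ... */

/* Report the number of observations, number of variables,       */
/* and observation length for the dataset the slow step works on */
proc contents data=work.big;
run;
```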

ChrisNZ · Posted 08-13-2021 04:13 AM

We need the full log (use options fullstimer) to identify the longest steps, and the limiting factor (disk/network, CPU, or memory) for each step. Or post the longest ones and the previous and next ones if the log is just too long.

It's very likely too that some steps are redundant or can be better written, or steps can be re-organised or combined.

binhle50 · Posted 08-13-2021 09:37 AM

Thanks for your advice! Yes, I agree, it is very likely that I have some redundant procedures. The SAS program is almost 3000 lines long and was developed over a long period. Some of the code that was added later is not well organized. The program is also linked to many Excel files and a bunch of other datasets to calculate some important variables. I am thinking of splitting it into 2 or 3 smaller programs to make it easier for SAS to run.
Best,
Binh
