dannyf
Calcite | Level 5

I'm running a merge on a rather big dataset. The problem is it sloooows my PC down so much that it becomes annoying.

I've tried using BUFSIZE and BUFNO to reduce the amount of memory SAS uses, and in desperation set them to 1, but it made no discernible difference.

I don't mind if the process takes longer, but is there a way to make it use only 1 GB of RAM?

The two tables are 17,396,594 and 1,763,513 rows and sit on my C Drive.

I'm using SAS Base 9.3 (Windows 7), and I don't want answers offering me a different technical solution (e.g. SAS Server) - but are there any settings I can change in my current environment?

Thanks for any help offered!

/* adding sequence count to records - in case incorrect #'s were given */

data clean.UPDT UCST;
  merge clean.UPDT (in=a)
        SEQ_LST (in=b rename=(SEQUENCE_COUNT=LSEQ));
  by CUST_ID TMP_CUST;
  if SEQUENCE_COUNT < LSEQ then output UCST;
  else output clean.UPDT;
run;

9 REPLIES
RW9
Diamond | Level 26

If it's just a space issue, consider splitting the two datasets up, e.g. take half of the distinct CUST_IDs and merge those, then take the other half, etc. For example:

proc sql;
  create table FIRST_BLOCK as
  select  A.*,
          B.MAKE as B_MAKE
  from    SASHELP.CARS A
  left join SASHELP.CARS B
    on    A.MAKE = B.MAKE
  where   A.MAKE in (select MAKE from
                      (select distinct MAKE from SASHELP.CARS)
                     where MONOTONIC() <= 10);
quit;

Kurt_Bremser
Super User

You're just I/O bound, mainly because:

- your disk is not suited for this kind of work,

- all your files reside on the same disk, so the disk heads constantly jump between the physical positions of the input files and the output files,

- and on top of that, you may be the victim of disk fragmentation, slowing I/O down even more.

You can alleviate this by getting a faster disk (preferably an SSD, which eliminates all kinds of latencies) or by adding several disks and allocating the libraries on physically separate drives.
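
In practice that just means pointing the libraries at different physical drives - a sketch, where drive letters D: and E: are hypothetical:

  /* permanent library on one physical drive ...           */
  libname clean "D:\sasdata";

  /* ... and WORK on another, set at SAS invocation:       */
  /*   sas.exe -WORK "E:\saswork"                          */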

You can also take a look at the hash object method. This lets you read the smaller file in one sweep before reading through the larger file. But you still have disk contention between reading and writing.
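
For illustration, a minimal sketch of that hash approach using the names from the original post (the output data set clean.UPDT2 is a made-up name, so the input isn't overwritten while you test):

data clean.UPDT2 UCST;
  if _n_ = 1 then do;
    /* load the smaller table into memory once */
    declare hash h(dataset: "SEQ_LST (rename=(SEQUENCE_COUNT=LSEQ))");
    h.defineKey("CUST_ID", "TMP_CUST");
    h.defineData("LSEQ");
    h.defineDone();
  end;
  /* give LSEQ a slot in the PDV without actually reading the file */
  if 0 then set SEQ_LST (rename=(SEQUENCE_COUNT=LSEQ));
  set clean.UPDT;    /* one sequential pass over the big table */
  if h.find() = 0 and SEQUENCE_COUNT < LSEQ then output UCST;
  else output clean.UPDT2;
run;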

RCW
Calcite | Level 5

I'm inclined to agree with Kurt_Bremser, but another possibility is to try UPDATE (SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition).
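
A minimal sketch of the UPDATE pattern with the names from the original post (whether it fits depends on your data - UPDATE expects unique BY values in the master data set):

data clean.UPDT;
  update clean.UPDT                               /* master      */
         SEQ_LST (rename=(SEQUENCE_COUNT=LSEQ));  /* transaction */
  by CUST_ID TMP_CUST;
run;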

In addition to his suggestions, look at the location of your temporary storage (the WORK library) and clean out any leftover directories there.

The trouble with SQL is that it's memory intensive - there's a good chance you'll run into memory problems there. Since you mention the 1 GB limitation, the DATA step is the way to go.

Other possibilities: have you checked the COMPRESS= data set option (SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition)? Depending on your data you might squeeze some space out there. Also look at the lengths of your variables (SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition). They won't make a difference for temporary data sets (WORK) - SAS expands things during processing (or maybe that's my 'experience' showing) - but they will make a difference in your permanent data sets.
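
For example, a quick sketch of turning compression on (the rewrite below is illustrative; COMPRESS=YES suits character-heavy data, COMPRESS=BINARY numeric-heavy data):

/* rewrite the permanent table with compression enabled */
data clean.UPDT (compress=yes);
  set clean.UPDT;
run;

/* or enable it for every data set created in the session */
options compress=yes;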

Hope these suggestions help.

AhmedAl_Attar
Rhodochrosite | Level 12

Hi danny,

If you want to restrict SAS to using only 1 GB of RAM, then use the -MEMSIZE 1G option within your SAS Windows shortcut. I'm guessing that's how you launch/start your SAS session.
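
That is, the shortcut's Target field would look something like this (the install path is an assumption - keep whatever path your shortcut already has):

"C:\Program Files\SASHome\SASFoundation\9.3\sas.exe" -MEMSIZE 1G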

If you cannot create a unique index for your smaller data set "SEQ_LST" (1,763,513 rows) on CUST_ID and TMP_CUST, then I would suggest looking into using the hash object as an alternative. There are several papers showing how to do that - check www.lexjansen.com.
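
For reference, a sketch of creating that composite unique index (the index name CK is arbitrary, and this assumes SEQ_LST lives in WORK):

proc datasets lib=work nolist;
  modify SEQ_LST;
  index create ck = (CUST_ID TMP_CUST) / unique;
quit;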

Hope this helps,

Ahmed

RCW
Calcite | Level 5

I just got home and started looking at your code again and a couple of things occurred to me.

In particular,

     if SEQUENCE_COUNT < LSEQ then output UCST;
     else output clean.UPDT;

might have a problem by not checking for SEQUENCE_COUNT > LSEQ.

Another thing: you've got IN= set:


     merge clean.UPDT (in=a)
           SEQ_LST (in=b rename=(SEQUENCE_COUNT=LSEQ));

but you never test them, so is it possible that records are coming in from SEQ_LST when there is no corresponding UPDT record?
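
Something like this sketch would make the matching explicit (how to handle the unmatched records is an assumption - decide what you actually want done with them):

data clean.UPDT UCST;
  merge clean.UPDT (in=a)
        SEQ_LST (in=b rename=(SEQUENCE_COUNT=LSEQ));
  by CUST_ID TMP_CUST;
  if not a then delete;   /* SEQ_LST-only records - drop, or handle */
  if b and SEQUENCE_COUNT < LSEQ then output UCST;
  else output clean.UPDT;
run;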

In reference to the memory issue: unless told otherwise, in a DATA step SAS does its I/O directly - i.e. it reads a record from disk, processes it, and writes it to disk. There are exceptions (hash tables, for example, and data set options that keep data loaded), but those are exceptions. It helps to remember that SAS was written for mainframes using punch cards (I know this personally). Memory was barely measured in K (hence the reference to K in some of the options), let alone M or G. As I stated before, it's PROC SQL that tends to hog memory.

Another possibility, one that updates a data set directly and hence is more dangerous, is the MODIFY statement.

http://support.sas.com/documentation/cdl/en/lestmtsref/63323/HTML/default/viewer.htm#n0g9jfr4x5hgsfn...

Go through a couple of examples and back up your data.
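
A minimal sketch of the in-place pattern (the condition is purely illustrative; again, back up first):

data clean.UPDT;
  modify clean.UPDT;
  /* REMOVE deletes the current observation in place; by default, */
  /* every other observation is rewritten unchanged (REPLACE)     */
  if SEQUENCE_COUNT < 0 then remove;
run;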

Tom
Super User

Probably because you are reading and writing to the same disk that Windows is using as its swap disk.

Get another disk drive.

dannyf
Calcite | Level 5

Thanks, all, for the suggestions.

The issue is the RAM it's using, so another drive won't help (I could use our network drive, but that would be even slower).

I was hoping MEMSIZE would resolve my problems - the documentation isn't clear on whether this limits hard disk usage or RAM. But SAS said it was unable to process due to the limitation!

