dannyf
Calcite | Level 5

I'm running a merge on a rather big dataset. The problem is it sloooows my PC down so much that it becomes annoying.

I've tried using BUFSIZE and BUFNO to reduce the amount of memory SAS uses, and in desperation set them to 1, but it made no discernible difference.

I don't mind if the process takes longer, but is there a way to make it use only 1 GB of RAM?

The two tables are 17,396,594 and 1,763,513 rows and sit on my C Drive.

I'm using SAS Base 9.3 (Windows 7), and I don't want answers offering me a different technical solution (e.g. SAS Server) - but are there any settings I can change in my current environment?

Thanks for any help offered!

/* adding sequence count to records - in case incorrect #'s were given */

data clean.UPDT UCST;
  merge clean.UPDT (in=a)
        SEQ_LST (in=b rename=(SEQUENCE_COUNT=LSEQ));
  by CUST_ID TMP_CUST;
  if SEQUENCE_COUNT < LSEQ then output UCST;
  else output clean.UPDT;
run;

9 REPLIES
RW9
Diamond | Level 26

If it's just a space issue, consider splitting the two datasets up, e.g. take half of the distinct CUST_IDs and merge those, then take the other half, etc. For example:

proc sql;
  create table FIRST_BLOCK as
  select  A.*,
          B.MAKE as B_MAKE
  from    SASHELP.CARS A
  left join SASHELP.CARS B
    on    A.MAKE = B.MAKE
  where   A.MAKE in (select MAKE from
                      (select distinct MAKE from SASHELP.CARS)
                     where MONOTONIC() <= 10);
quit;

Kurt_Bremser
Super User

You're just I/O bound, mainly because:

- your disk is not suited for this kind of work,

- all your files reside on the same disk, so the disk heads constantly jump between the physical positions of the input files and the output files,

- and on top of that, you may be the victim of disk fragmentation, slowing I/O down even more.

You can alleviate this by getting a faster disk (preferably an SSD, which eliminates all kinds of latencies) or by adding several disks and allocating the libraries on physically separate drives.
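
In practice that just means pointing the libraries at different physical drives - a sketch, where drive letters D: and E: are hypothetical:

  /* permanent library on one physical drive ...           */
  libname clean "D:\sasdata";

  /* ... and WORK on another, set at SAS invocation:       */
  /*   sas.exe -WORK "E:\saswork"                          */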

You can also take a look at the hash object method. This lets you read the smaller file in one sweep before reading through the larger file. But you still have disk contention between reading and writing.
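
For illustration, a minimal sketch of that hash approach using the names from the original post (the output data set clean.UPDT2 is a made-up name, so the input isn't overwritten while you test):

data clean.UPDT2 UCST;
  if _n_ = 1 then do;
    /* load the smaller table into memory once */
    declare hash h(dataset: "SEQ_LST (rename=(SEQUENCE_COUNT=LSEQ))");
    h.defineKey("CUST_ID", "TMP_CUST");
    h.defineData("LSEQ");
    h.defineDone();
  end;
  /* give LSEQ a slot in the PDV without actually reading the file */
  if 0 then set SEQ_LST (rename=(SEQUENCE_COUNT=LSEQ));
  set clean.UPDT;    /* one sequential pass over the big table */
  if h.find() = 0 and SEQUENCE_COUNT < LSEQ then output UCST;
  else output clean.UPDT2;
run;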

RCW
Calcite | Level 5

I'm inclined to agree with Kurt_Bremser, but another possibility is to try UPDATE (SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition).
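
A minimal sketch of the UPDATE pattern with the names from the original post (whether it fits depends on your data - UPDATE expects unique BY values in the master data set):

data clean.UPDT;
  update clean.UPDT                               /* master      */
         SEQ_LST (rename=(SEQUENCE_COUNT=LSEQ));  /* transaction */
  by CUST_ID TMP_CUST;
run;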

In addition to his suggestions, look at the location of your temporary storage (the WORK library) and clean out any leftover directories there.

The trouble with SQL is that it's memory intensive - there's a good chance you'll run into memory problems there. Since you mention the 1 GB limitation, the DATA step is the way to go.

Other possibilities: have you checked the COMPRESS= data set option (SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition)? Depending on your data you might squeeze some space out there. Also look at the lengths of your variables (SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition). They won't make a difference for temporary data sets (WORK) - SAS expands things during processing (or maybe that's my 'experience' showing) - but they will make a difference in your permanent data sets.
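
For example, a quick sketch of turning compression on (the rewrite below is illustrative; COMPRESS=YES suits character-heavy data, COMPRESS=BINARY numeric-heavy data):

/* rewrite the permanent table with compression enabled */
data clean.UPDT (compress=yes);
  set clean.UPDT;
run;

/* or enable it for every data set created in the session */
options compress=yes;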

Hope these suggestions help.

AhmedAl_Attar
Rhodochrosite | Level 12

Hi danny,

If you want to restrict SAS to using only 1 GB of RAM, then use the -MEMSIZE 1G option within your SAS Windows shortcut. I'm guessing that's how you launch/start your SAS session.
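
That is, the shortcut's Target field would look something like this (the install path is an assumption - keep whatever path your shortcut already has):

"C:\Program Files\SASHome\SASFoundation\9.3\sas.exe" -MEMSIZE 1G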

If you cannot create a unique index for your smaller data set "SEQ_LST" (1,763,513 rows) on CUST_ID and TMP_CUST, then I would suggest looking into using the hash object as an alternative. There are several papers showing how to do that - check www.lexjansen.com.
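
For reference, a sketch of creating that composite unique index (the index name CK is arbitrary, and this assumes SEQ_LST lives in WORK):

proc datasets lib=work nolist;
  modify SEQ_LST;
  index create ck = (CUST_ID TMP_CUST) / unique;
quit;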

Hope this helps,

Ahmed

RCW
Calcite | Level 5

I just got home and started looking at your code again and a couple of things occurred to me.

In particular,

     if SEQUENCE_COUNT < LSEQ then output UCST;
     else output clean.UPDT;

might have a problem by not checking for SEQUENCE_COUNT > LSEQ.

Another thing: you've got IN= set:


     merge clean.UPDT (in=a)
           SEQ_LST (in=b rename=(SEQUENCE_COUNT=LSEQ));

but you never test them, so is it possible that records are coming in from SEQ_LST when there is no corresponding UPDT record?
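
Something like this sketch would make the matching explicit (how to handle the unmatched records is an assumption - decide what you actually want done with them):

data clean.UPDT UCST;
  merge clean.UPDT (in=a)
        SEQ_LST (in=b rename=(SEQUENCE_COUNT=LSEQ));
  by CUST_ID TMP_CUST;
  if not a then delete;   /* SEQ_LST-only records - drop, or handle */
  if b and SEQUENCE_COUNT < LSEQ then output UCST;
  else output clean.UPDT;
run;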

In reference to the memory issue: unless told otherwise, in a DATA step SAS does its I/O directly - i.e. it reads a record from disk, processes it, and writes it to disk. There are exceptions (hash tables, for example, and data set options that keep data loaded), but those are exceptions. It helps to remember that SAS was written for mainframes using punch cards (I know this personally). Memory was barely measured in K (hence the reference to K in some of the options), let alone M or G. As I stated before, it's PROC SQL that tends to hog memory.

Another possibility, one that updates a data set directly and hence is more dangerous, is the MODIFY statement.

http://support.sas.com/documentation/cdl/en/lestmtsref/63323/HTML/default/viewer.htm#n0g9jfr4x5hgsfn...

Go through a couple of examples and back up your data.
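
A minimal sketch of the in-place pattern (the condition is purely illustrative; again, back up first):

data clean.UPDT;
  modify clean.UPDT;
  /* REMOVE deletes the current observation in place; by default, */
  /* every other observation is rewritten unchanged (REPLACE)     */
  if SEQUENCE_COUNT < 0 then remove;
run;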

Tom
Super User

Probably because you are reading and writing to the same disk that Windows is using as its swap disk.

Get another disk drive.

dannyf
Calcite | Level 5

Thanks, all, for the suggestions.

The issue is the RAM it's using, so another drive won't help (I could use our network drive, but that would be even slower).

I was hoping MEMSIZE would resolve my problems - the documentation isn't clear on whether this limits hard disk usage or RAM. But SAS said it was unable to process due to the limitation!

