SASPhile
Quartz | Level 8

proc sort data=table1 out=_a nodupkey; by acctnbr; run;
proc sort data=table2 (keep=acctnbr r&questpin r&questmatch) out=_b nodupkey; by acctnbr; run;
data table3;
    merge _a (in=in_a)
          _b (in=in_b);
    by acctnbr;
    if in_a;
    if in_b then quest='Y';
    else quest='N';
run;

1. The above two datasets are huge; they have over 100MM records, and I get the above error message. This is run on Unix. Is there any workaround to fix the issue?

2. How can this be done in SQL, to avoid sorting and merging?

Help is appreciated,

12 REPLIES
Kurt_Bremser
Super User

How do you think SQL would accomplish this task? It can't ask $DEITY for the result, so it resorts to more or less the same technique (sorting and merging), but often less efficiently than code you write yourself.

Your problem is insufficient disk space (or user quota) in the location of your work directory.

Do you run out of space during sorting or during the merge?

KachiM
Rhodochrosite | Level 12

There are several ways to merge the data sets without sorting one or both of them. The size (100MM records) is large. Give some example data sets and the desired WANT data set, then suitable method(s) can be shown.

SASPhile
Quartz | Level 8

Well, I want all the columns from table1 and the columns from table2 (shown in the keep= option).

The only help needed is how to translate these DATA step IF conditions to SQL:

if in_a;
if in_b then tag_found_quest='Y'; else tag_found_quest='N';

Kurt_Bremser
Super User

This would have to look like:

proc sql;
create table table3 as
select
    a.*,
    b.r&questpin,
    b.r&questmatch,
    case
        when b.acctnbr is missing then 'N'
        else 'Y'
    end as tag_found_quest
from table1 a
left join table2 b
    on a.acctnbr = b.acctnbr
;
quit;

But be aware that, for this to work, you need enough space in your work location to hold
- table1
- table2
- sizeof(table1 + table2) (roughly) for the SQL utility file
- table3
which will most probably lead to another disk-full condition.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Maybe also try processing your data piecemeal, i.e., reduce the data you are working on to only one site or account block at a time and do lots of smaller parts.

jakarman
Barite | Level 11

SQL is just as fast at generating this kind of error, maybe even faster, since it is not tuned to use resources carefully. Moving to an external DBMS, you would get similar errors; there is often a limit, a very low one, on the size of temporary tables, to prevent users from consuming too much space.

Well, you are running out of resources. There are two options to solve that:

- Program your code in a smarter way to avoid that usage.
- Add additional resources.

By the way, 100MM records by itself tells us nothing; it could be as small as 1 or 2 GB. A better measure than the number of records alone is the total size in bytes / the size of each record.

---->-- ja karman --<-----
scmebu
SAS Employee

You can alleviate the disk space condition by using another storage device and separating the data sets from the utility files: either place the input/output data sets in permanent storage (define and specify a library) instead of WORK, and/or specify a separate location for the utility files using the -utilloc option. (This presumes that the multi-threaded SAS sort is being invoked by SORT and SQL; you can check by setting MSGLEVEL to i, e.g., options msglevel=i;.) Either strategy will likely help performance as well.
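A minimal sketch of the two strategies described above; the paths and library name are hypothetical:

```
/* At invocation, put utility files on a separate file system
   with the -utilloc option (hypothetical path):
       sas -utilloc /sasutil myprog.sas
*/
options msglevel=i;              /* log notes show whether the threaded sort runs */
libname perm '/bigdisk/project'; /* permanent library instead of WORK             */

proc sort data=table1 out=perm.table1_s nodupkey;
    by acctnbr;
run;
```

With the sorted output going to the permanent library and the sort's utility file on a different file system, WORK only has to absorb whatever remains.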

KachiM
Rhodochrosite | Level 12

I do not have an SQL head. Would you accept a DATA step solution which requires no sorting?

Patrick
Opal | Level 21

If disk space for WORK is the issue, then you could use one or several of the approaches below:

- Before sorting, delete all tables in WORK which are no longer required.
- Don't create a new table; replace the existing table with its sorted version: "proc sort data=table1 out=table1 nodupkey; by acctnbr; run;"
- In case table2 with only the required variables (keep=acctnbr r&questpin r&questmatch) fits into memory: don't sort your tables at all, but load table2 into a hash and use a hash look-up instead of a merge.
- Write the sorted tables (out=) and table3 to permanent data sets using a libref which points to a file system with sufficient space.
- Make sure that UTILLOC points to a different disk than WORK.

You could use code as below. That would avoid creating additional tables in WORK; you would still need sufficient space in UTILLOC during sorting.

proc sort data=table1 nodupkey; by acctnbr; run;
proc sort data=table2 (keep=acctnbr r&questpin r&questmatch) nodupkey; by acctnbr; run;

data table1;
    merge table1 (in=in_a)
          table2 (in=in_b);
    by acctnbr;
    if in_a;
    if in_b then quest='Y';
    else quest='N';
run;
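Patrick's hash look-up suggestion can be sketched like this, assuming the kept slice of table2 fits into memory; only the membership flag is shown:

```
/* Hash look-up instead of sort + merge: neither table needs sorting. */
data table3;
    if _n_ = 1 then do;
        declare hash h(dataset: "table2(keep=acctnbr)");
        h.defineKey('acctnbr');
        h.defineDone();
    end;
    set table1;                        /* table1 read in its stored order */
    if h.find() = 0 then quest = 'Y';  /* acctnbr present in table2       */
    else quest = 'N';
run;
```

To also bring over r&questpin and r&questmatch, as the original merge does, add them to the keep= list and declare them with defineData (plus a LENGTH statement). Note this sketch does not replicate the NODUPKEY de-duplication of table1.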

Kurt_Bremser
Super User

data table1;
    merge table1 (in=in_a)
          table2 (in=in_b);
    by acctnbr;
    if in_a;
    if in_b then quest='Y';
    else quest='N';
run;

This will not prevent the disk-full condition, since table1.sas7bdat and table1.sas7bdat.lck will exist concurrently until the data step finishes execution.

jakarman
Barite | Level 11

Except that the OP is focused on believing SQL will solve his big-data issue, we did not get any additional information on sizing, let alone the logical question behind that join.

---->-- ja karman --<-----
jakarman
Barite | Level 11

At the moment, just for background knowledge. If:

- the task is just deleting some records marked by a condition, not adding new variables,
- additional storage resources are not easy to get (really big data), and
- you have reviewed your backup/restore and checkpoint-restart processes for necessity and runtime,

then you can do an update in place.

See the MODIFY statement and the REMOVE statement in SAS(R) 9.4 Statements: Reference, Third Edition.
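The in-place delete described above can be sketched as follows; close_flag is a hypothetical marker variable, and you should back the data set up first, since an interrupted MODIFY step can leave it damaged:

```
/* Update in place: records are flagged as deleted where they sit,
   so no second copy of the data set is written. */
data master;
    modify master;
    if close_flag = 'Y' then remove;  /* mark this observation deleted */
run;
```

The data set file keeps its size (the space of removed observations is not reclaimed), which is exactly the trade-off jakarman describes: no rewrite of all data blocks, at the cost of dead space inside the file.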

This is the same approach an RDBMS takes for OLTP: marking records as deleted rather than replacing all data blocks. This is what SQL was designed for.

If that is what you think is needed as "SQL", you can code it that way. The common SQL usage by analysts is not done like that, but is based on copying complete tables.

---->-- ja karman --<-----


Discussion stats
  • 12 replies
  • 27061 views
  • 0 likes
  • 7 in conversation