BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Alexxxxxxx
Pyrite | Level 9

Hi, 

 

I am running code as 

PROC SQL;
	CREATE TABLE want.C  AS
	SELECT DISTINCT
	*
	FROM
	have.A as A
	JOIN have.B AS B ON A.family_id=B.family_id
	;
QUIT;

But the result show as 'Error: Sort execution failure.' I check online and reason may be the lack of space. Do you have any method to increase the space?

 

thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

I did not say that SORT and MERGE is not suitable, I said that the MERGE cannot mimic the "cartesian join" done by SQL when there is a many-to-many relationship.

If you do a data step MERGE under this condition, you will get a NOTE in the log about "repeats of by values in multiple datasets". If you do not get that NOTE, all is well.

You can check the uniqueness (or lack thereof) by doing

proc sort
  data=source (keep=key)
  out=test
  nodupkey
;
by key;
run;

on your datasets before the join, where key is the variable you will join by. You can then see in the log if duplicates were deleted.

This is one of the steps of Maxim 3.

You can see if a dataset was compressed by running PROC CONTENTS on it (once again, Maxim 3) and studying the output. This will also give you information about variable lengths and the resulting observation size(s). All this information is pertinent for deciding which methods should be applied to do your join.

 

If you can increase your WORK space is dependent on how your SAS is set up. With a local installation (everything on your desktop), adding a second disk (SSD preferred) for WORK and adapting the SAS configuration file(s) would be a feasible way.

If it is a client/server environment, you need to get in contact with your SAS administrator. I can tell you that any diligent SAS admin will first check if your code is good in terms of efficiency (I always do that) before dishing out more disk space.

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

First of all, Maxim 3: Know Your Data

  • observation sizes
  • observation numbers
  • repeats of BY values
  • compressed or not

Next, make sure you have cleaned up your WORK location before running such steps. Depending on the relationship between the tables, PROC SORTs and a data step MERGE may be more efficient.

And always consider: is that DISTINCT in your SELECT necessary? It forces a sort of the join result on ALL variables.

Alexxxxxxx
Pyrite | Level 9

Many thanks for your reply.

 

the have.A has 78344558 rows,  and 5 columns; have.B has 8266191 rows and 3 columns.

 

I do not understand the meaning of 

  • repeats of BY values
  • compressed or not

1)Could you please explain it to me please?

 

as you said in the previous post, the sort and merge is not suitable for my current work.

I will not use DISTINCT, many thanks for your remind. 

2)I have clean all file in WORK library. is it the correct method to cleaned up my WORK location?

 

3) is there any method to increase my space?

Kurt_Bremser
Super User

I did not say that SORT and MERGE is not suitable, I said that the MERGE cannot mimic the "cartesian join" done by SQL when there is a many-to-many relationship.

If you do a data step MERGE under this condition, you will get a NOTE in the log about "repeats of by values in multiple datasets". If you do not get that NOTE, all is well.

You can check the uniqueness (or lack thereof) by doing

proc sort
  data=source (keep=key)
  out=test
  nodupkey
;
by key;
run;

on your datasets before the join, where key is the variable you will join by. You can then see in the log if duplicates were deleted.

This is one of the steps of Maxim 3.

You can see if a dataset was compressed by running PROC CONTENTS on it (once again, Maxim 3) and studying the output. This will also give you information about variable lengths and the resulting observation size(s). All this information is pertinent for deciding which methods should be applied to do your join.

 

If you can increase your WORK space is dependent on how your SAS is set up. With a local installation (everything on your desktop), adding a second disk (SSD preferred) for WORK and adapting the SAS configuration file(s) would be a feasible way.

If it is a client/server environment, you need to get in contact with your SAS administrator. I can tell you that any diligent SAS admin will first check if your code is good in terms of efficiency (I always do that) before dishing out more disk space.

Ksharp
Super User
You need
INNER JOIN
or
LEFT JOIN

not simple a JOIN

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 2310 views
  • 1 like
  • 3 in conversation