☑ This topic is solved.
genemroz
Quartz | Level 8

I find myself stuck in a situation where I need to create a full Cartesian join between two datasets. Dataset OBS has 1,000 observations and dataset JCM has 10,000 observations. I'm using PROC SQL for the join. But during a subsequent sort of the resulting 10-million-observation dataset, I sometimes (not always) get kicked out of SAS OnDemand for Academics for 'excessive use of resources'. I'm wondering whether getting kicked out is governed solely by my own demand on the resources, or whether it also depends on the overall demand from other users. Would I have better luck running this code during off-peak hours? If so, when would that be?

Thanks in advance,

Gene

1 ACCEPTED SOLUTION
Astounding
PROC Star
Basic things to consider:

Do you need all the rows?
Do you need all the columns?
Are there silly definitions such as all character fields defined as 200 characters long?
Any way you can shrink the data before joining can only help.
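
As a rough illustration of the points above, here is a minimal sketch of trimming both inputs before the join. The variable names (id, x, site, val) and the 8-character length are hypothetical placeholders, since the original code and data layout were not posted.

```sas
/* Sketch only: id, x, site and val are hypothetical variable names. */

/* Check the stored lengths of the variables before deciding what to trim. */
proc contents data=jcm;
run;

/* Shorten an oversized character variable. The LENGTH statement must come */
/* before SET. SAS may warn about multiple lengths, and values longer than */
/* 8 characters would be truncated, so confirm the real maximum first.     */
data jcm_small;
  length site $8;
  set jcm(keep=site val);
run;

/* Cartesian join that carries only the needed columns and rows. */
proc sql;
  create table cart as
  select a.id, a.x, b.site, b.val
  from obs(keep=id x) as a,
       jcm_small(where=(not missing(val))) as b;
quit;
```

Every byte saved per row is multiplied by the 10 million rows of the join result, so even small reductions in row width can matter.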


6 REPLIES
PaigeMiller
Diamond | Level 26

SAS OnDemand allows many users to access the resources at the same time. So the resources available to you definitely depend on what other users are doing as well.

--
Paige Miller
ballardw
Super User

It never hurts to include the code you are running.

 

One thing that may reduce the resources used is to sort the data with PROC SORT in a separate step and use the TAGSORT option. TAGSORT creates smaller temporary files by storing only the BY variables and observation numbers, instead of moving lots of other variables around.
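
A minimal sketch of such a separate sort step follows. The dataset name cart and the BY variables id and site are assumptions for illustration, not taken from the original code.

```sas
/* Sketch only: cart, id and site are hypothetical names. */
proc sort data=cart out=cart_sorted tagsort;
  by id site;
run;
```

TAGSORT trades temporary disk space for extra processing time, so it tends to help most when the BY variables are short relative to the full row.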

 

From your description I can't tell if you are using the SQL ORDER BY clause or a separate step to sort the data. If ORDER BY is what triggers the resource limit, a separate sort step might be in order. Can't hurt to test. It might help to show your code for other suggestions if you are doing anything other than a simple select. There may be ways to break things up into steps that don't trigger resource limits.

genemroz
Quartz | Level 8

Thanks for these suggestions.  In the end, I was able to accomplish the Cartesian join and subsequent sort through a combination of actions:

1.  Deleting a couple of unnecessary variables

2.  Using the Compress option to make the input and output datasets smaller

3.  Dividing the input datasets into smaller subsets for the Cartesian join.

4.  Sorting the subsetted Cartesian Join datasets by following the divide and conquer method suggested in a paper entitled "Sorting a Large Data Set When Space is Limited" by S. Sridharma.
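
For readers who want to try something similar, the steps above might be sketched roughly as follows. This is not Gene's actual code, and it is not necessarily the exact procedure from the cited paper: the variable names (id, x, site, val), the macro name, and the chunk count are all hypothetical. It joins one slice of OBS at a time, sorts each piece, and then interleaves the sorted pieces with SET and BY.

```sas
/* Sketch only: all names and the chunk count are hypothetical. */
options compress=yes;   /* compress datasets created from here on */

%macro cart_in_chunks(nchunks=4);
  %do i = 1 %to &nchunks;

    /* Take every &nchunks-th observation of OBS as one slice. */
    data obs_part;
      set obs(keep=id x);
      if mod(_n_, &nchunks) = &i - 1;
    run;

    /* Cartesian join of this slice with JCM. */
    proc sql;
      create table cart_&i as
      select a.id, a.x, b.site, b.val
      from obs_part as a, jcm(keep=site val) as b;
    quit;

    /* Sort each smaller piece in place. */
    proc sort data=cart_&i tagsort;
      by id site;
    run;

  %end;

  /* Interleave the sorted pieces. SET with BY reads the inputs in BY   */
  /* order, so the combined dataset is sorted without one large sort.   */
  data cart_all;
    set %do i = 1 %to &nchunks; cart_&i %end; ;
    by id site;
  run;
%mend cart_in_chunks;

%cart_in_chunks(nchunks=4)
```

The final interleave needs every piece to be sorted by the same BY variables, which is why the sort happens inside the loop.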

 

A bit tedious to get it all sorted out but it worked...

Thanks to all who responded with suggestions,

Gene

SASKiwi
PROC Star

Depending on your actual business requirement, why do a Cartesian join at all? Often this type of processing can be handled with DATA step table joining, so you avoid the blowout in size. So far all you have explained is how you are trying to solve your problem, without describing what it actually is.
