I find myself stuck in a situation where I need to create a full Cartesian join between two datasets. Dataset OBS has 1,000 observations and dataset JCM has 10,000 observations. I’m using Proc SQL for the join. But during a subsequent sort of the resulting 10-million-observation dataset I sometimes (not always) get kicked out of SAS OnDemand for Academics for ‘excessive use of resources’. I’m wondering whether getting kicked out is governed solely by my own demand on the resources, or whether it also depends on the overall demand from other users. Would I have better luck running this code during off-peak hours? If so, when would that be?
Thanks in advance,
Gene
SAS OnDemand allows many users to access the resources at the same time. So yes, the resources available to you depend on other users as well.
It never hurts to include the code you are running.
One thing that may reduce the resources used is to sort the data with Proc Sort in a separate step and use the TAGSORT option. This creates smaller temporary data sets by storing only the BY variables and observation numbers instead of moving lots of other variables around.
From your description I can't tell if you are using the SQL ORDER BY or a separate step to sort the data. If ORDER BY is related to the resource limit, a separate sort step might be in order. Can't hurt to test. It might help to show your code for other suggestions if you are doing anything other than a simple select. There may be ways to break things up into steps that don't trigger resource limits.
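For illustration, a separate sort step with TAGSORT could look like the sketch below. The data set name work.cart and the BY variables obs_id and jcm_id are made up for the example, since the actual code wasn't posted.

```sas
/* Hypothetical names: work.cart is the Cartesian-join output,    */
/* obs_id and jcm_id stand in for whatever the sort keys are.     */
proc sort data=work.cart out=work.cart_sorted tagsort;
    by obs_id jcm_id;   /* only these keys plus observation numbers go to the sort work files */
run;
```

Keep in mind that TAGSORT trades temporary disk space for time, so the sort itself may run longer.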
Thanks for these suggestions. In the end, I was able to accomplish the Cartesian join and subsequent sort through a combination of actions:
1. Deleting a couple of unnecessary variables
2. Using the Compress option to make the input and output datasets smaller
3. Dividing the input datasets into smaller subsets for the Cartesian join.
4. Sorting the subsetted Cartesian Join datasets by following the divide and conquer method suggested in a paper entitled "Sorting a Large Data Set When Space is Limited" by S. Sridharma.
A bit tedious to get it all sorted out but it worked...
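For anyone landing here later, the four steps above might look roughly like the sketch below. This is not the code that was actually run: the variable names (unused1, unused2, sortkey), the two-way split of JCM, and the interleave at the end are all assumptions, and the interleave is just one common divide-and-conquer approach, not necessarily the exact method in the paper.

```sas
/* Steps 1-2: drop variables that aren't needed and compress the input. */
/* unused1 and unused2 are hypothetical variable names.                 */
data work.obs_small (compress=yes);
    set work.obs (drop=unused1 unused2);
run;

/* Step 3: Cartesian join against one slice of JCM at a time.           */
/* FIRSTOBS=/OBS= pick rows 1-5000 and 5001-10000 of the 10,000 rows.   */
/* (If both tables share a column name, list the columns explicitly     */
/* instead of using a.*, b.*.)                                          */
proc sql;
    create table work.cart_1 (compress=yes) as
    select a.*, b.*
    from work.obs_small as a,
         work.jcm (firstobs=1 obs=5000) as b;

    create table work.cart_2 (compress=yes) as
    select a.*, b.*
    from work.obs_small as a,
         work.jcm (firstobs=5001 obs=10000) as b;
quit;

/* Step 4: sort each piece on its own (sortkey is a hypothetical key),  */
/* then interleave the sorted pieces with SET + BY.                     */
proc sort data=work.cart_1 tagsort; by sortkey; run;
proc sort data=work.cart_2 tagsort; by sortkey; run;

data work.cart_sorted (compress=yes);
    set work.cart_1 work.cart_2;
    by sortkey;   /* reads the already-sorted inputs in key order; no new sort needed */
run;
```

More slices mean smaller intermediate files per step, at the cost of more steps to manage.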
Thanks to all who responded with suggestions,
Gene
There is a disk usage limit of 5 GB in SAS OnDemand. Your Cartesian join can exceed this limit, depending on the observation size.
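As a rough check: with 1,000 x 10,000 = 10 million output rows, an observation length of about 500 bytes already comes to roughly 5 GB uncompressed. A sketch for estimating this before running the join, assuming the two data sets are WORK.OBS and WORK.JCM (adjust the libref if they live elsewhere):

```sas
/* Pull row counts and observation lengths from the dictionary tables. */
/* Assumes the inputs are WORK.OBS and WORK.JCM.                       */
proc sql noprint;
    select nobs, obslen into :n_obs trimmed, :len_obs trimmed
    from dictionary.tables
    where libname = 'WORK' and memname = 'OBS';

    select nobs, obslen into :n_jcm trimmed, :len_jcm trimmed
    from dictionary.tables
    where libname = 'WORK' and memname = 'JCM';
quit;

/* Rows multiply; row lengths (roughly) add when all columns are kept. */
data _null_;
    est_gb = &n_obs * &n_jcm * (&len_obs + &len_jcm) / 1024**3;
    put 'Estimated uncompressed size (GB): ' est_gb 8.2;
run;
```

This is only an upper-bound style estimate; compression and dropping columns will bring the real size down, and the sort needs additional temporary space on top of it.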
Depending on your actual business requirement, why do a Cartesian join at all? Often these types of processing can be solved with DATA step table joining so you avoid the blow out in size. So far all you have explained is how you are trying to solve your problem, without describing what it actually is.