I'm running a very simple inner join query in SAS, and it gives me the "insufficient space in work.SASTMP" error. My WORK directory is empty, and I'm not even using it. All my SAS data sets are on my C: drive, which has more than 200 GB of free space.
Can anyone help me with that? Again, my libraries are on my C: drive.
Thanks
Even a slight mistake can cause SQL to run out of space, so we need to see the code. It's also important to know the data, so a quick overview of the datasets (number of observations and variables, observation size) will help. The fact that you don't use WORK yourself doesn't mean much: PROC SQL builds its utility file(s) there, and that seems to be the problem. Is WORK also located on the C: drive?
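One way to pull that overview without opening each table is to query the DICTIONARY.TABLES metadata view in PROC SQL. The sketch below uses SASHELP.CLASS (shipped with SAS) so it runs anywhere; for the tables in this thread you would filter on libname = 'CPI' instead. APPROX_BYTES is just NOBS * OBSLEN, a rough uncompressed estimate that ignores page overhead and compression.

```sas
/* Size overview from the DICTIONARY.TABLES metadata view.              */
/* LIBNAME and MEMNAME are stored in uppercase, so compare in uppercase. */
proc sql;
  select memname,
         nobs,                        /* number of observations      */
         nvar,                        /* number of variables         */
         obslen,                      /* observation length in bytes */
         nobs * obslen as approx_bytes format=comma20.
    from dictionary.tables
    where libname = 'SASHELP' and memname = 'CLASS';
quit;
```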
I'm trying to submit the following query. I have also tried other queries and still get the same error. Yes, WORK is located on my C: drive, but I have plenty of space there (more than 200 GB). One of my tables has 900,000 observations and is 1.2 GB. The other one has 600,000 observations and is 412 MB. The tables include personal information about customers.
```sas
proc sql outobs=10;
  create table CPI.combined as
  select distinct
    a.ACCTNUM,
    a.CUST_ID,
    b.SourceACCT,
    b.CUST_ID
  from CPI.CUST_profile_info a,
       CPI.CUST_DEC_PRO b
  where a.CUST_ID ^= b.CUST_ID;
quit;
```
No wonder you run out of space with this query. The condition a.CUST_ID ^= b.CUST_ID means that every observation in cust_profile is joined with ALL observations of cust_dec_pro that have a different cust_id.
If cust_id is unique within each dataset, you end up with roughly 900000 * 599999 observations (539,999,100,000).
PS: Do you want to find matches or non-matches on a join by cust_id?
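A tiny, self-contained illustration of the point above. WORK.PROF, WORK.DEC and their three sample rows are hypothetical stand-ins for the two CPI tables: of the 9 possible pairs, the not-equal condition keeps 7, while an equality join keeps only the 2 customers present in both tables. The last step sketches the query assuming the goal is matches; that intent is an assumption, since the thread only raises it as a question.

```sas
/* Hypothetical stand-ins for CPI.CUST_profile_info and CPI.CUST_DEC_PRO */
data work.prof;
  input cust_id $ acctnum $;
  datalines;
C1 A100
C2 A200
C3 A300
;
run;

data work.dec;
  input cust_id $ sourceacct $;
  datalines;
C2 S200
C3 S300
C4 S400
;
run;

proc sql noprint;
  /* Not-equal condition: every pair except the matching ones qualifies */
  select count(*) into :n_ne trimmed
    from work.prof a, work.dec b
    where a.cust_id ne b.cust_id;
  /* Equality condition: only customers present in both tables */
  select count(*) into :n_eq trimmed
    from work.prof a, work.dec b
    where a.cust_id = b.cust_id;
quit;

%put NOTE: not-equal join rows=&n_ne, equi-join rows=&n_eq;

/* Sketch of the matching version (assumed intent).                     */
/* b.cust_id is left out: it equals a.cust_id, and a second CUST_ID     */
/* column in CREATE TABLE would only trigger an "already exists" warning. */
proc sql;
  create table work.combined as
  select a.acctnum,
         a.cust_id,
         b.sourceacct
    from work.prof as a
         inner join work.dec as b
         on a.cust_id = b.cust_id;
quit;
```

With the real tables, the not-equal version grows with the product of the two row counts, while the equality version grows only with the number of matching rows.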
Yes, this is exactly what I'm trying to do. I was using an inner join at first, then I changed it to a WHERE clause.
I also tried submitting the following query just to see if it works, but I still get the same error message after an hour.
```sas
proc sql outobs=10;
  create table CPI.test as
  select distinct
    a.CUST_ID,
    a.ACCTNUM
  from CPI.CUST_INFO a;  /* alias a is needed for the a. qualifiers */
quit;
```
SQL is notoriously bad when it has to do sorting on big tables.
Use this instead:
```sas
proc sort
  data=cpi.cust_info (keep=cust_id acctnum)
  out=cpi.test
  nodupkey
;
by cust_id acctnum;
run;
```
If you want to find cust_ids in one dataset that are not present in the other, use a data step merge.
If you need to find matching cust_ids with differences in other variables, that can also be done in a data step merge.
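A minimal sketch of such a merge, assuming CUST_ID has the same type and length in both tables. WORK.PROF, WORK.DEC and their sample rows are hypothetical stand-ins; in practice you would sort the CPI tables, keeping only the needed variables, into them. Comparing ACCTNUM with SourceACCT is likewise only an assumed example of "differences in other variables"; the thread does not say these two columns hold the same thing.

```sas
/* Hypothetical sample data standing in for the two CPI tables */
data work.prof;
  input cust_id $ acctnum $;
  datalines;
C1 A100
C2 A200
C3 A300
;
run;

data work.dec;
  input cust_id $ sourceacct $;
  datalines;
C2 A200
C3 A999
C4 A400
;
run;

/* MERGE with a BY statement requires both inputs sorted by the key */
proc sort data=work.prof; by cust_id; run;
proc sort data=work.dec;  by cust_id; run;

/* IN= flags tell which table(s) contributed to the current BY group */
data work.only_prof work.only_dec work.diffs;
  merge work.prof (in=in_prof) work.dec (in=in_dec);
  by cust_id;
  if in_prof and not in_dec then output work.only_prof;  /* non-match, first table  */
  else if in_dec and not in_prof then output work.only_dec; /* non-match, second table */
  else if acctnum ne sourceacct then output work.diffs;  /* match with a difference */
run;
```

Here WORK.ONLY_PROF gets C1, WORK.ONLY_DEC gets C4 and WORK.DIFFS gets C3; C2 matches with no difference and is not written. Unlike the SQL join, the merge never pairs up non-matching rows. If a CUST_ID repeats in both tables, MERGE pairs the rows one-to-one within the BY group instead of forming all combinations, and the log notes that more than one data set has repeats of BY values.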
Thank you so much, it works.
You're the best 🙂
Are you using SAS UE?
No, I'm using SAS PC