Working on various columns all together in a huge datasets in proc sql

New Contributor
Posts: 2

Hi,

I am working on a huge data set with over 400 variables and around 99999 observations. It is a log detailing call usage (local, international, duration, etc.), data usage, amount charged, etc. I have another data set with the list of churners and their churn months. This data is given for 6 months and for over 3 years separately.

1. I am not able to join the 2 data sets on the key variable: Product

It always gives me an insufficient memory error in the log. I tried running it on only the first 40 obs and it still did not work.

2. I have to create a data set that has the call details for only 3 selected months, keeping everything else the same.

 

Since loops are not allowed in proc sql, can anyone please help me with how to go about it?

Grand Advisor
Posts: 17,294

Re: Working on various columns all together in a huge datasets in proc sql

Neither of those datasets sounds large enough to run out of memory. How large (in GB) are the SAS data sets? Also, please post your code.
New Contributor
Posts: 2

Re: Working on various columns all together in a huge datasets in proc sql

For joining the data sets I used this code:

 

proc sql;
select c.*, s.*
from cnl as c right join smb as s
on s.product_id = c.product_id;
quit;

 

The size of smb is 1.2 GB and cnl is a few KB.

 

For the second part, where I need usage for only 3 selected months, I do not know how to go about it. Please help.

Respected Advisor
Posts: 3,822

Re: Working on various columns all together in a huge datasets in proc sql

@mansinarang12

I'm not sure why you're running out of memory. As an alternative to the SQL you've posted, a hash lookup table should perform better. For the code below to work properly (and the same goes for your SQL), PRODUCT_ID must be the primary key in both tables.


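data want(drop=_rc);
  if _n_=1 then
    do;
      /* never executes; only adds cnl's variables to the PDV at compile time */
      if 0 then set cnl;
      /* load the small table cnl into memory, keyed by product_id */
      dcl hash h (dataset:'cnl');
      h.defineKey('product_id');
      h.defineData(all:'y');
      h.defineDone();
    end;
  /* clear retained values so a non-matching row does not inherit the previous match */
  call missing(of _all_);

  set smb;
  /* on a match (_rc=0) the cnl values are copied into the current row */
  _rc=h.find();
run;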
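
One more thing that may be worth checking, offered as a guess rather than a diagnosis: a SELECT without CREATE TABLE sends the whole result to the output destination instead of writing a data set, and with more than 400 columns that report can get very heavy. Below is a minimal sketch of the same join written to a table. The output name want_sql and the column churn_month are assumptions on my side, so replace them with whatever you actually need from cnl.

```sas
/* Sketch only: want_sql and churn_month are hypothetical names. */
proc sql;
  create table want_sql as
  select s.*,
         c.churn_month          /* list only the cnl columns you need */
  from smb as s left join cnl as c   /* same rows as: cnl right join smb */
  on s.product_id = c.product_id;
quit;
```

Listing the cnl columns explicitly instead of c.* also avoids the warning you would otherwise get because product_id exists in both tables.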

 

Grand Advisor
Posts: 10,192

Re: Working on various columns all together in a huge datasets in proc sql

First thing with any date-related topic: are your dates stored as SAS date values? That makes anything related to dates much easier in 99.9% of cases.
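
If they are, picking 3 months does not need a loop, because a WHERE clause is applied to every row. Here is a rough sketch, assuming the data has one row per record with a SAS date variable. The names call_date and smb_3months and the three months listed are placeholders I made up, since you have not posted the variable names.

```sas
/* Sketch only: call_date, smb_3months and the months are assumptions. */
proc sql;
  create table smb_3months as
  select *
  from smb
  /* YYMMN6. formats a SAS date as yyyymm, e.g. 201501 */
  where put(call_date, yymmn6.) in ('201501', '201502', '201503');
quit;
```

If the months are instead spread across separate columns (one set of variables per month), this will not apply and you would keep or drop columns rather than filter rows.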

 

 
