10-14-2015 01:44 AM
I am working on a huge data set with over 400 variables and around 99999 observations. It is a log detailing the usage of calls (local, international, duration, etc) , data, amount charged etc. I have another data set with the list of churners and the churn months. This data os given for 6 months and over 3 years separately.
1. I am not able to join the 2 data sets on the key variable : Product
It always gives me insufficient memory error in log. Tried to run it on first 40 obs then also it did not work.
2. I have to create a data set which has the details of calls for only 3 selected months keeping everything else same.
Since loops are not allowed in proc sql, can anyone please help me out how to go about it?
10-14-2015 01:48 AM
10-14-2015 01:57 AM
for joining of the data sets i used the code:
select c.*, s.*
from cnl as c right join smb as s
on s.product_id = c.product_id;
the size of smb is 1.2 gb and cnl is a few kbs.
for the second part where I need usage for only 3 selected months I do not know how to go about it. Please help.
10-14-2015 04:40 AM - edited 10-14-2015 04:47 AM
Not sure why you're running out of memory. For the SQL you've posted an alternative approach which should perform better would be using a hash lookup table. Condition for below code to work properly (as well as for your SQL): PRODUCT_ID is the primary key in both tables.
data want(drop=_rc); if _n_=1 then do; if 0 then set cnl; dcl hash h (dataset:'cnl'); h.defineKey('product_id'); h.defineData(all:'y'); h.defineDone(); end; call missing(of _all_); set smb; _rc=h.find(); run;
10-14-2015 10:23 AM
First thing with any date related topic: are you dates SAS date valued variables? That makes anything related to dates much easier in 99.9% of cases.