Solved: Not enough memory for proc transpose

Steelers_In_DC · Posted 04-20-2016 09:31 AM

Good Morning All,

I have a solution that I've used in the past with smaller datasets, the current dataset I am working with has 350 'rules' and about 1 million records, so the transpose step isn't working due to the wide dataset that is produced. I tried to split the original dataset up into 3rds but the 300,000 columns are still too much. Below is a short example of what I am doing, any suggestions or different solutions are greatly appreciated.

data have;
infile cards dsd;
informat run_date reporting_month mmddyy10.;
format run_date reporting_month mmddyy10.;
input run_date reporting_month _rule1 _rule2 _rule3;
cards;
04/20/2016,03/31/2016,.,1,0
04/20/2016,03/31/2016,1,1,1
04/20/2016,03/31/2016,1,1,1
04/20/2016,03/31/2016,1,1,0
04/20/2016,03/31/2016,.,0,0
;run;

proc transpose data=have out=tran_have;by Run_Date reporting_month;var _:;

data want(keep=Rule Fail_Total Run_Date reporting_month);
set tran_have;
Rule = substr(_NAME_,2,5);
Fail_Total = sum(of col:);
run;

Astounding · Posted 04-20-2016 10:24 AM

It looks like your data set is sorted, so the BY statement should work:

proc summary data=have;

by run_date reporting month;

var _rule1 _rule2 _rule3;

output out=want (drop=_type_ _freq_) sum=;

run;

Of course, the task of typing out the VAR statement is not so easy in real life.

View solution in original post

Astounding · Posted 04-20-2016 09:51 AM

Surely there will be another way to program this, without transposing the data. It looks like you need to get some totals or subtotals for each run_date/reporting_month combination. SAS has lots of ways to accomplish this. It's probably best to start with a clean slate and describe the end result you would like, based on the original data.

Steelers_In_DC · Posted 04-20-2016 10:11 AM

I have several datasets that have edits that need to be applied for governmental checks. I'm working on the largest one now which is 1st lien mortgage data. I wrote out 350+ rules like this:

'_1_All Failures'n =1;
'_2_All Failures'n = missing(loan_closing_date);
'_2.1_All Failures'n = loan_closing_date > reporting_month;
if not missing(loan_closing_date) then do;
'_2.2_All Failures'n = loan_closing_date < Forty_years_ago;end;

I chose the name of the rules to coincide with a name provided by the government so I can join and get additional information downstream. This produces a dataset with 558 variables by 987,144 rows.

Astounding · Posted 04-20-2016 10:19 AM

OK, so what is your intent from that point? It looks like you are getting the sum (for each reporting month/date) of each of your rule variables. Is it more complex than that?

Steelers_In_DC · Posted 04-20-2016 10:22 AM

Nope, that's it. Sounds remedial when I say it but I'm not sure what to do.

Steelers_In_DC · Posted 04-20-2016 10:23 AM

Here's what I get when I limit the population in the first step. This is what I want:

run_date	reporting_month	Rule	Fail_Total
04/20/2016	01/31/2016	1_All Failures
04/20/2016	01/31/2016	2_All Failures	0
04/20/2016	01/31/2016	2.1_All Failures	0
04/20/2016	01/31/2016	2.2_All Failures	0
04/20/2016	01/31/2016	2.4_All Failures	0
04/20/2016	01/31/2016	3_All Failures	11
04/20/2016	01/31/2016	3.1_All Failures	0
04/20/2016	01/31/2016	3.2_All Failures	0
04/20/2016	01/31/2016	3.3_All Failures	11
04/20/2016	01/31/2016	3.5_All Failures	0
04/20/2016	01/31/2016	4.1_All Failures
04/20/2016	01/31/2016	5.1_All Failures
04/20/2016	01/31/2016	5.2_All Failures	0
04/20/2016	01/31/2016	6.2_All Failures	145
04/20/2016	01/31/2016	6.3_All Failures	0
04/20/2016	01/31/2016	7.2_All Failures	0
04/20/2016	01/31/2016	8.1_All Failures	23
04/20/2016	01/31/2016	8.2_All Failures
04/20/2016	01/31/2016	8.4_All Failures	1595
04/20/2016	01/31/2016	9_All Failures	0
04/20/2016	01/31/2016	9.1_All Failures
04/20/2016	01/31/2016	10.2_All Failures	0
04/20/2016	01/31/2016	11_All Failures	98919
04/20/2016	01/31/2016	11.1_All Failures	0
04/20/2016	01/31/2016	11.3_All Failures	51
04/20/2016	01/31/2016	12_All Failures	98765

Astounding · Posted 04-20-2016 10:24 AM

It looks like your data set is sorted, so the BY statement should work:

proc summary data=have;

by run_date reporting month;

var _rule1 _rule2 _rule3;

output out=want (drop=_type_ _freq_) sum=;

run;

Of course, the task of typing out the VAR statement is not so easy in real life.

Astounding · Posted 04-20-2016 10:31 AM

We're crossing paths on our posts, but a little additional processing would be needed:

data really_want;

set want;

array sums {350} /* long list of names here */;

length rule_name $ 50;

do _n_=1 to 350;

rule_name = vname(sums{_n_});

total = sums{_n_};

output;

end;

keep reporting_month run_date rule_name total;

run;

Where it is appropriate, you may need to fill in some missing values with zero.

Steelers_In_DC · Posted 04-20-2016 10:44 AM

That is awesome, thank you so much for your help. Much appreciated. I will definitely work to become more familiar with proc summary!!! By the way, no need to list the vars, that's why I started with an underscore. The final product which works perfectly looks like this:

proc summary data=FL_Rules;
by run_date reporting_month;
var _:;
output out=want (drop=_type_ _freq_ ) sum=;
run;

proc transpose data=want out=want_tran (rename=(col1=Fail_Total));by Run_Date reporting_month;var _:;

Not enough memory for proc transpose

Re: Not enough memory for proc transpose

Re: Not enough memory for proc transpose

Re: Not enough memory for proc transpose

Re: Not enough memory for proc transpose

Re: Not enough memory for proc transpose

Re: Not enough memory for proc transpose

Re: Not enough memory for proc transpose

Re: Not enough memory for proc transpose

Re: Not enough memory for proc transpose

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!