Solved: Data Transpose and Rearrange the columns

Balas · Posted 02-20-2016 02:16 PM

Hi, I'm transposing the data with Proc summary and the code is as below -

data have;
input PT DT RULE_ID RULE_CAT;
cards;
1 10 20 20
1 10 21 20
1 10 22 20
1 10 23 20
1 11 20 20
1 11 21 20
1 11 22 20
1 11 23 20
1 12 20 20
1 12 21 20
1 12 22 20
1 12 23 20
2 10 20 20
2 10 21 20
2 10 22 20
2 10 23 20
2 11 20 20
2 11 21 20
2 11 22 20
2 11 23 20
2 12 20 20
2 12 21 20
2 12 22 20
2 12 23 20
;;;;
run;

proc sql noprint feedback;
select max(ptcount) into :dim
from (select count(PT) as ptcount from have group by PT, DT);
quit;
run;

OPTIONS SYMBOLGEN;
proc summary data=have nway;
class PT DT;
output out=need idgroup(out[&dim](RULE_ID: RULE_CAT: )=);
run;

With the above code my final dataset need looks like below.

So the question now is - columns need to be rearranged like, RULE_ID_1, RULE_CAT_1, RULE_ID_2, RULE_CAT_2,RULE_ID_3, RULE_CAT_3,RULE_ID_4, RULE_CAT_4.

Is this possible? Because the number of columns getting created is random and based on the value of the macro variable dim getting created from Proc Sql.

Please share your views and suggestions.

Balas · Posted 02-21-2016 04:46 AM

Thanks TomKari.

Your solution works absolutely well with the small datasets like the one provided.

But creating the lnter2 dataset with multi millions of records will take huge CPU time and memory where a column holds all the variable names needs to be transposed.

Can you please suggest - If I need to apply the same logic on huge datasets, do I need to tweak the logic or do we have any new logic to be built?

View solution in original post

Reeza · Posted 02-20-2016 03:18 PM

You may be interested in this macro from this paper that has a few methods built in to do what you're requiring.

http://www.sascommunity.org/wiki/A_Better_Way_to_Flip_(Transpose)_a_SAS_Dataset

TomKari · Posted 02-20-2016 03:41 PM

I think this is pretty close...

proc sort data=Have out=Inter1;
by PT DT RULE_ID;
run;

data Inter2;
Length CN $20;
retain Count;
keep PT DT CN DV;
set Inter1;
by PT DT;
if first.DT
then do;
Count = 0;
end;
Count = Count + 1;
CN = "RULE_ID_"||compress(put(Count, best7.));
DV = RULE_ID;
output;
CN = "RULE_CAT_"||compress(put(Count, best7.));
DV = RULE_CAT;
output;
run;

PROC TRANSPOSE DATA=Inter2 OUT=Want;
BY PT DT;
ID CN;
VAR DV;
RUN;

Balas · Posted 02-21-2016 04:46 AM

Thanks TomKari.

Your solution works absolutely well with the small datasets like the one provided.

But creating the lnter2 dataset with multi millions of records will take huge CPU time and memory where a column holds all the variable names needs to be transposed.

Can you please suggest - If I need to apply the same logic on huge datasets, do I need to tweak the logic or do we have any new logic to be built?

TomKari · Posted 02-21-2016 10:12 AM

I've found that trying to predict performance is a mugs game.

I'd suggest giving it a try. I usually start with say 100,000 records, if I'm happy up it to a million, and just keep going up by factors of 10.

Tom

LinusH · Posted 02-21-2016 04:30 AM

Like many other times I would like to question the need for transposing to wide format. What kind of analysis /processing will use the result?
Like you said, a problem is the varying no of columns - and that will probably be true for the programs/queries to follow.

Data never sleeps

Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Re: Data Transpose and Rearrange the columns

Registration is open