Solved: Re: Data Transpose with proc summary and column reaarangement

Balas · Posted 02-20-2016 02:35 PM

Hi, I'm transposing the data with Proc summary and the code is as below -

data have;
input PT DT RULE_ID RULE_CAT;
cards;
1 10 20 20
1 10 21 20
1 10 22 20
1 10 23 20
1 11 20 20
1 11 21 20
1 11 22 20
1 11 23 20
1 12 20 20
1 12 21 20
1 12 22 20
1 12 23 20
2 10 20 20
2 10 21 20
2 10 22 20
2 10 23 20
2 11 20 20
2 11 21 20
2 11 22 20
2 11 23 20
2 12 20 20
2 12 21 20
2 12 22 20
2 12 23 20
;;;;
run;

proc sql noprint feedback;
select max(ptcount) into :dim
from (select count(PT) as ptcount from have group by PT, DT);
quit;
run;

OPTIONS SYMBOLGEN;
proc summary data=have nway;
class PT DT;
output out=need idgroup(out[&dim](RULE_ID: RULE_CAT: )=);
run;

With the above code my final dataset need looks like below.

So the question now is - columns need to be rearranged like, RULE_ID_1, RULE_CAT_1, RULE_ID_2, RULE_CAT_2,RULE_ID_3, RULE_CAT_3,RULE_ID_4, RULE_CAT_4.

Is this possible? Because the number of columns getting created is random and based on the value of the macro variable dim getting created from Proc Sql.

Please share your views and suggestions.

PGStats · Posted 02-21-2016 10:35 PM

For a huge dataset, it is indeed preferable to use another strategy. Here is what I would try:

data _null_;
retain count;
length varlist $2000;
do i = 1 by 1 until(last.dt);
    set have end=done; by pt dt notsorted;
    end;
count = max(count, i);
if done then do;
    do i = 1 to count;
        varList = catx(" ", varList, cats("RULE_ID_", i), cats("RULE_CAT_", i));
        end;
    call symputx("varList", varlist);
    end;
run;

%put &varList;

data want;
do i = 1 by 2 until(last.dt);
    set have end=done; by pt dt notsorted;
    array v &varList;
    v{i} = RULE_ID; v{i+1} = RULE_CAT;
    end;
drop i RULE_ID RULE_CAT;
run;

PG

View solution in original post

PGStats · Posted 02-20-2016 06:28 PM

Do it with a datastep. No macro variable needed.

proc sort data=have; by pt dt; run;

data long;
set have; by pt dt;
if first.dt then i = 0;
i + 1;
var = cats("RULE_ID_", i);
value = rule_id;
output;
var = cats("RULE_CAT_", i);
value = rule_cat;
output;
keep pt dt var value;
run;

proc transpose data=long out=want(drop=_name_);
by pt dt;
var value;
id var;
run;

proc print data=want noobs; run;

PG

Balas · Posted 02-21-2016 04:43 AM

Thanks PG Stats.

Your solution works absolutely well with the small datasets like the one provided.

But creating the long dataset with multi millions of records will take huge CPU time and memory where a column holds all the variable names needs to be transposed.

Can you please suggest - If I need to apply the same logic on huge datasets, do I need to tweak the logic or do we have any new logic to be built?

PGStats · Posted 02-21-2016 10:35 PM

For a huge dataset, it is indeed preferable to use another strategy. Here is what I would try:

data _null_;
retain count;
length varlist $2000;
do i = 1 by 1 until(last.dt);
    set have end=done; by pt dt notsorted;
    end;
count = max(count, i);
if done then do;
    do i = 1 to count;
        varList = catx(" ", varList, cats("RULE_ID_", i), cats("RULE_CAT_", i));
        end;
    call symputx("varList", varlist);
    end;
run;

%put &varList;

data want;
do i = 1 by 2 until(last.dt);
    set have end=done; by pt dt notsorted;
    array v &varList;
    v{i} = RULE_ID; v{i+1} = RULE_CAT;
    end;
drop i RULE_ID RULE_CAT;
run;

PG

Balas · Posted 02-22-2016 01:10 PM

Perfect. Thanks PG Stats.

Ksharp · Posted 02-22-2016 08:04 PM

PG,

1) What if one of them is CHARACTER variable?

2) Once you get variable list &listVar, why not use RETAIN statment on the OP's original data ?

PGStats · Posted 02-22-2016 10:06 PM

The code is meant to be fast for the problem at hand. It reads the data only twice and it doesn't create any intermediate dataset. Writing the first step in SQL might be a better strategy if it could be run on a server.

PG

Ksharp · Posted 02-21-2016 09:43 PM

Now it is MERGE time. Check MERGE skill me,Matt, Arthur.T proposed, if your table is big ,check CALL EXECUTE().

http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf

data have;
input PT DT RULE_ID RULE_CAT;
cards;
1 10 20 20
1 10 21 20
1 10 22 20
1 10 23 20
1 11 20 20
1 11 21 20
1 11 22 20
1 11 23 20
1 12 20 20
1 12 21 20
1 12 22 20
1 12 23 20
2 10 20 20
2 10 21 20
2 10 22 20
2 10 23 20
2 11 20 20
2 11 21 20
2 11 22 20
2 11 23 20
2 12 20 20
2 12 21 20
2 12 22 20
2 12 23 20
;;;;
run;
data have;
 set have;
 by PT DT;
 if first.DT then n=0;
 n+1;
run;
proc sql;
select distinct catt('have(where=(n=',n,') 
                     rename=(RULE_ID=RULE_ID_',n,' RULE_CAT=RULE_CAT_',n,'))')
        into : list separated by ' '
 from have;
quit;
data want;
 merge &list ;
 by PT DT;
 drop n;
run;

Make it better.

Balas · Posted 02-22-2016 01:12 PM

Thanks Ksharp. Solution Worked perfectly well with the small datasets.

But I faced a problem, while creating a macro variable list, the maximum allowable character length is more or less 6000 only.

If the length is more the value (for List) is getting truncated for huge datasets and the code execution is getting stopped with the Error message.

Could you please suggest how to rectify this unwanted truncation?

Ksharp · Posted 02-23-2016 08:40 PM

Check CALL EXECUTE().

http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf

Registration is open

SAS Training: Just a Click Away