Hi, I'm transposing the data with Proc summary and the code is as below -
data have;
input PT DT RULE_ID RULE_CAT;
cards;
1 10 20 20
1 10 21 20
1 10 22 20
1 10 23 20
1 11 20 20
1 11 21 20
1 11 22 20
1 11 23 20
1 12 20 20
1 12 21 20
1 12 22 20
1 12 23 20
2 10 20 20
2 10 21 20
2 10 22 20
2 10 23 20
2 11 20 20
2 11 21 20
2 11 22 20
2 11 23 20
2 12 20 20
2 12 21 20
2 12 22 20
2 12 23 20
;;;;
run;
proc sql noprint feedback;
select max(ptcount) into :dim
from (select count(PT) as ptcount from have group by PT, DT);
quit;
run;
OPTIONS SYMBOLGEN;
proc summary data=have nway;
class PT DT;
output out=need idgroup(out[&dim](RULE_ID: RULE_CAT: )=);
run;
With the above code my final dataset need looks like below.
So the question now is - columns need to be rearranged like, RULE_ID_1, RULE_CAT_1, RULE_ID_2, RULE_CAT_2,RULE_ID_3, RULE_CAT_3,RULE_ID_4, RULE_CAT_4.
Is this possible? Because the number of columns getting created is random and based on the value of the macro variable dim getting created from Proc Sql.
Please share your views and suggestions.
For a huge dataset, it is indeed preferable to use another strategy. Here is what I would try:
data _null_;
retain count;
length varlist $2000;
do i = 1 by 1 until(last.dt);
set have end=done; by pt dt notsorted;
end;
count = max(count, i);
if done then do;
do i = 1 to count;
varList = catx(" ", varList, cats("RULE_ID_", i), cats("RULE_CAT_", i));
end;
call symputx("varList", varlist);
end;
run;
%put &varList;
data want;
do i = 1 by 2 until(last.dt);
set have end=done; by pt dt notsorted;
array v &varList;
v{i} = RULE_ID; v{i+1} = RULE_CAT;
end;
drop i RULE_ID RULE_CAT;
run;
Do it with a datastep. No macro variable needed.
proc sort data=have; by pt dt; run;
data long;
set have; by pt dt;
if first.dt then i = 0;
i + 1;
var = cats("RULE_ID_", i);
value = rule_id;
output;
var = cats("RULE_CAT_", i);
value = rule_cat;
output;
keep pt dt var value;
run;
proc transpose data=long out=want(drop=_name_);
by pt dt;
var value;
id var;
run;
proc print data=want noobs; run;
Thanks PG Stats.
Your solution works absolutely well with the small datasets like the one provided.
But creating the long dataset with multi millions of records will take huge CPU time and memory where a column holds all the variable names needs to be transposed.
Can you please suggest - If I need to apply the same logic on huge datasets, do I need to tweak the logic or do we have any new logic to be built?
For a huge dataset, it is indeed preferable to use another strategy. Here is what I would try:
data _null_;
retain count;
length varlist $2000;
do i = 1 by 1 until(last.dt);
set have end=done; by pt dt notsorted;
end;
count = max(count, i);
if done then do;
do i = 1 to count;
varList = catx(" ", varList, cats("RULE_ID_", i), cats("RULE_CAT_", i));
end;
call symputx("varList", varlist);
end;
run;
%put &varList;
data want;
do i = 1 by 2 until(last.dt);
set have end=done; by pt dt notsorted;
array v &varList;
v{i} = RULE_ID; v{i+1} = RULE_CAT;
end;
drop i RULE_ID RULE_CAT;
run;
Perfect. Thanks PG Stats.
PG,
1) What if one of them is CHARACTER variable?
2) Once you get variable list &listVar, why not use RETAIN statment on the OP's original data ?
The code is meant to be fast for the problem at hand. It reads the data only twice and it doesn't create any intermediate dataset. Writing the first step in SQL might be a better strategy if it could be run on a server.
Now it is MERGE time. Check MERGE skill me,Matt, Arthur.T proposed, if your table is big ,check CALL EXECUTE().
http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf
data have;
input PT DT RULE_ID RULE_CAT;
cards;
1 10 20 20
1 10 21 20
1 10 22 20
1 10 23 20
1 11 20 20
1 11 21 20
1 11 22 20
1 11 23 20
1 12 20 20
1 12 21 20
1 12 22 20
1 12 23 20
2 10 20 20
2 10 21 20
2 10 22 20
2 10 23 20
2 11 20 20
2 11 21 20
2 11 22 20
2 11 23 20
2 12 20 20
2 12 21 20
2 12 22 20
2 12 23 20
;;;;
run;
data have;
set have;
by PT DT;
if first.DT then n=0;
n+1;
run;
proc sql;
select distinct catt('have(where=(n=',n,')
rename=(RULE_ID=RULE_ID_',n,' RULE_CAT=RULE_CAT_',n,'))')
into : list separated by ' '
from have;
quit;
data want;
merge &list ;
by PT DT;
drop n;
run;
Make it better.
Thanks Ksharp. Solution Worked perfectly well with the small datasets.
But I faced a problem, while creating a macro variable list, the maximum allowable character length is more or less 6000 only.
If the length is more the value (for List) is getting truncated for huge datasets and the code execution is getting stopped with the Error message.
Could you please suggest how to rectify this unwanted truncation?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.