Hello
LEt's say I have a data set called Have with Lonf structure.
I want to convert it to wide structure.
I know to convery long to side for one varaible using proc transpose.
My question-
Here I am doing x6 times proc transpose .
IS there a beter way to do it with less code that tranpsose Long to wide for 6 varaibles together?
Data have'
input CustID YYYYMM monIndex X W Z R Q Y ;
cards;
111 202401 1 10 11 13 15 19 1
111 202402 2 17 17 12 10 18 0
111 202403 3 19 18 12 12 11 0
111 202404 4 12 12 12 13 14 1
111 202405 5 16 15 14 18 17 1
111 202406 6 16 17 17 18 15 0
222 202401 1 11 11 11 11 11 0
222 202402 2 12 13 14 11 16 0
222 202403 3 19 19 18 16 19 1
222 202404 4 17 17 16 15 18 1
222 202405 5 16 15 16 16 17 0
222 202406 6 14 16 14 13 17 0
;
Run
proc transpose data=have out=wide_1;
by CustID ;
ID monIndex ;
Var X;
Run;
proc transpose data=have out=wide_2;
by CustID ;
ID monIndex ;
Var W;
Run;
proc transpose data=have out=wide_3;
by CustID ;
ID monIndex ;
Var Z;
Run;
proc transpose data=have out=wide_4;
by CustID ;
ID monIndex ;
Var R;
Run;
proc transpose data=have out=wide_5;
by CustID ;
ID monIndex ;
Var Q;
Run;
proc transpose data=have out=wide_6;
by CustID ;
ID monIndex ;
Var Y;
Run;
Data want;
Merge wide_1-wide_6;
by CustID;
Run;
VAR statement accepts multiple variables, no need to transpose each variable separately.
proc transpose data=have out=want prefix=Month_;
by CustID ;
ID monIndex ;
Var X W Z R Q Y;
Run;
Your WANT dataset is identical to WIDE_6, which presumably is not your intent. Please show what you expect WANT to look like.
you are right.
IT is because I forgot add prefix.
Data have;
input CustID YYYYMM monIndex X W Z R Q Y ;
cards;
111 202401 1 10 11 13 15 19 1
111 202402 2 17 17 12 10 18 0
111 202403 3 19 18 12 12 11 0
111 202404 4 12 12 12 13 14 1
111 202405 5 16 15 14 18 17 1
111 202406 6 16 17 17 18 15 0
222 202401 1 11 11 11 11 11 0
222 202402 2 12 13 14 11 16 0
222 202403 3 19 19 18 16 19 1
222 202404 4 17 17 16 15 18 1
222 202405 5 16 15 16 16 17 0
222 202406 6 14 16 14 13 17 0
;
Run
/**Task: Convert Long to Wide*****/
/***WAY1***/
proc transpose data=have out=wide_1 prefix=X;
by CustID ;
ID monIndex;
Var X;
Run;
proc transpose data=have out=wide_2 prefix=W;
by CustID ;
ID monIndex ;
Var W;
Run;
proc transpose data=have out=wide_3 prefix=Z;
by CustID ;
ID monIndex ;
Var Z;
Run;
proc transpose data=have out=wide_4 prefix=R;
by CustID ;
ID monIndex ;
Var R;
Run;
proc transpose data=have out=wide_5 prefix=Q;
by CustID ;
ID monIndex ;
Var Q;
Run;
proc transpose data=have out=wide_6 prefix=Y;
by CustID ;
ID monIndex ;
Var Y;
Run;
Data want;
Merge wide_1-wide_6;
by CustID;
DROP _name_;
Run;
VAR statement accepts multiple variables, no need to transpose each variable separately.
proc transpose data=have out=want prefix=Month_;
by CustID ;
ID monIndex ;
Var X W Z R Q Y;
Run;
Sorry but this is not the final solution, need one more transpose to get in it full wide structure
Since you have multiple variables your starting point is not truly a TALL dataset.
You should first convert your intermediate dataset into a TALL dataset. Then you can transpose that TALL dataset to a WIDE dataset.
proc transpose data=have out=tall;
by custid yyyymm monIndex;
var x -- y ;
run;
proc transpose data=tall out=want(drop=_name_);
by custid ;
id _name_ monindex;
var col1 ;
run;
Result
If you want the new variables ordered differently add a PROC SORT step in between.
proc sort;
by custid _name_ monindex;
run;
Result
Tom already gave you Double Transpose solution.
But if you have a very BIG table,could try MERGE skill proposed by me.
Data have;
input CustID YYYYMM monIndex X W Z R Q Y ;
cards;
111 202401 1 10 11 13 15 19 1
111 202402 2 17 17 12 10 18 0
111 202403 3 19 18 12 12 11 0
111 202404 4 12 12 12 13 14 1
111 202405 5 16 15 14 18 17 1
111 202406 6 16 17 17 18 15 0
222 202401 1 11 11 11 11 11 0
222 202402 2 12 13 14 11 16 0
222 202403 3 19 19 18 16 19 1
222 202404 4 17 17 16 15 18 1
222 202405 5 16 15 16 16 17 0
222 202406 6 14 16 14 13 17 0
;
proc sql noprint;
select distinct catt('have(where=(monIndex=',monIndex,') rename=(X=X',monIndex,' W=W',monIndex,' Z=Z',monIndex,' R=R',monIndex,' Q=Q',monIndex,' Y=Y',monIndex,'))')
into :merge separated by ' '
from have ;
quit;
data want;
merge &merge.;
by CustId;
drop YYYYMM monIndex;
run;
Very nice solution.
You wrote that if data set is big then this solution is better than using proc transpose?
Can you give a general rule of thumb what is a big data set? (How many rows, how many columns )
For example:
60 million rows and 10 columns is a big data set?
If you don't have gap in monIndex variable and dataset has been sorted by CustID and monIndex, Could try proc summary solution:
Data have;
input CustID YYYYMM monIndex X W Z R Q Y ;
cards;
111 202401 1 10 11 13 15 19 1
111 202402 2 17 17 12 10 18 0
111 202403 3 19 18 12 12 11 0
111 202404 4 12 12 12 13 14 1
111 202405 5 16 15 14 18 17 1
111 202406 6 16 17 17 18 15 0
222 202401 1 11 11 11 11 11 0
222 202402 2 12 13 14 11 16 0
222 202403 3 19 19 18 16 19 1
222 202404 4 17 17 16 15 18 1
222 202405 5 16 15 16 16 17 0
222 202406 6 14 16 14 13 17 0
;
proc sql noprint;
select max(n) into :n
from (select count(*) as n from have group by CustID);
quit;
proc summary data=have nway;
class CustID;
output out=want(drop=_type_ _freq_) idgroup(out[&n] (X W Z R Q Y)=);
run;
Thanks for all,
I summarize the provided solutions
Data have;
input CustID YYYYMM monIndex X W Z R Q Y ;
cards;
111 202401 1 10 11 13 15 19 1
111 202402 2 17 17 12 10 18 0
111 202403 3 19 18 12 12 11 0
111 202404 4 12 12 12 13 14 1
111 202405 5 16 15 14 18 17 1
111 202406 6 16 17 17 18 15 0
222 202401 1 11 11 11 11 11 0
222 202402 2 12 13 14 11 16 0
222 202403 3 19 19 18 16 19 1
222 202404 4 17 17 16 15 18 1
222 202405 5 16 15 16 16 17 0
222 202406 6 14 16 14 13 17 0
;
Run
/**Task: Convert Long to Wide*****/
/***WAY1***/
proc transpose data=have out=wide_1 prefix=X;
by CustID ;
ID monIndex;
Var X;
Run;
proc transpose data=have out=wide_2 prefix=W;
by CustID ;
ID monIndex ;
Var W;
Run;
proc transpose data=have out=wide_3 prefix=Z;
by CustID ;
ID monIndex ;
Var Z;
Run;
proc transpose data=have out=wide_4 prefix=R;
by CustID ;
ID monIndex ;
Var R;
Run;
proc transpose data=have out=wide_5 prefix=Q;
by CustID ;
ID monIndex ;
Var Q;
Run;
proc transpose data=have out=wide_6 prefix=Y;
by CustID ;
ID monIndex ;
Var Y;
Run;
Data want;
Merge wide_1-wide_6;
by CustID;
DROP _name_;
Run;
/***WAY2-Quick way--VAR statement accepts multiple variables, no need to transpose each variable separately***/
proc transpose data=have out=want prefix=X;
by CustID ;
ID monIndex;
Var X W Z R Q Y;
/*Var X--Y;*/
DROP _name_;
Run;
/***WAY3-Double Transpose solution***/
proc transpose data=have out=Longer;
by custid yyyymm monIndex;
var x -- y ;
run;
proc sort data=Longer;
by custid _name_ monindex;
run;
/*If you want the new variables ordered differently add a PROC SORT step in between*/
proc transpose data=Longer out=want(drop=_name_);
by custid ;
id _name_ monindex;
var col1 ;
run;
/***WAY4-IF you have very big data set****/
data want;
Merge
have(where=(monIndex=1) rename=(X=X1 W=W1 Z=Z1 R=R1 Q=Q1 Y=Y1))
have(where=(monIndex=2) rename=(X=X2 W=W2 Z=Z2 R=R2 Q=Q2 Y=Y2))
have(where=(monIndex=3) rename=(X=X3 W=W3 Z=Z3 R=R3 Q=Q3 Y=Y3))
have(where=(monIndex=4) rename=(X=X4 W=W4 Z=Z4 R=R4 Q=Q4 Y=Y4))
have(where=(monIndex=5) rename=(X=X5 W=W5 Z=Z5 R=R5 Q=Q5 Y=Y5))
have(where=(monIndex=6) rename=(X=X6 W=W6 Z=Z6 R=R6 Q=Q6 Y=Y6))
;
by CustId;
drop YYYYMM monIndex;
Run;
/***WAY4-IF you have very big data set--Using macro var tooo****/
proc sql noprint;
select distinct catt('have(where=(monIndex=',monIndex,') rename=(X=X',monIndex,' W=W',monIndex,' Z=Z',monIndex,' R=R',monIndex,' Q=Q',monIndex,' Y=Y',monIndex,'))')
into :merge separated by ' '
from have ;
quit;
%put &merge;
data want;
merge &merge.;
by CustId;
drop YYYYMM monIndex;
run;
/**Way5-If you don't have gap in monIndex variable and dataset has been sorted by CustID and monIndex, Could try proc summary solution***/
proc sql noprint;
select max(n) into :n
From (select count(*) as n from have group by CustID);
quit;
%put &n;
proc summary data=have nway;
class CustID;
output out=want(drop=_type_ _freq_) idgroup(out[&n] (X W Z R Q Y)=);
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.