Hello everyone,
I have a dataset for an exam but need to reformat them from long to wide with specific variable names and dichotomizing all options. I can recode the answers with if then statements but it gets too long for a long test (like over 150 items). How do I achieve this through array and do loop? Thank you very much in advance!
data example;
input ID $ measure Item1 $ Item2 $ Item3 $;
cards;
1 -1.26 B D A
2 -0.49 A C D
3 1.58 B D A
4 0.69 C D A
5 -0.63 A C B
;
data want;
input ID $ measure I1A I1B I1C I1D I2A I2B I2C I2D I3A I3B I3C I3D;
cards;
1 -1.26 0 1 0 0 0 0 0 1 1 0 0 0
2 -0.49 1 0 0 0 0 0 1 0 0 0 0 1
3 1.58 0 1 0 0 0 0 0 1 1 0 0 0
4 0.69 0 0 1 0 0 0 0 1 1 0 0 0
5 -0.63 1 0 0 0 0 0 1 0 0 1 0 0
;
Variable names basically use I (for item) and 1 as the first item, then the options A, B, C, D. For answer A, it becomes 1 0 0 0, answer B is 0 1 0 0, etc. for a particular item.
I like PROC TRANSPOSE for this as the variables name themselves.
data example;
input ID $ measure (Item1-Item3)($);
cards;
1 -1.26 B D A
2 -0.49 A C D
3 1.58 B D A
4 0.69 C D A
5 -0.63 A C B
;;;;
proc print;
run;
proc transpose data=example out=flip;
by id measure;
var item:;
run;
data flip;
set flip(drop=id measure) flip;
run;
proc summary data=flip nway missing completetypes classdata=flip;
by id measure;
class _name_ col1;
output out=flip2;
run;
*proc print;
run;
proc transpose data=flip2 out=flop(drop=_name_ where=(not missing(ID))) delim=_;
by id measure;
id _name_ col1;
var _freq_;
run;
proc print;
run;
I like PROC TRANSPOSE for this as the variables name themselves.
data example;
input ID $ measure (Item1-Item3)($);
cards;
1 -1.26 B D A
2 -0.49 A C D
3 1.58 B D A
4 0.69 C D A
5 -0.63 A C B
;;;;
proc print;
run;
proc transpose data=example out=flip;
by id measure;
var item:;
run;
data flip;
set flip(drop=id measure) flip;
run;
proc summary data=flip nway missing completetypes classdata=flip;
by id measure;
class _name_ col1;
output out=flip2;
run;
*proc print;
run;
proc transpose data=flip2 out=flop(drop=_name_ where=(not missing(ID))) delim=_;
by id measure;
id _name_ col1;
var _freq_;
run;
proc print;
run;
It is called design matrix in statistical theory.
@Rick_SAS wrote many blogs about it.
https://blogs.sas.com/content/iml/2016/02/24/create-a-design-matrix-in-sas.html
https://blogs.sas.com/content/iml/2017/04/19/restricted-cubic-splines-sas.html
data example;
input ID $ measure Item1 $ Item2 $ Item3 $;
cards;
1 -1.26 B D A
2 -0.49 A C D
3 1.58 B D A
4 0.69 C D A
5 -0.63 A C B
;
proc glmselect data=example noprint outdesign(addinputvars)=want;
class Item: ;
model measure=Item: /selection=none noint;
run;
In addition to the links that KSharp provided, please see
The best way to generate dummy variables in SAS - The DO Loop
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.