Hello everyone,
I have a dataset for an exam but need to reformat them from long to wide with specific variable names and dichotomizing all options. I can recode the answers with if then statements but it gets too long for a long test (like over 150 items). How do I achieve this through array and do loop? Thank you very much in advance!
data example;
input ID $ measure Item1 $ Item2 $ Item3 $;
cards;
1 -1.26 B D A
2 -0.49 A C D
3 1.58 B D A
4 0.69 C D A
5 -0.63 A C B
;
data want;
input ID $ measure I1A I1B I1C I1D I2A I2B I2C I2D I3A I3B I3C I3D;
cards;
1 -1.26 0 1 0 0 0 0 0 1 1 0 0 0
2 -0.49 1 0 0 0 0 0 1 0 0 0 0 1
3 1.58 0 1 0 0 0 0 0 1 1 0 0 0
4 0.69 0 0 1 0 0 0 0 1 1 0 0 0
5 -0.63 1 0 0 0 0 0 1 0 0 1 0 0
;
Variable names basically use I (for item) and 1 as the first item, then the options A, B, C, D. For answer A, it becomes 1 0 0 0, answer B is 0 1 0 0, etc. for a particular item.
I like PROC TRANSPOSE for this as the variables name themselves.
data example;
input ID $ measure (Item1-Item3)($);
cards;
1 -1.26 B D A
2 -0.49 A C D
3 1.58 B D A
4 0.69 C D A
5 -0.63 A C B
;;;;
proc print;
run;
proc transpose data=example out=flip;
by id measure;
var item:;
run;
data flip;
set flip(drop=id measure) flip;
run;
proc summary data=flip nway missing completetypes classdata=flip;
by id measure;
class _name_ col1;
output out=flip2;
run;
*proc print;
run;
proc transpose data=flip2 out=flop(drop=_name_ where=(not missing(ID))) delim=_;
by id measure;
id _name_ col1;
var _freq_;
run;
proc print;
run;
I like PROC TRANSPOSE for this as the variables name themselves.
data example;
input ID $ measure (Item1-Item3)($);
cards;
1 -1.26 B D A
2 -0.49 A C D
3 1.58 B D A
4 0.69 C D A
5 -0.63 A C B
;;;;
proc print;
run;
proc transpose data=example out=flip;
by id measure;
var item:;
run;
data flip;
set flip(drop=id measure) flip;
run;
proc summary data=flip nway missing completetypes classdata=flip;
by id measure;
class _name_ col1;
output out=flip2;
run;
*proc print;
run;
proc transpose data=flip2 out=flop(drop=_name_ where=(not missing(ID))) delim=_;
by id measure;
id _name_ col1;
var _freq_;
run;
proc print;
run;
It is called design matrix in statistical theory.
@Rick_SAS wrote many blogs about it.
https://blogs.sas.com/content/iml/2016/02/24/create-a-design-matrix-in-sas.html
https://blogs.sas.com/content/iml/2017/04/19/restricted-cubic-splines-sas.html
data example;
input ID $ measure Item1 $ Item2 $ Item3 $;
cards;
1 -1.26 B D A
2 -0.49 A C D
3 1.58 B D A
4 0.69 C D A
5 -0.63 A C B
;
proc glmselect data=example noprint outdesign(addinputvars)=want;
class Item: ;
model measure=Item: /selection=none noint;
run;
In addition to the links that KSharp provided, please see
The best way to generate dummy variables in SAS - The DO Loop
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: