Hi.
i have the following long formatted data (2 variables: ID and var (with more than 32 bytes))
id var
1 my name is mike
1 8 z street
1 unit 5
2 my name is z
2 78 8th street
2 city oktown - old town of the fairf county
etc.
i want to create a dummy variable for each unique value of variable “var” (name of the dummy variables should be col1, col2 etc. and labels should come from the values of var). Can anyone help me?
thank you.
Hi Zula,
Assuming there is no duplicate observation exists for VAR variable.
data test;
infile datalines dsd dlm='';
input id var $50.;
datalines;
1 my name is mike
1 8 z street
1 unit 5
2 my name is z
2 78 8th street
2 city oktown - old town of the fairf county
;
run;
proc transpose data=test out=want (drop=_name_);
id var;
idlabel var;
var id;
by id notsorted;
run;
Please show what you expect as output based on that input and what you've tried so far. Hint: PROC TRANSPOSE should be your starting point and look at ID and IDLABEL.
@Zula wrote:
Hi.
i have the following long formatted data (2 variables: ID and var (with more than 32 bytes))
id var
1 my name is mike
1 8 z street
1 unit 5
2 my name is z
2 78 8th street
2 city oktown - old town of the fairf county
etc.
i want to create a dummy variable for each unique value of variable “var” (name of the dummy variables should be col1, col2 etc. and labels should come from the values of var). Can anyone help me?
thank you.
The output should be as below (wide formatted):
ID col1 (labeled as my name is mike) col2 (labeled as 8 z street) col3 (labeled unit 5) col4 (labeled my name is z)etc.
1 1 1 1 0
2 0 0 0 1
thank you.
try doing a proc freq then a proc transpose and you should get what you are asking for.
Wow what a possibility for the n columns you will create.
Hi Zula,
Assuming there is no duplicate observation exists for VAR variable.
data test;
infile datalines dsd dlm='';
input id var $50.;
datalines;
1 my name is mike
1 8 z street
1 unit 5
2 my name is z
2 78 8th street
2 city oktown - old town of the fairf county
;
run;
proc transpose data=test out=want (drop=_name_);
id var;
idlabel var;
var id;
by id notsorted;
run;
@singhsahab This is a bit closer to what @Zula wants.
data test;
input id var $50.;
datalines;
1 my name is mike
1 8 z street
1 unit 5
2 my name is z
2 78 8th street
2 city oktown - old town of the fairf county
;;;;
run;
proc print;
run;
proc glmmod noprint data=test outdesign=want0 order=data;
class var;
model id = var / noint;
run;
proc summary data=want0 nway;
class id;
output out=want(drop=_type_ _freq_) sum(col:)=;
run;
proc contents varnum data=want;
proc print data=want;
run;
@data_null__ Good morning Guru, Thank you. However, while a somebody like me who is not even close to your level of expertise, could I bother you for some comments as to what it does. Sure, I am dig in with docs and follow the logic/results, I will eventually get there. Just asking plz
@novinosrin wrote:
@data_null__ Good morning Guru, Thank you. However, while a somebody like me who is not even close to your level of expertise, could I bother you for some comments as to what it does. Sure, I am dig in with docs and follow the logic/results, I will eventually get there. Just asking plz
@novinosrin this example is bit more general and make the variable labels with out the leading VAR. Just the value of VAR.
PROC GLMMOD is one of the ways that we can use to create dummy variables. In this case for each value of VAR a new dummy variable is created. Much easier than data step.
I've added the variable _OBS_ to the data set so we can merge the dummies back to the original data and serve as the Y variable in the model.
I use the OUTPARM data from GLMMOD to create a new data set call LABELS that will supply new labels for the COL variable with out variable name VAR as the beginning.
879 data test; 880 _obs_ + 1; /*this will be Y left side analysis variable*/ 881 input id:$1. var $50.; 882 datalines; NOTE: The data set WORK.TEST has 9 observations and 3 variables. 892 ;;;; 893 run; 894 proc print; 895 run; Obs _obs_ id var 1 1 1 my name is mike 2 2 1 8 z street 3 3 1 unit 5 4 4 2 my name is z 5 5 2 78 8th street 6 6 2 city oktown - old town of the fairf county 7 7 3 8 z street 8 8 3 unit 5 9 9 3 my name is z 896 /*Create dummy variables for each level of the variable VAR */ 897 /*The default name of the new variables is COLn*/ 898 proc glmmod noprint data=test outdesign=want0 order=data outparm=parm; 899 class var; 900 model _obs_ = var / noint; /*Don't need the intercept*/ 901 run; NOTE: There were 9 observations read from the data set WORK.TEST. NOTE: The data set WORK.WANT0 has 9 observations and 7 variables. NOTE: The data set WORK.PARM has 6 observations and 3 variables. 902 /*The default label for the COL variables is "effect name" value */ 903 /*I don't want the effect name, use the OUTPARM data to create new labels */ 904 proc print data=parm; 905 run; Obs _COLNUM_ EFFNAME var 1 1 var my name is mike 2 2 var 8 z street 3 3 var unit 5 4 4 var my name is z 5 5 var 78 8th street 6 6 var city oktown - old town of the fairf county 906 /*I guess this is a bit of a trick to create numeric variables and label them*/ 907 proc transpose data=parm out=labels(drop=_name_) prefix=COL; 908 id _colnum_; 909 idlabel var; 910 var; 911 run; NOTE: No variables to transpose. NOTE: There were 6 observations read from the data set WORK.PARM. NOTE: The data set WORK.LABELS has 0 observations and 6 variables. 912 proc contents varnum; 913 run; Variables in Creation Order # Variable Type Len Label 1 COL1 Num 8 my name is mike 2 COL2 Num 8 8 z street 3 COL3 Num 8 unit 5 4 COL4 Num 8 my name is z 5 COL5 Num 8 78 8th street 6 COL6 Num 8 city oktown - old town of the fairf county 914 915 /*Merge the dummies with the input data*/ 916 data want; 917 if 0 then set test labels; /*set order of variables and get new labels for the COLn variables*/ 918 merge test want0; 919 by _obs_; 920 run; NOTE: There were 9 observations read from the data set WORK.TEST. NOTE: There were 9 observations read from the data set WORK.WANT0. NOTE: The data set WORK.WANT has 9 observations and 9 variables. 921 proc contents varnum data=want; Variables in Creation Order # Variable Type Len Label 1 _obs_ Num 8 2 id Char 1 3 var Char 50 4 COL1 Num 8 my name is mike 5 COL2 Num 8 8 z street 6 COL3 Num 8 unit 5 7 COL4 Num 8 my name is z 8 COL5 Num 8 78 8th street 9 COL6 Num 8 city oktown - old town of the fairf county 922 proc print data=want; 923 run; Obs _obs_ id var COL1 COL2 COL3 COL4 COL5 COL6 1 1 1 my name is mike 1 0 0 0 0 0 2 2 1 8 z street 0 1 0 0 0 0 3 3 1 unit 5 0 0 1 0 0 0 4 4 2 my name is z 0 0 0 1 0 0 5 5 2 78 8th street 0 0 0 0 1 0 6 6 2 city oktown - old town of the fairf county 0 0 0 0 0 1 7 7 3 8 z street 0 1 0 0 0 0 8 8 3 unit 5 0 0 1 0 0 0 9 9 3 my name is z 0 0 0 1 0 0
Class act and Thank you sooooooooooooo much! Can't appreciate enough!!!!!!!!!!!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.