BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Zula
Obsidian | Level 7

Hi.

i have the following long formatted data (2 variables: ID and var (with more than 32 bytes))

id var

1 my name is mike

1 8 z street

1 unit 5

2 my name is z

2 78 8th street

2 city oktown - old town of the fairf county

etc.

 

i want to create a dummy variable for each unique value of variable “var” (name of the dummy variables should be col1, col2 etc. and labels should come from the values of var). Can anyone help me?

thank you.

 

1 ACCEPTED SOLUTION

Accepted Solutions
singhsahab
Lapis Lazuli | Level 10

Hi Zula,

 

Assuming there is no duplicate observation exists for VAR variable. 

 

data test;
infile datalines dsd dlm='';
input id var $50.;
datalines;
1 my name is mike
1 8 z street
1 unit 5
2 my name is z
2 78 8th street
2 city oktown - old town of the fairf county
;
run;



proc transpose data=test out=want (drop=_name_);
id var;
idlabel var;
var id;
by id notsorted;
run;

View solution in original post

9 REPLIES 9
Reeza
Super User

Please show what you expect as output based on that input and what you've tried so far. Hint: PROC TRANSPOSE should be your starting point and look at ID and IDLABEL. 

 


@Zula wrote:

Hi.

i have the following long formatted data (2 variables: ID and var (with more than 32 bytes))

id var

1 my name is mike

1 8 z street

1 unit 5

2 my name is z

2 78 8th street

2 city oktown - old town of the fairf county

etc.

 

i want to create a dummy variable for each unique value of variable “var” (name of the dummy variables should be col1, col2 etc. and labels should come from the values of var). Can anyone help me?

thank you.

 


 

Zula
Obsidian | Level 7

The output should be as below (wide formatted):

 

ID col1 (labeled as my name is mike) col2 (labeled as 8 z street) col3 (labeled unit 5) col4 (labeled my name is z)etc.

1 1 1 1 0

2 0 0 0 1

 

thank you.

 

 

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

try doing a proc freq then a proc transpose and you should get what you are asking for.

Wow what a possibility for the n columns you will create.

 

Zula
Obsidian | Level 7
Sorry that does not help
singhsahab
Lapis Lazuli | Level 10

Hi Zula,

 

Assuming there is no duplicate observation exists for VAR variable. 

 

data test;
infile datalines dsd dlm='';
input id var $50.;
datalines;
1 my name is mike
1 8 z street
1 unit 5
2 my name is z
2 78 8th street
2 city oktown - old town of the fairf county
;
run;



proc transpose data=test out=want (drop=_name_);
id var;
idlabel var;
var id;
by id notsorted;
run;
data_null__
Jade | Level 19

@singhsahab This is a bit closer to what @Zula wants.

 

data test;
   input id var $50.;
   datalines;
1 my name is mike
1 8 z street
1 unit 5
2 my name is z
2 78 8th street
2 city oktown - old town of the fairf county
;;;;
   run;
proc print;
   run;
proc glmmod noprint data=test outdesign=want0 order=data;
   class var;
   model id = var / noint;
   run;
proc summary data=want0 nway;
   class id;
   output out=want(drop=_type_ _freq_) sum(col:)=;
   run;
proc contents varnum data=want;
proc print data=want;
   run;

 

Capture.PNG

novinosrin
Tourmaline | Level 20

@data_null__   Good morning Guru, Thank you. However, while a somebody like me who is  not even close to your level of expertise, could I bother you for some comments as to what it does. Sure, I am dig in with docs and follow the logic/results, I will eventually get there. Just asking plz

data_null__
Jade | Level 19

@novinosrin wrote:

@data_null__   Good morning Guru, Thank you. However, while a somebody like me who is  not even close to your level of expertise, could I bother you for some comments as to what it does. Sure, I am dig in with docs and follow the logic/results, I will eventually get there. Just asking plz


 

@novinosrin this example is bit more general and make the variable labels with out the leading VAR.  Just the value of VAR.

 

PROC GLMMOD is one of the ways that we can use to create dummy variables.  In this case for each value of VAR a new dummy variable is created.  Much easier than data step.

 

I've added the variable _OBS_ to the data set so we can merge the dummies back to the original data and serve as the Y variable in the model.

 

I use the OUTPARM data from GLMMOD to create a new data set call LABELS that will supply new labels for the COL variable with out variable name VAR as the beginning.

 

879  data test;
880     _obs_ + 1; /*this will be Y left side analysis variable*/
881     input id:$1. var $50.;
882     datalines;

NOTE: The data set WORK.TEST has 9 observations and 3 variables.


892  ;;;;
893     run;
894  proc print;
895     run;

Obs    _obs_    id    var

 1       1      1     my name is mike
 2       2      1     8 z street
 3       3      1     unit 5
 4       4      2     my name is z
 5       5      2     78 8th street
 6       6      2     city oktown - old town of the fairf county
 7       7      3     8 z street
 8       8      3     unit 5
 9       9      3     my name is z


896  /*Create dummy variables for each level of the variable VAR */
897  /*The default name of the new variables is COLn*/
898  proc glmmod noprint data=test outdesign=want0 order=data outparm=parm;
899     class var;
900     model _obs_ = var / noint; /*Don't need the intercept*/
901     run;

NOTE: There were 9 observations read from the data set WORK.TEST.
NOTE: The data set WORK.WANT0 has 9 observations and 7 variables.
NOTE: The data set WORK.PARM has 6 observations and 3 variables.


902  /*The default label for the COL variables is "effect name" value */
903  /*I don't want the effect name, use the OUTPARM data to create new labels */
904  proc print data=parm;
905     run;

Obs    _COLNUM_    EFFNAME    var

 1         1         var      my name is mike
 2         2         var      8 z street
 3         3         var      unit 5
 4         4         var      my name is z
 5         5         var      78 8th street
 6         6         var      city oktown - old town of the fairf county



906  /*I guess this is a bit of a trick to create numeric variables and label them*/
907  proc transpose data=parm out=labels(drop=_name_) prefix=COL;
908     id _colnum_;
909     idlabel var;
910     var;
911     run;

NOTE: No variables to transpose.
NOTE: There were 6 observations read from the data set WORK.PARM.
NOTE: The data set WORK.LABELS has 0 observations and 6 variables.


912  proc contents varnum;
913     run;


                        Variables in Creation Order

#    Variable    Type    Len    Label

1    COL1        Num       8    my name is mike
2    COL2        Num       8    8 z street
3    COL3        Num       8    unit 5
4    COL4        Num       8    my name is z
5    COL5        Num       8    78 8th street
6    COL6        Num       8    city oktown - old town of the fairf county




914
915  /*Merge the dummies with the input data*/
916  data want;
917     if 0 then set test labels; /*set order of variables and get new labels for the COLn variables*/
918     merge test want0;
919     by _obs_;
920     run;

NOTE: There were 9 observations read from the data set WORK.TEST.
NOTE: There were 9 observations read from the data set WORK.WANT0.
NOTE: The data set WORK.WANT has 9 observations and 9 variables.


921  proc contents varnum data=want;

                         Variables in Creation Order

 #    Variable    Type    Len    Label

 1    _obs_       Num       8
 2    id          Char      1
 3    var         Char     50
 4    COL1        Num       8    my name is mike
 5    COL2        Num       8    8 z street
 6    COL3        Num       8    unit 5
 7    COL4        Num       8    my name is z
 8    COL5        Num       8    78 8th street
 9    COL6        Num       8    city oktown - old town of the fairf county




922  proc print data=want;
923     run;

Obs    _obs_    id    var                                           COL1    COL2    COL3    COL4    COL5    COL6

 1       1      1     my name is mike                                 1       0       0       0       0       0
 2       2      1     8 z street                                      0       1       0       0       0       0
 3       3      1     unit 5                                          0       0       1       0       0       0
 4       4      2     my name is z                                    0       0       0       1       0       0
 5       5      2     78 8th street                                   0       0       0       0       1       0
 6       6      2     city oktown - old town of the fairf county      0       0       0       0       0       1
 7       7      3     8 z street                                      0       1       0       0       0       0
 8       8      3     unit 5                                          0       0       1       0       0       0
 9       9      3     my name is z                                    0       0       0       1       0       0





novinosrin
Tourmaline | Level 20

Class act and Thank you sooooooooooooo much! Can't appreciate enough!!!!!!!!!!!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2241 views
  • 10 likes
  • 6 in conversation