Ujjawal
Quartz | Level 8

Hi Team,

 

I have data in the following form -

var1
A
A
C
D
C
D
A
A
B
C
D
E
E
A
B
E

 

I want the above data in the following form -

 

A B C D E
1 0 0 0 0
1 0 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 1 0 0
0 0 0 1 0
1 0 0 0 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
0 0 0 0 1
1 0 0 0 0
0 1 0 0 0
0 0 0 0 1

 

I want to create a binary flag for each of the distinct values of a variable and create variables accordingly. 

Accepted Solutions
ballardw
Super User

Here's one way:

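data want;
   set have;
   /* lookup list of the expected values of Var1 */
   array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
   /* flag variables; giving initial values makes them retained across rows */
   array v {5} A B C D E (5*0);
   /* WHICHC returns the position of Var1 in t; set that flag to 1 */
   v[whichc(Var1,of t{*})] = 1;
   output;
   /* reset the same flag so the retained 1 does not carry into the next row */
   v[whichc(Var1,of t{*})] = 0;

   drop var1;
run;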
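
One caveat: WHICHC returns 0 when Var1 matches none of the listed values, and a subscript of 0 makes the array reference fail with an out-of-range error. A minimal sketch of a guarded variant follows, assuming the same HAVE data set; the variable name idx is introduced here only for illustration and is not part of the original solution.

```sas
data want;
   set have;
   array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
   array v {5} A B C D E (5*0);
   /* idx is a hypothetical helper: position of Var1 in t, or 0 if not found */
   idx = whichc(var1, of t{*});
   if idx > 0 then v[idx] = 1;
   output;
   /* undo the flag only when one was actually set */
   if idx > 0 then v[idx] = 0;
   drop idx var1;
run;
```

With this variant, a row whose Var1 is not in the list is written with all five flags equal to 0 instead of stopping the step.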


5 REPLIES

Ujjawal
Quartz | Level 8

Thanks a ton. What an elegant solution! Would you mind explaining how it works? I didn't get what second WHICHC is doing.

ballardw
Super User

I would suggest that you run the code once without the output statement or the second whichc to see what the result looks like.

 

This is an oddity of assigning values in the array initialization process. If you search the documentation on the Array statement you may find a statement similar to:

When any (or all) elements are assigned initial values, all elements behave as if they were named on a RETAIN statement.

 

Retain means that the values are kept from one iteration of the data step to the next. Without the OUTPUT statement to write the desired values at that point, and without the second WHICHC to reset the flag afterwards, the program as written eventually yields all 1's for the flags. The shortcut is probably not needed to save clock cycles. An explicit loop over every element of the flag array could avoid the confusion, but the approach as coded executes fewer "if" comparisons than the loop would require.
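
For comparison, the explicit-loop alternative mentioned above might look like the sketch below. It assumes the same HAVE data set and the same five values; the loop index i is a name chosen here for illustration. Because the array v is declared without initial values, nothing is retained, so no explicit OUTPUT or reset is needed.

```sas
data want;
   set have;
   array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
   /* no initial values here, so A-E are not retained between rows */
   array v {5} A B C D E;
   do i = 1 to dim(v);
      /* a comparison evaluates to 1 when true and 0 when false */
      v[i] = (var1 = t[i]);
   end;
   drop i var1;
run;
```

The trade-off is the one described above: every row does five comparisons, but each row's flags are computed from scratch, which is easier to follow.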

RW9
Diamond | Level 26

And what is the logic for starting a new row?  In your test data you have A, A, but these appear on separate rows.  Does this mean if the next observation is same or earlier alphabetically then it goes to a new row?

So, this code assigns an id based on letter being lower or equal to previous, so we have a logical operator to transpose the data up.

data have;
  input var1 $;
datalines;
A
A
C
D
C
D
A
A
;
run;

data inter;
  set have;
  retain id;
  if _n_=1 then id=1;
  if var1 <= lag(var1) then id=id+1;
  pres=1;
run;

proc transpose data=inter out=want;
  by id;
  var pres;
  id var1;
  idlabel var1;
run;

 

To be honest though, I don't see any real benefit in the second type of output; you're just creating a load more cells with nothing in them.

Ksharp
Super User
This is called a design matrix. Here are two simple ways.

1)
data have;
input var1 $;
cards;
A
A
C
D
C
D
A
A
B
C
D
E
E
A
B
E
;
run;
proc iml;
use have;
read all var {var1};
close;
vname=unique(var1);
want=design(var1);
print want[c=vname];
quit;

2)
data have;
 set have;
 retain y 1;
run;
proc logistic data=have outdesign=want(keep=var:) outdesignonly noprint;
 class var1/param=glm;
 model y=var1;
run;
