Binary Flags

Regular Contributor
Posts: 181

Hi Team,

I have data in the following form:

var1
A
A
C
D
C
D
A
A
B
C
D
E
E
A
B
E

I want the above data in the following form:

A B C D E
1 0 0 0 0
1 0 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 1 0 0
0 0 0 1 0
1 0 0 0 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
0 0 0 0 1
1 0 0 0 0
0 1 0 0 0
0 0 0 0 1

I want to create a binary (0/1) flag variable for each distinct value of a variable.


12-15-2016 01:13 PM
Super User
Posts: 10,543

Re: Binary Flags

Here's one way:

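data want;
   set have;
   /* the values to look for (temporary, so not written to the output) */
   array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
   /* one flag per value; the initial values (5*0) also make them retained */
   array v {5} A B C D E (5*0);
   v[whichc(Var1,of t{*})] = 1;  /* set the flag that matches this row's value */
   output;                       /* write the row while that flag is 1 */
   v[whichc(Var1,of t{*})] = 0;  /* reset it so it does not carry into the next row */

   drop var1;
run;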
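The step above hard-codes the five letters. If the distinct values are not known in advance, a sketch along these lines could build the flag list from the data instead. It assumes every value of VAR1 is a valid SAS variable name (true for A to E); the macro variable VALS and the loop index I are illustrative names, not part of the original solution. The sample data is rebuilt first so the sketch runs on its own.

```sas
/* Sample data from the question: 16 values read with @@ */
data have;
   input var1 $ @@;
   datalines;
A A C D C D A A B C D E E A B E
;

/* Collect the sorted distinct values of VAR1 into macro variable VALS. */
/* Assumes each value is a valid SAS variable name.                     */
proc sql noprint;
   select distinct var1
      into :vals separated by ' '
      from have
      order by var1;
quit;

data want;
   set have;
   array v {*} &vals;               /* one numeric flag per distinct value */
   do i = 1 to dim(v);
      v[i] = (var1 = vname(v[i]));  /* 1 if this row's value is the flag's name, else 0 */
   end;
   drop i var1;
run;
```

Values that are not valid variable names (for example, ones containing spaces or starting with a digit) would need to be mapped to names first, so treat this as a starting point.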
Regular Contributor
Posts: 181

Re: Binary Flags

Thanks a ton. What an elegant solution! Would you mind explaining how it works? I didn't get what the second WHICHC is doing.

Super User
Posts: 10,543

Re: Binary Flags

I would suggest that you run the code once without the OUTPUT statement or the second WHICHC to see what the result looks like.

This is a side effect of giving the array initial values. If you search the documentation for the ARRAY statement, you may find a statement similar to:

When any (or all) elements are assigned initial values, all elements behave as if they were named on a RETAIN statement.

RETAIN means that the values are kept from one iteration of the DATA step to the next. Without the OUTPUT statement to write the desired values at the right moment, and the second WHICHC to reset the flag to 0, the flags would accumulate and the program as written would eventually yield all 1's. Saving clock cycles is probably not a real concern here: an explicit loop over every element of the flag array could avoid the confusion, but the approach as coded executes fewer comparisons than such a loop would require.
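An explicit loop of the kind described above could look roughly like this. It is a sketch, not the poster's code: because the flag variables get no initial values they are not retained, and every flag is recomputed on every row, so no reset is needed. The loop index I is an illustrative name. The sample data is rebuilt first so the sketch runs on its own.

```sas
/* Sample data from the question: 16 values read with @@ */
data have;
   input var1 $ @@;
   datalines;
A A C D C D A A B C D E E A B E
;

data want;
   set have;
   array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E'); /* candidate values */
   array v {5} A B C D E;    /* no initial values, so these are NOT retained */
   do i = 1 to dim(v);
      v[i] = (var1 = t[i]);  /* a comparison evaluates to 1 (true) or 0 (false) */
   end;
   drop i var1;
run;
```

A side benefit: a value outside A to E simply produces a row of zeros, whereas in the WHICHC version WHICHC would return 0 and the subscript v[0] would be out of range.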

Super User
Posts: 7,430

Re: Binary Flags

And what is the logic for starting a new row? In your test data you have A, A, but these appear on separate rows. Does this mean that if the next observation is the same as, or earlier alphabetically than, the previous one, it goes to a new row?

So, this code assigns an ID that increases whenever the letter is lower than or equal to the previous one, which gives us a BY variable to transpose the data on.

data have;
  input var1 $;
datalines;
A
A
C
D
C
D
A
A
;
run;

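data inter;
  set have;
  retain id;
  if _n_=1 then id=1;
  /* start a new row whenever the letter is not later than the previous one */
  if var1 <= lag(var1) then id=id+1;
  pres=1;   /* value to transpose: 1 = letter present */
run;

proc transpose data=inter out=want;
  by id;         /* one output row per ID */
  var pres;
  id var1;       /* each letter becomes a column */
  idlabel var1;
run;

To be honest though, I don't see any real benefit in the second type of output; you're just creating a lot more cells with nothing in them.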

Super User
Posts: 9,691

Re: Binary Flags

It is called a design matrix. Here are two simple ways.

1)
data have;
input var1 $;
cards;
A
A
C
D
C
D
A
A
B
C
D
E
E
A
B
E
;
run;
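proc iml;
use have;
read all var {var1};   /* VAR1 becomes a character column vector */
close;
vname=unique(var1);    /* sorted distinct values, used as column headings */
want=design(var1);     /* one 0/1 column per distinct value */
print want[c=vname];
quit;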

2)
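data have;
 set have;
 retain y 1;   /* constant dummy response: MODEL needs one, but OUTDESIGNONLY skips the fit */
run;
proc logistic data=have outdesign=want(keep=var:) outdesignonly noprint;
 class var1/param=glm;   /* GLM coding: one 0/1 column per level of VAR1 */
 model y=var1;
run;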

☑ This topic is solved.
