SAS Programming

Binary Flags

Ujjawal · Posted 12-15-2016 12:05 PM

Hi Team,

I have data in the following form -

var1
A
A
C
D
C
D
A
A
B
C
D
E
E
A
B
E

I want the above data in the following form -

A	B	C	D	E
1	0	0	0	0
1	0	0	0	0
0	0	1	0	0
0	0	0	1	0
0	0	1	0	0
0	0	0	1	0
1	0	0	0	0
1	0	0	0	0
0	1	0	0	0
0	0	1	0	0
0	0	0	1	0
0	0	0	0	1
0	0	0	0	1
1	0	0	0	0
0	1	0	0	0
0	0	0	0	1

I want to create a binary flag for each of the distinct values of a variable and create variables accordingly.

ballardw · Posted 12-15-2016 12:20 PM

Here's one way:

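data want;
   set have;
   /* lookup list of the expected values of var1 */
   array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
   /* flag variables; the initial values make them behave as if retained */
   array v {5} A B C D E (5*0);
   /* set the flag whose position matches var1 */
   v[whichc(Var1,of t{*})] = 1;
   output;
   /* reset that flag so it does not carry into the next observation */
   v[whichc(Var1,of t{*})] = 0;

   drop var1;
run;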
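
One caveat worth noting: WHICHC returns 0 when var1 matches none of the listed values, and a subscript of 0 stops the step with an "Array subscript out of range" error. A minimal sketch of a guarded variant, assuming unexpected values should simply get all-zero flags (idx is a helper variable introduced here for illustration):

```sas
data want;
   set have;
   array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
   array v {5} A B C D E (5*0);
   /* position of var1 in the lookup list, 0 if it is not listed */
   idx = whichc(var1, of t{*});
   if idx > 0 then v[idx] = 1;
   output;
   /* undo the flag so the retained value does not leak into the next row */
   if idx > 0 then v[idx] = 0;
   drop var1 idx;
run;
```

Storing the position once also avoids calling WHICHC twice per observation.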

Ujjawal · Posted 12-15-2016 01:15 PM

Thanks a ton. What an elegant solution! Would you mind explaining how it works? I didn't get what second WHICHC is doing.

ballardw · Posted 12-15-2016 03:45 PM

I would suggest that you run the code once without the output statement or the second whichc to see what the result looks like.

This is an oddity of assigning values in the array initialization process. If you search the documentation on the Array statement you may find a statement similar to:

When any (or all) elements are assigned initial values, all elements behave as if they were named on a RETAIN statement.

Retain means that the values are kept from one iteration of the data step to the next. Without the OUTPUT statement to write the record while the flags hold the desired values, and without the second WHICHC assignment to reset the flag afterwards, the program would eventually yield all 1's for the flags. An explicit loop over every element of the flag array could avoid the confusion, but the approach as coded executes fewer comparisons than the loop would require. That said, saving clock cycles is probably not needed here.
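
The explicit loop mentioned above might look like the sketch below. It assumes the same five hard-coded values; the loop counter i is introduced here for illustration. Because the flag array has no initial values, nothing is retained, and every flag is assigned on every observation, so neither the OUTPUT statement nor the reset is needed.

```sas
data want;
   set have;
   array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
   /* no initial values here, so the flags are not retained */
   array v {5} A B C D E;
   do i = 1 to dim(v);
      /* the comparison evaluates to 1 when true and 0 when false */
      v[i] = (var1 = t[i]);
   end;
   drop i var1;
run;
```

A value of var1 that is not in the list simply produces a row of zeros with this version.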

RW9 · Posted 12-15-2016 12:22 PM

And what is the logic for starting a new row? In your test data you have A, A, but these appear on separate rows. Does this mean if the next observation is same or earlier alphabetically then it goes to a new row?

The code below assigns an id that increments whenever the letter is lower than or equal to the previous one, which gives us a BY variable for transposing the data.

data have;
  input var1 $;
datalines;
A
A
C
D
C
D
A
A
;
run;

data inter;
  set have;
  retain id;
  if _n_=1 then id=1;
  if var1 <= lag(var1) then id=id+1;
  pres=1;
run;

proc transpose data=inter out=want;
  by id;
  var pres;
  id var1;
  idlabel var1;
run;

To be honest though, I don't see any real benefit in the second type of output; you're just creating a lot more cells that hold nothing.

Ksharp · Posted 12-15-2016 11:15 PM

This is called a design matrix. Here are two simple ways.

1)
data have;
input var1 $;
cards;
A
A
C
D
C
D
A
A
B
C
D
E
E
A
B
E
;
run;
proc iml;
use have;
read all var {var1};
close;
vname=unique(var1);   /* sorted distinct values, used as column names */
want=design(var1);    /* one 0/1 column per distinct value */
print want[c=vname];
quit;

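2)
/* add a constant dummy response so that a MODEL statement can be written */
data have;
 set have;
 retain y 1;
run;
/* OUTDESIGNONLY writes the design matrix without fitting the model */
proc logistic data=have outdesign=want(keep=var:) outdesignonly noprint;
 class var1/param=glm;
 model y=var1;
run;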
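
When the distinct values are not known in advance, the list of flag variables can be built from the data instead of being typed in. A rough sketch for the DATA step route, assuming every value of var1 is a valid SAS variable name and does not collide with var1 or the loop counter i; the macro variable name flagnames is made up for this example:

```sas
/* collect the distinct values of var1 into a space-separated list */
proc sql noprint;
   select distinct var1 into :flagnames separated by ' '
      from have;
quit;

data want;
   set have;
   /* one numeric flag variable per distinct value */
   array v {*} &flagnames;
   do i = 1 to dim(v);
      /* flag is 1 when var1 equals the name of the flag variable */
      v[i] = (upcase(var1) = upcase(vname(v[i])));
   end;
   drop i var1;
run;
```

Values that are not valid variable names (embedded blanks, leading digits and so on) would need to be mapped to legal names first.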
