Hi Team,
I have data in the following form -
var1 |
A |
A |
C |
D |
C |
D |
A |
A |
B |
C |
D |
E |
E |
A |
B |
E |
I want the above data in the following form -
A | B | C | D | E |
1 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 1 | 0 |
1 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
1 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 1 |
I want to create a binary flag variable for each distinct value of var1.
Accepted Solutions
Here's one way:
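data want;
   set have;
   /* lookup list of the distinct values */
   array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
   /* flag variables; giving them initial values makes them retained */
   array v {5} A B C D E (5*0);
   v[whichc(Var1, of t{*})] = 1;   /* set the flag that matches this row's value */
   output;
   v[whichc(Var1, of t{*})] = 0;   /* reset that flag so it does not carry over to the next row */
   drop var1;
run;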
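A small variant, offered as a hedged sketch rather than a correction: WHICHC returns 0 when Var1 holds a value that is not in the lookup list, and v[0] would then stop the step with an "Array subscript out of range" error. Storing the index once in a helper variable (IDX, a name introduced only for this sketch) guards against that, and it also makes clear that the second lookup just resets the same flag. Rows with an unlisted value are written with all flags 0. The sample data is read with `@@` so all values fit on one data line.

```sas
data have;
   input var1 $ @@;   /* @@ keeps reading values from the same data line */
   datalines;
A A C D C D A A B C D E E A B E
;
run;

data want;
   set have;
   array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
   array v {5} A B C D E (5*0);      /* initial values make A-E retained */
   idx = whichc(var1, of t{*});      /* position in the list, or 0 if not found */
   if idx > 0 then v[idx] = 1;       /* set the matching flag */
   output;
   if idx > 0 then v[idx] = 0;       /* reset it before the next row */
   drop idx var1;
run;
```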
Thanks a ton. What an elegant solution! Would you mind explaining how it works? I don't understand what the second WHICHC is doing.
I would suggest that you run the code once without the output statement or the second whichc to see what the result looks like.
This comes from an oddity of assigning initial values in the ARRAY statement. If you search the documentation for the ARRAY statement, you will find a statement similar to:
When any (or all) elements are assigned initial values, all elements behave as if they were named on a RETAIN statement.
RETAIN means that the values are kept from one iteration of the DATA step to the next. Without the OUTPUT statement to write the desired values at the right moment, and the second WHICHC to reset the flag to 0, the flags would accumulate and the program as written would eventually yield all 1's. The trick is probably not needed just to save clock cycles: an explicit loop over every element of the flag array could avoid the confusion, but the approach as coded executes fewer comparisons than the loop would require.
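The explicit-loop alternative mentioned in this reply might look like the sketch below; it is an assumption about such a loop, not code posted in the thread. Because the flag array gets no initial values, A-E are not retained, and every flag is recomputed on each row from a comparison that evaluates to 1 (true) or 0 (false), so no OUTPUT statement or reset is needed. The loop index I is introduced only for this sketch.

```sas
data have;
   input var1 $ @@;   /* @@ keeps reading values from the same data line */
   datalines;
A A C D C D A A B C D E E A B E
;
run;

data want;
   set have;
   /* lookup list of the distinct values, as in the accepted solution */
   array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
   /* no initial values, so A-E are not retained */
   array v {5} A B C D E;
   do i = 1 to dim(v);
      v[i] = (var1 = t[i]);   /* 1 when this row's value matches, else 0 */
   end;
   drop i var1;
run;
```

The trade-off is the one described above: five comparisons per row instead of two lookups, in exchange for code with no retained state to reason about.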
And what is the logic for starting a new row? In your test data you have A, A, but these appear on separate rows. Does this mean that if the next observation is the same as or earlier alphabetically than the previous one, it goes to a new row?
So this code assigns an ID that increases by 1 whenever the letter is lower than or equal to the previous one, which gives PROC TRANSPOSE a BY variable for moving the data up into rows.
data have;
   input var1 $;
   datalines;
A
A
C
D
C
D
A
A
;
run;

data inter;
   set have;
   retain id;
   if _n_=1 then id=1;
   if var1 <= lag(var1) then id=id+1;
   pres=1;
run;

proc transpose data=inter out=want;
   by id;
   var pres;
   id var1;
   idlabel var1;
run;
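PROC TRANSPOSE leaves a cell missing when a letter does not occur within an ID group, whereas the target layout in the question uses 0. If zeros are wanted, a follow-up DATA step can fill them in. The sketch below reruns this reply's code on the same sample values (read with `@@` to keep it short) and then replaces missing flags with 0; the array name FLAGS and the loop index I are introduced here for illustration only.

```sas
data have;
   input var1 $ @@;   /* @@ keeps reading values from the same data line */
   datalines;
A A C D C D A A
;
run;

data inter;
   set have;
   retain id;
   if _n_ = 1 then id = 1;
   if var1 <= lag(var1) then id = id + 1;   /* new row when the letter does not increase */
   pres = 1;
run;

proc transpose data=inter out=want;
   by id;
   var pres;
   id var1;
   idlabel var1;
run;

/* replace the missing cells left by PROC TRANSPOSE with 0 */
data want;
   set want;
   array flags {*} _numeric_;   /* numeric variables defined so far: id and the letter columns */
   do i = 1 to dim(flags);
      if missing(flags[i]) then flags[i] = 0;
   end;
   drop i;
run;
```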
To be honest, though, I don't see any real benefit in the second type of output; you're just creating a lot more cells with nothing in them.
This is called a design matrix. Here are two simple ways:
1)
data have;
   input var1 $;
   cards;
A
A
C
D
C
D
A
A
B
C
D
E
E
A
B
E
;
run;

proc iml;
   use have;
   read all var {var1};
   close;
   vname=unique(var1);
   want=design(var1);
   print want[c=vname];
quit;

2)
data have;
   set have;
   retain y 1;
run;

proc logistic data=have outdesign=want(keep=var:) outdesignonly noprint;
   class var1/param=glm;
   model y=var1;
run;