Hi Team,
I have data in the following form:
var1 |
A |
A |
C |
D |
C |
D |
A |
A |
B |
C |
D |
E |
E |
A |
B |
E |
I want the above data in the following form:
A | B | C | D | E |
1 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 1 | 0 |
1 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
1 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 1 |
In other words, I want to create one binary flag variable for each distinct value of the variable.
Here's one way:
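data want;
  set have;
  /* lookup list of the distinct values */
  array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
  /* flag variables A-E, starting at 0 */
  array v {5} A B C D E (5*0);
  /* flag this row's value, write the row, then reset the flag */
  v[whichc(Var1,of t{*})] = 1;
  output;
  v[whichc(Var1,of t{*})] = 0;
  drop var1;
run;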
Thanks a ton. What an elegant solution! Would you mind explaining how it works? I didn't get what the second WHICHC is doing.
I would suggest that you run the code once without the output statement or the second whichc to see what the result looks like.
This is a side effect of assigning initial values in the ARRAY statement. If you search the documentation for the ARRAY statement, you may find a statement similar to:
When any (or all) elements are assigned initial values, all elements behave as if they were named on a RETAIN statement.
RETAIN means that the values are kept from one iteration of the DATA step to the next. The OUTPUT statement writes the row while the current flag is 1, and the second WHICHC then resets that flag to 0 so it does not carry over to the next observation. Without that reset, the flags would accumulate and the program as written would eventually yield all 1s; without the explicit OUTPUT, the implicit output at the end of the step would happen after the reset and write all 0s. Saving clock cycles is probably not a real concern here. An explicit loop over every element of the flag array could avoid the confusion, but the approach as coded executes fewer "if" comparisons than the loop would require.
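The explicit-loop alternative mentioned above might look like the following minimal sketch. It assumes the same five letters as the WHICHC version, and the loop counter I is introduced here for illustration. Because every flag is assigned on every row, no RETAIN behavior is involved, and the implicit output at the end of the DATA step is enough.

```sas
/* Sample data from the question */
data have;
  input var1 $ @@;
  datalines;
A A C D C D A A B C D E E A B E
;
run;

data want;
  set have;
  /* Same lookup list as the WHICHC version */
  array t {5} $ _temporary_ ('A' 'B' 'C' 'D' 'E');
  /* No initial values, so A-E are not retained; each is set on every row */
  array v {5} A B C D E;
  do i = 1 to dim(v);
    /* A comparison evaluates to 1 (true) or 0 (false) */
    v[i] = (var1 = t[i]);
  end;
  drop var1 i;
run;
```

The trade-off is the one described in the reply: the loop compares VAR1 against all five letters on every row, while the WHICHC version looks up the matching position once and only touches that flag.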
And what is the logic for starting a new row? In your test data you have A, A, but these appear on separate rows. Does this mean that if the next observation is the same as or earlier alphabetically than the previous one, it goes to a new row?
So, this code increments an ID whenever a letter is lower than or equal to the previous one, which gives PROC TRANSPOSE a BY variable to group the data into rows.
data have;
  input var1 $;
  datalines;
A
A
C
D
C
D
A
A
;
run;

data inter;
  set have;
  retain id;
  if _n_=1 then id=1;
  /* start a new row when the letter is not after the previous one */
  if var1 <= lag(var1) then id=id+1;
  pres=1;
run;

proc transpose data=inter out=want;
  by id;
  var pres;
  id var1;
  idlabel var1;
run;
To be honest though, I don't see any real benefit in the second type of output; you're just creating a lot more cells with nothing in them.
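If the 0/1 layout from the question is wanted rather than missing cells, a follow-up DATA step can fill in the zeros. This is a minimal sketch, not part of the original reply: it repeats the sample data and the ID logic from the code above so that it runs on its own, and the array name FLAGS and the counter I are introduced for illustration. It assumes every numeric variable in the transposed data set other than ID is a flag; ID is never missing, so the loop leaves it unchanged.

```sas
/* Sample data and row-ID logic repeated from the reply above */
data have;
  input var1 $ @@;
  datalines;
A A C D C D A A
;
run;

data inter;
  set have;
  retain id;
  if _n_ = 1 then id = 1;
  if var1 <= lag(var1) then id = id + 1;
  pres = 1;
run;

proc transpose data=inter out=want(drop=_name_);
  by id;
  var pres;
  id var1;
run;

/* Hypothetical follow-up step: replace the missing cells with 0 */
data want;
  set want;
  /* all numeric variables defined so far: ID plus the flag columns */
  array flags {*} _numeric_;
  do i = 1 to dim(flags);
    if missing(flags[i]) then flags[i] = 0;
  end;
  drop i;
run;
```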
It is called a design matrix. Here are two simple ways.
1)
data have;
  input var1 $;
  cards;
A
A
C
D
C
D
A
A
B
C
D
E
E
A
B
E
;
run;

proc iml;
  use have;
  read all var {var1};
  close;
  vname=unique(var1);
  want=design(var1);
  print want[c=vname];
quit;

2)
data have;
  set have;
  retain y 1;
run;

proc logistic data=have outdesign=want(keep=var:) outdesignonly noprint;
  class var1/param=glm;
  model y=var1;
run;