Hi,
is there an easy, preferably automated way to create N variables with values 0 and 1 from a string variable that contains N different values? Ideally the code should work without any knowledge about N or the different values of the original variable.
Example:
Have:
COLOR |
Blue |
Red |
Blue |
Green |
Green |
Yellow |
Green |
Yellow |
Want:
COLOR_BLUE | COLOR_RED | COLOR_GREEN | COLOR_YELLOW |
1 | 0 | 0 | 0 |
0 | 1 | 0 | 0 |
1 | 0 | 0 | 0 |
0 | 0 | 1 | 0 |
0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 |
0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 |
What is the purpose of this, what further analysis do you want to do with the wide structure?
I think this will work for you.
data color;
id = _n_;
input color :$12.;
cards;
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow
;;;;
run;
proc print;
run;
proc transreg data=color;
model class(color/zero='1');
id id color;
output out=design(drop=intercept) design;
run;
proc print;
run;
PROC GLMSELECT can do that. See this blog post.
You need a numeric variable to include in your MODEL statement. If you don't have one you can just use a SUM statement to make one.
data have;
row+1;
input color $10.;
cards;
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow
;
Now let's make the new dataset.
proc glmselect data=have NOPRINT outdesign(addinputvars)=Want;
class color; /* list the categorical variables here */
model row = color / noint selection=none;
run;
Results
color_ color_ color_ color_ Obs Blue Green Red Yellow row color 1 1 0 0 0 1 Blue 2 0 0 1 0 2 Red 3 1 0 0 0 3 Blue 4 0 1 0 0 4 Green 5 0 1 0 0 5 Green 6 0 0 0 1 6 Yellow 7 0 1 0 0 7 Green 8 0 0 0 1 8 Yellow
Just for fun -- this assumes that:
1) the total length of your input string variable does not exceed SAS limits (~32K-ish characters?).
2) the individual values within this string (e.g., 'green', 'red',...) can themselves be legal variable names and that case is consistent
%macro maxlen(vals);
%global maxlen unqvals;
%let vals=%cmpres(&vals);
%let nvals=%sysfunc(countW(&vals, ' '));
%let maxlen=0;
%let unqvals=;
%do i=1 %to &nvals;
%let val=%scan(&vals,&i,' ');
%let maxlen=%sysfunc(max(&maxlen, %length(&val)));
%if %sysfunc(prxmatch(/\b&val\b/i, &unqvals))=0 %then %let unqvals=&unqvals &val;;
%end;
%mend; *maxlen();
%let vals=
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow
;
%maxlen(&vals);
%put &=maxlen;
%put &=unqvals;
data want;
vals=compbl("&vals");
length val $&maxlen &unqvals 3;
array u {*} &unqvals;
do i=1 to &nvals;
val=scan(vals,i,' ');
do j=1 to dim(u);
u[j]=(val=vname(u[j]));
end;
output;
end;
keep val &unqvals;
run;
proc print data=want; run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.