Hi,
is there an easy, preferably automated way to create N variables with values 0 and 1 from a string variable that contains N different values? Ideally the code should work without any knowledge about N or the different values of the original variable.
Example:
Have:
| COLOR |
| Blue |
| Red |
| Blue |
| Green |
| Green |
| Yellow |
| Green |
| Yellow |
Want:
| COLOR_BLUE | COLOR_RED | COLOR_GREEN | COLOR_YELLOW |
| 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 1 |
What is the purpose of this, what further analysis do you want to do with the wide structure?
I think this will work for you.
data color;
id = _n_;
input color :$12.;
cards;
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow
;;;;
run;
proc print;
run;
proc transreg data=color;
model class(color/zero='1');
id id color;
output out=design(drop=intercept) design;
run;
proc print;
run;
PROC GLMSELECT can do that. See this blog post.
You need a numeric variable to include in your MODEL statement. If you don't have one you can just use a SUM statement to make one.
data have;
row+1;
input color $10.;
cards;
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow
;
Now let's make the new dataset.
proc glmselect data=have NOPRINT outdesign(addinputvars)=Want;
class color; /* list the categorical variables here */
model row = color / noint selection=none;
run;
Results
color_ color_ color_ color_ Obs Blue Green Red Yellow row color 1 1 0 0 0 1 Blue 2 0 0 1 0 2 Red 3 1 0 0 0 3 Blue 4 0 1 0 0 4 Green 5 0 1 0 0 5 Green 6 0 0 0 1 6 Yellow 7 0 1 0 0 7 Green 8 0 0 0 1 8 Yellow
Just for fun -- this assumes that:
1) the total length of your input string variable does not exceed SAS limits (~32K-ish characters?).
2) the individual values within this string (e.g., 'green', 'red',...) can themselves be legal variable names and that case is consistent
%macro maxlen(vals);
%global maxlen unqvals;
%let vals=%cmpres(&vals);
%let nvals=%sysfunc(countW(&vals, ' '));
%let maxlen=0;
%let unqvals=;
%do i=1 %to &nvals;
%let val=%scan(&vals,&i,' ');
%let maxlen=%sysfunc(max(&maxlen, %length(&val)));
%if %sysfunc(prxmatch(/\b&val\b/i, &unqvals))=0 %then %let unqvals=&unqvals &val;;
%end;
%mend; *maxlen();
%let vals=
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow
;
%maxlen(&vals);
%put &=maxlen;
%put &=unqvals;
data want;
vals=compbl("&vals");
length val $&maxlen &unqvals 3;
array u {*} &unqvals;
do i=1 to &nvals;
val=scan(vals,i,' ');
do j=1 to dim(u);
u[j]=(val=vname(u[j]));
end;
output;
end;
keep val &unqvals;
run;
proc print data=want; run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.