BookmarkSubscribeRSS Feed
Lanox
Calcite | Level 5

Hi,

 

is there an easy, preferably automated way to create N variables with values 0 and 1 from a string variable that contains N different values? Ideally the code should work without any knowledge about N or the different values of the original variable.

 

Example:

Have:

COLOR
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow

 

Want:

COLOR_BLUECOLOR_REDCOLOR_GREENCOLOR_YELLOW
1000
0100
1000
0010
0010
0001
0010
0001
4 REPLIES 4
data_null__
Jade | Level 19

I think this will work for you.

 

data color;
   id = _n_;
   input color :$12.;
   cards;
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow
;;;;
   run;
proc print;
   run;
proc transreg data=color;
   model class(color/zero='1');
   id id color;
   output out=design(drop=intercept) design;
   run;
proc print; 
   run;

Capture.PNG 

Tom
Super User Tom
Super User

PROC GLMSELECT can do that.  See this blog post

 

You need a numeric variable to include in your MODEL statement.  If you don't have one you can just use a SUM statement to make one.

 

data have;
  row+1;
  input color $10.;
cards;
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow
;

Now let's make the new dataset.

 

 

proc glmselect data=have NOPRINT outdesign(addinputvars)=Want;
   class  color;   /* list the categorical variables here */
   model row = color /  noint selection=none;
run;

Results

 

         color_    color_    color_    color_
  Obs     Blue      Green      Red     Yellow    row    color

   1        1         0         0         0       1     Blue
   2        0         0         1         0       2     Red
   3        1         0         0         0       3     Blue
   4        0         1         0         0       4     Green
   5        0         1         0         0       5     Green
   6        0         0         0         1       6     Yellow
   7        0         1         0         0       7     Green
   8        0         0         0         1       8     Yellow
quickbluefish
Barite | Level 11

Just for fun -- this assumes that:

1) the total length of your input string variable does not exceed SAS limits (~32K-ish characters?).

2) the individual values within this string (e.g., 'green', 'red',...) can themselves be legal variable names and that case is consistent

 

%macro maxlen(vals);
	
	%global maxlen unqvals;
	
	%let vals=%cmpres(&vals);
	%let nvals=%sysfunc(countW(&vals, ' '));
	%let maxlen=0;
	%let unqvals=;
	%do i=1 %to &nvals;
		%let val=%scan(&vals,&i,' ');
		%let maxlen=%sysfunc(max(&maxlen, %length(&val)));
		%if %sysfunc(prxmatch(/\b&val\b/i, &unqvals))=0 %then %let unqvals=&unqvals &val;;
	%end;
	
%mend; *maxlen();

%let vals=
	Blue
	Red
	Blue
	Green
	Green
	Yellow
	Green
	Yellow
	;

%maxlen(&vals);

%put &=maxlen;
%put &=unqvals;

data want;
vals=compbl("&vals");
length val $&maxlen &unqvals 3;
array u {*} &unqvals;
do i=1 to &nvals;
	val=scan(vals,i,' ');
	do j=1 to dim(u);
		u[j]=(val=vname(u[j]));
	end;
	output;
end;
keep val &unqvals;
run;

proc print data=want; run;

quickbluefish_0-1760547955844.png

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 228 views
  • 4 likes
  • 5 in conversation