BookmarkSubscribeRSS Feed
Lanox
Calcite | Level 5

Hi,

 

is there an easy, preferably automated way to create N variables with values 0 and 1 from a string variable that contains N different values? Ideally the code should work without any knowledge about N or the different values of the original variable.

 

Example:

Have:

COLOR
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow

 

Want:

COLOR_BLUECOLOR_REDCOLOR_GREENCOLOR_YELLOW
1000
0100
1000
0010
0010
0001
0010
0001
4 REPLIES 4
data_null__
Jade | Level 19

I think this will work for you.

 

data color;
   id = _n_;
   input color :$12.;
   cards;
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow
;;;;
   run;
proc print;
   run;
proc transreg data=color;
   model class(color/zero='1');
   id id color;
   output out=design(drop=intercept) design;
   run;
proc print; 
   run;

Capture.PNG 

Tom
Super User Tom
Super User

PROC GLMSELECT can do that.  See this blog post

 

You need a numeric variable to include in your MODEL statement.  If you don't have one you can just use a SUM statement to make one.

 

data have;
  row+1;
  input color $10.;
cards;
Blue
Red
Blue
Green
Green
Yellow
Green
Yellow
;

Now let's make the new dataset.

 

 

proc glmselect data=have NOPRINT outdesign(addinputvars)=Want;
   class  color;   /* list the categorical variables here */
   model row = color /  noint selection=none;
run;

Results

 

         color_    color_    color_    color_
  Obs     Blue      Green      Red     Yellow    row    color

   1        1         0         0         0       1     Blue
   2        0         0         1         0       2     Red
   3        1         0         0         0       3     Blue
   4        0         1         0         0       4     Green
   5        0         1         0         0       5     Green
   6        0         0         0         1       6     Yellow
   7        0         1         0         0       7     Green
   8        0         0         0         1       8     Yellow
quickbluefish
Barite | Level 11

Just for fun -- this assumes that:

1) the total length of your input string variable does not exceed SAS limits (~32K-ish characters?).

2) the individual values within this string (e.g., 'green', 'red',...) can themselves be legal variable names and that case is consistent

 

%macro maxlen(vals);
	
	%global maxlen unqvals;
	
	%let vals=%cmpres(&vals);
	%let nvals=%sysfunc(countW(&vals, ' '));
	%let maxlen=0;
	%let unqvals=;
	%do i=1 %to &nvals;
		%let val=%scan(&vals,&i,' ');
		%let maxlen=%sysfunc(max(&maxlen, %length(&val)));
		%if %sysfunc(prxmatch(/\b&val\b/i, &unqvals))=0 %then %let unqvals=&unqvals &val;;
	%end;
	
%mend; *maxlen();

%let vals=
	Blue
	Red
	Blue
	Green
	Green
	Yellow
	Green
	Yellow
	;

%maxlen(&vals);

%put &=maxlen;
%put &=unqvals;

data want;
vals=compbl("&vals");
length val $&maxlen &unqvals 3;
array u {*} &unqvals;
do i=1 to &nvals;
	val=scan(vals,i,' ');
	do j=1 to dim(u);
		u[j]=(val=vname(u[j]));
	end;
	output;
end;
keep val &unqvals;
run;

proc print data=want; run;

quickbluefish_0-1760547955844.png

 

 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 373 views
  • 5 likes
  • 5 in conversation