Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Create New Variables based on Categorical Variable Value

Contributor
Posts: 32

Hello,

I am building a model that will be used to analyze changing datasets.

I need to create an indicator (dummy) variable for each distinct value of a categorical variable.

Example:

[Image: Cat1.JPG, example input data]

And I need it to be:

[Image: Cat2..JPG, desired output]

I absolutely cannot do any type of hardcoding, because the values of my categorical variable will change constantly as new datasets are analyzed. I need quick, reproducible code that repeats the process whenever the categorical variable takes new values.

All help greatly appreciated.

Super User
Posts: 17,826

Re: Create New Variables based on Categorical Variable Value

https://communities.sas.com/t5/SAS-Statistical-Procedures/How-to-create-dummy-variables-Categorical-...

I wrote a post on this recently that may be helpful. I would use a Code node in EM to do this.

Do you really need to do this in EM, though? SAS usually encodes a categorical variable for you, and in some procedures you can specify the parameterization type.

That is, if your procedure supports a CLASS statement, the variable will be encoded automatically, though the default parameterization may not be the method you want.

I'm assuming EM since you've posted in the data mining board. If you're using Base SAS, this still applies.
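For illustration, a minimal sketch of letting the CLASS statement do the encoding instead of building dummies by hand. The data set HAVE_Y and the binary response Y are hypothetical, since the question has no response variable. PROC LOGISTIC uses PARAM=EFFECT when no parameterization is given; PARAM=REF or PARAM=GLM requests 0/1 indicator coding.

```sas
/* Hypothetical data: binary response Y added for illustration only */
data have_y;
   input id var $ y;
   datalines;
1 ONE_1 0
2 TWO_2 1
3 THREE_3 0
4 ONE_1 1
5 TWO_2 0
6 THREE_3 1
;
run;

/* No manual dummies: the CLASS statement encodes VAR.
   PARAM=REF gives reference (0/1) coding; by default the last
   level in sorted order (TWO_2) is the reference level. */
proc logistic data=have_y;
   class var / param=ref;
   model y(event='1') = var;
run;
```

Because the levels are read from the data each time the procedure runs, new category values need no code changes.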

Contributor
Posts: 32

Re: Create New Variables based on Categorical Variable Value

Hi Reeza,

Thank you for the response. I'm still trying to make it work. This isn't part of my modeling process but part of my dataset building/cleaning process, and yes, I do need a dummy variable for each individual value of the categorical variable.

I'm having trouble getting this to work, though.

Respected Advisor
Posts: 3,124

Re: Create New Variables based on Categorical Variable Value


Though I know little to nothing about IML, I still think IML could offer a neat solution for this. Here is my quick-and-dirty way:

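/* Allow column names that are not V7-compliant */
OPTIONS VALIDVARNAME=ANY;

DATA HAVE;
	INPUT ID VAR $;
	CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;

/* Use the values of VAR as new column names; ID and VAR are copied through */
PROC TRANSPOSE DATA=HAVE OUT=HAVE2(DROP=_NAME_);
	ID VAR;
	COPY ID VAR;
RUN;

DATA WANT;
	SET HAVE2;
	/* Name literals ('...'N) handle non-V7 names; -- selects the column range */
	ARRAY V 'ONE_1'N--'THREE_3'N;

	/* 1 when the column name matches this row's VAR value, else 0 */
	DO OVER V;
		V=IFN(VAR=VNAME(V),1,0);
	END;
RUN;

Updated: to cope with variable names that are not V7-compliant.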

Contributor
Posts: 32

Re: Create New Variables based on Categorical Variable Value

Hi Haikuo,

Thanks for the response.

I don't know if this is exactly what I want to do. I need code that will read the values of the categorical variable and then create a new variable named after each value. Since the categorical variable will change from one dataset to the next, its values will change too, so this "hard coding" will fail to create the right variables: a value that exists in one dataset will most definitely be different from the values in the next.

Respected Advisor
Posts: 3,124

Re: Create New Variables based on Categorical Variable Value

The main work in creating these dummy variables is the transpose; making it dynamic is rather trivial. Is this update what you need?

OPTIONS VALIDVARNAME=ANY;

DATA HAVE;
	INPUT ID VAR $;
	CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;

/* Collect the current values of VAR into macro variable &VAR, so no names are hardcoded */
PROC SQL NOPRINT ;
SELECT VAR INTO :VAR SEPARATED BY ' ' FROM HAVE;
QUIT;

PROC TRANSPOSE DATA=HAVE OUT=HAVE2(DROP=_NAME_);
	ID VAR;
	COPY ID VAR;
RUN;

DATA WANT;
	SET HAVE2;
	/* The array lists whatever columns the transpose created */
	ARRAY V &VAR.;

	DO OVER V;
		V=IFN(VAR=VNAME(V),1,0);
	END;
RUN;
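One caveat with the transpose approach: in real data each value usually appears on many rows, and PROC TRANSPOSE (without the LET option) stops with an error when an ID value occurs more than once. A minimal sketch that skips the transpose and lets the ARRAY statement create the columns is below. The macro name MAKE_DUMMIES and its parameters are hypothetical, and it assumes every category value is a valid V7 variable name that does not clash with an existing variable.

```sas
/* Hypothetical macro: one 0/1 column per distinct value of &var.
   Assumes each value is a valid V7 name not already used in &data. */
%macro make_dummies(data=, var=, out=);
   /* Read the distinct values currently present in the data */
   proc sql noprint;
      select distinct &var into :levels separated by ' '
      from &data;
   quit;

   data &out;
      set &data;
      /* ARRAY creates one numeric variable per listed name */
      array _d[*] &levels;
      do _i = 1 to dim(_d);
         /* 1 if this row's value matches the column name, else 0 */
         _d[_i] = (upcase(&var) = upcase(vname(_d[_i])));
      end;
      drop _i;
   run;
%mend make_dummies;

/* Example data with a repeated value */
data have;
   input id var $;
   datalines;
1 ONE_1
2 TWO_2
3 THREE_3
4 ONE_1
;
run;

%make_dummies(data=have, var=var, out=want)

proc print data=want; run;
```

Because the levels are read at run time, rerunning the macro on a new dataset picks up whatever values it contains; rows with a missing value get 0 in every column.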

Contributor
Posts: 32

Re: Create New Variables based on Categorical Variable Value

Haikuo,

You are a genius! I will now be able to sleep tonight.

Thank you so incredibly much! You have no idea how helpful that was!

Super User
Posts: 9,681

Re: Create New Variables based on Categorical Variable Value

What you are trying to get is a design matrix.
Check this blog:

http://blogs.sas.com/content/iml/2016/02/24/create-a-design-matrix-in-sas.html

DATA HAVE;
	INPUT ID VAR $;
	CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;
run;
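/* PROC LOGISTIC needs a response; a constant fake response is enough
   when only the design matrix is wanted */
data Temp ;
   set have;
   FakeY = 0;
run;

/* OUTDESIGNONLY writes the design matrix without fitting the model;
   PARAM=GLM gives one 0/1 column per level of VAR */
proc logistic data=Temp outdesign=EffectDesign(drop=FakeY) outdesignonly;
   class var / param=glm;
   model FakeY = var;
run;

proc print data=EffectDesign; run;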


Super User
Posts: 9,681

Re: Create New Variables based on Categorical Variable Value

It is also easy in IML:

DATA HAVE;
	INPUT ID VAR $;
	CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;
run;
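proc iml;
use have;
read all var{var} ;                 /* read the categorical column into vector VAR */
close;
col=unique(var);                    /* distinct levels, in sorted order */
x=design(var);                      /* one 0/1 column per level, in the same order as COL */
create want from x[r=var c=col];    /* column names come from the data values */
append from x[r=var];
close;
run;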

proc print;run;