SmcGarrett
Obsidian | Level 7

Hello, 

 

I am building a model which will be used to analyze changing datasets. 

I need to make an indicator (dummy) variable for each value that a categorical variable takes.

 

Example

 

Cat1.JPG

 

And I need it to be:

Cat2..JPG

 

I absolutely cannot do any type of hardcoding because the values of my categorical variable will change constantly as new datasets are analyzed. I need quick, reproducible code that can repeat the process when the categorical variable takes new values.

 

All help greatly appreciated. 

Reeza
Super User

https://communities.sas.com/t5/SAS-Statistical-Procedures/How-to-create-dummy-variables-Categorical-...

 

I wrote a post on this recently that may be helpful. I would use a code node in EM to do this.

Do you really need to do this in EM though? If it's a categorical variable, that's usually how SAS treats it, and you can specify the parameterization type in some procedures.

I.e., if your proc allows for a CLASS variable, then it will be dummy coded by default, though it may not be the method you want.

 

I'm assuming EM since you've posted in data mining. If you're using Base SAS this would still be applicable though. 
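
As a rough illustration of the CLASS-statement route described above, the sketch below uses PROC GLMMOD, which writes a design matrix to a data set without fitting a model. The data set HAVE and the variables ID and VAR are made up for the example (they are not from the original question), and ID serves as the response only because the MODEL statement expects a numeric one.

```sas
/* Hypothetical sample data: ID is numeric, VAR is the categorical variable */
data have;
    input id var $;
    datalines;
1 ONE_1
2 TWO_2
3 THREE_3
;

/* OUTDESIGN= receives the 0/1 columns (Col1, Col2, ...);
   OUTPARM= maps each column to the level of VAR it represents */
proc glmmod data=have outdesign=design outparm=parm noprint;
    class var;
    model id = var / noint;   /* NOINT leaves out the intercept column */
run;

proc print data=parm;   run;
proc print data=design; run;
```

The dummy columns come out with generic names, so the OUTPARM= data set is what tells you which level each one stands for.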

SmcGarrett
Obsidian | Level 7

Hi Reeza, 

 

Thank you for the response. I'm still trying to make it work. This isn't part of my modeling process but part of my dataset building/cleaning process, and yes, I do need a dummy variable for each individual occurrence of the categorical variable.

 

I'm having trouble getting this to work though. 

Haikuo
Onyx | Level 15

Though I know next to nothing about IML, I still think IML could offer a neat solution for this. The following is my quick and dirty way:

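/* Allow variable names that are not V7-compliant */
OPTIONS VALIDVARNAME=ANY;

DATA HAVE;
	INPUT ID VAR $;
	CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;

/* ID VAR creates one new column per value of VAR; COPY keeps ID and VAR on every row */
PROC TRANSPOSE DATA=HAVE OUT=HAVE2(DROP=_NAME_);
	ID VAR;
	COPY ID VAR;
RUN;

DATA WANT;
	SET HAVE2;
	/* Name-range list over the new columns, first to last */
	ARRAY V 'ONE_1'N--'THREE_3'N;

	/* Set each dummy to 1 when its name matches the row's value of VAR, else 0 */
	DO OVER V;
		V=IFN(VAR=VNAME(V),1,0);
	END;
RUN;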

 

 

Updated: to cope with variable names that are not V7-compliant.

SmcGarrett
Obsidian | Level 7

Hi Haikuo,

 

Thanks for the response.

I don't know if this is exactly what I want to do. I need code that reads the values of the categorical variable and creates a new variable named after each value. Since the values will change from one dataset to the next, this hard coding will fail to create the right variables: a value that exists in one dataset will most likely not exist in the next.

Haikuo
Onyx | Level 15

The main part of making these dummy variables is the transpose; making it dynamic is rather trivial. Is this update what you need?

 

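/* Allow variable names that are not V7-compliant */
OPTIONS VALIDVARNAME=ANY;

DATA HAVE;
	INPUT ID VAR $;
	CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;

/* Collect the values of VAR into the macro variable VAR, so no names are typed by hand */
PROC SQL NOPRINT ;
SELECT VAR INTO :VAR SEPARATED BY ' ' FROM HAVE;
QUIT;

/* ID VAR creates one new column per value of VAR; COPY keeps ID and VAR on every row */
PROC TRANSPOSE DATA=HAVE OUT=HAVE2(DROP=_NAME_);
	ID VAR;
	COPY ID VAR;
RUN;

DATA WANT;
	SET HAVE2;
	/* The array members now come from the data instead of a hard-coded list */
	ARRAY V &VAR.;

	/* Set each dummy to 1 when its name matches the row's value of VAR, else 0 */
	DO OVER V;
		V=IFN(VAR=VNAME(V),1,0);
	END;
RUN;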
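
One caveat with the step above: PROC TRANSPOSE normally stops with an error when an ID value occurs more than once, and real data will usually repeat categories. A possible variation, sketched here under the assumption that the values are at most 32 characters long and contain no quotes, ampersands or percent signs, builds one assignment statement per distinct value instead of transposing. The macro variable name ASSIGNMENTS and the repeated row in HAVE are invented for the illustration.

```sas
options validvarname=any;

/* Hypothetical data in which a category repeats */
data have;
    input id var $;
    datalines;
1 ONE_1
2 TWO_2
3 THREE_3
4 ONE_1
;

/* Build one statement per distinct value, of the form 'ONE_1'n=(var="ONE_1") */
proc sql noprint;
    select distinct cats("'", var, "'n=(var=", quote(strip(var)), ")")
        into :assignments separated by '; '
        from have
        where not missing(var);
quit;

data want;
    set have;
    &assignments;   /* each comparison evaluates to 1 or 0 */
run;
```

Because the statements are generated from the data on every run, nothing has to be edited when a new dataset brings new values.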

 
 
SmcGarrett
Obsidian | Level 7

Haikuo,

 

You are a genius! I will now be able to sleep tonight. 

 

Thank you so incredibly much! You have no idea how helpful that was!

 

 

Ksharp
Super User
You are trying to get a design matrix.
Check this blog:

http://blogs.sas.com/content/iml/2016/02/24/create-a-design-matrix-in-sas.html

DATA HAVE;
	INPUT ID VAR $;
	CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;
run;
data Temp ;
   set have;
   FakeY = 0;
run;
 
proc logistic data=Temp outdesign=EffectDesign(drop=FakeY) outdesignonly;
   class var / param=glm;
   model FakeY = var;
run;
 
proc print data=EffectDesign; run;


Ksharp
Super User

And it is easy in IML.

 

 

DATA HAVE;
	INPUT ID VAR $;
	CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;
run;
proc iml;
use have;
read all var{var} ;
close;
col=unique(var);   /* sorted distinct values, used as the column names */
x=design(var);     /* one 0/1 column per distinct value */
create want from x[r=var c=col];
append from x[r=var];
close;
quit;

proc print;run;
