Hello,
I am building a model which will be used to analyze changing datasets.
I need to make an indicator (dummy) variable for each value that a categorical variable takes.
Example
And I need it to be:
I absolutely cannot do any type of hardcoding because the values of my categorical variable will change constantly as new datasets are analyzed. I need quick, reproducible code that can repeat the process when the categorical variable takes new values.
All help greatly appreciated.
I wrote a post on this recently that may be helpful. I would use a code node in EM to do this.
Do you really need to do this in EM though? If it's a categorical variable, that's usually how SAS treats it, and you can specify the parameterization type in some procedures.
I.e., if your proc allows for a CLASS variable then it will be dummy coded by default, though it may not be the method you want.
I'm assuming EM since you've posted in data mining. If you're using Base SAS this would still be applicable though.
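As a rough sketch of the CLASS-statement route, here is what specifying the parameterization can look like in PROC LOGISTIC. The data set, the binary response Y, and the choice of reference coding are made up for illustration and are not from the original question:

```sas
/* Hypothetical data: Y is a binary response, VAR is the categorical variable */
data have;
   input id y var $;
   cards;
1 0 A
2 1 A
3 0 B
4 1 B
5 1 C
6 0 C
;
run;

proc logistic data=have;
   /* PARAM= selects the coding; REF=FIRST makes the first level the reference */
   class var / param=ref ref=first;
   model y(event='1') = var;
run;
```

The procedure builds the dummy columns internally from whatever levels VAR has in the current data, so nothing is tied to specific values.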
Hi Reeza,
Thank you for the response. I'm still trying to make it work. This isn't part of my modeling process but part of my dataset building/cleaning process, and yes, I do need a dummy variable for each individual occurrence of the categorical variable.
I'm having trouble getting this to work though.
Though I know little to nothing about IML, I still think IML could offer a neat solution for this. The following is my quick and dirty way:
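/* Allow variable names that are not V7-compliant (name literals) */
OPTIONS VALIDVARNAME=ANY;
DATA HAVE;
INPUT ID VAR $;
CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;
/* Create one column per value of VAR; COPY carries ID and VAR through untransposed */
PROC TRANSPOSE DATA=HAVE OUT=HAVE2(DROP=_NAME_);
ID VAR;
COPY ID VAR;
RUN;
/* Set each new column to 1 where VAR equals the column name, else 0 */
DATA WANT;
SET HAVE2;
ARRAY V 'ONE_1'N--'THREE_3'N;
DO OVER V;
V=IFN(VAR=VNAME(V),1,0);
END;
RUN;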
Updated to cope with variable names that are not V7-compliant.
Hi Haikuo,
Thanks for the response.
I don't know if this is exactly what I want to do. I need code that will read the values of the categorical variable and then create a new variable named after each value. Since the categorical variable will be changing from one dataset to the next, the values will be changing too, and thus this "hard coding" will fail to create the names and variables, since a value that exists in one dataset will most definitely be different from the values in the next.
The main issue in making these dummy variables is the transpose; making it dynamic is rather trivial. Is this update what you need?
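OPTIONS VALIDVARNAME=ANY;
DATA HAVE;
INPUT ID VAR $;
CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;
/* Store the values of VAR in macro variable &VAR, separated by spaces */
PROC SQL NOPRINT ;
SELECT VAR INTO :VAR SEPARATED BY ' ' FROM HAVE;
QUIT;
/* Create one column per value of VAR; COPY carries ID and VAR through untransposed */
PROC TRANSPOSE DATA=HAVE OUT=HAVE2(DROP=_NAME_);
ID VAR;
COPY ID VAR;
RUN;
/* &VAR. expands to the list of new column names, so nothing is hard-coded */
DATA WANT;
SET HAVE2;
ARRAY V &VAR.;
DO OVER V;
V=IFN(VAR=VNAME(V),1,0);
END;
RUN;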
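One caveat: the example data has each value of VAR exactly once, and PROC TRANSPOSE with an ID statement reports duplicate ID values as an error unless the LET option is used. A minimal variant for data where values repeat could skip the transpose and build the array straight from the distinct values. This is only a sketch, assuming every value of VAR is a valid SAS variable name; the data set, the macro variable LEVELS, and the array name D are made up for illustration:

```sas
/* Hypothetical input: values of VAR may repeat across rows */
data have;
   input id var $;
   cards;
1 RED
2 BLUE
3 RED
4 GREEN
;
run;

/* Collect each distinct value once; these become the dummy variable names */
proc sql noprint;
   select distinct var into :levels separated by ' ' from have;
quit;

data want;
   set have;
   /* One numeric variable per distinct value, created by the ARRAY statement */
   array d {*} &levels.;
   do i = 1 to dim(d);
      /* The comparison yields 1 when VAR matches the variable's name, else 0 */
      d{i} = (var = vname(d{i}));
   end;
   drop i;
run;
```

Because the list of names comes from the data at run time, the same code can be rerun unchanged when a new dataset brings new values.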
Haikuo,
You are a genius! I will now be able to sleep tonight.
Thank you so incredibly much! You have no idea how helpful that was!
You are trying to get a design matrix. Check this blog: http://blogs.sas.com/content/iml/2016/02/24/create-a-design-matrix-in-sas.html
DATA HAVE;
INPUT ID VAR $;
CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;
run;
data Temp ;
set have;
FakeY = 0;
run;
proc logistic data=Temp outdesign=EffectDesign(drop=FakeY) outdesignonly;
class var / param=glm;
model FakeY = var;
run;
proc print data=EffectDesign;
run;
And it is easy with IML.
DATA HAVE;
INPUT ID VAR $;
CARDS;
1 ONE_1
2 TWO_2
3 THREE_3
;
run;
proc iml;
use have;
read all var{var} ;
close;
col=unique(var);
x=design(var);
create want from x[r=var c=col];
append from x[r=var];
close;
quit;
proc print;run;