NicolasC
Fluorite | Level 6

Hi there

 

I was wondering how SAS EM handles categorical variables. I am used to Python and one-hot encoding.

For instance, if my variable COUNTRY contains Germany, France and Spain, does it create 2 columns of 0s and 1s (not 3, to avoid the dummy variable trap)? I ask because there is a Dummy Indicator option in the Transform Variables node, so it seems that this is not done by default by SAS EM. Many thanks

4 REPLIES
Reeza
Super User

There are multiple ways to specify categorical variables, and you can supply your own, so the Dummy Indicator option is a way to create your own dummy variables.

 

SAS will create dummy variables behind the scenes, but they won't appear in your dataset. Note that there are several ways to parameterize dummy variables, so make sure it's using the method you expect, e.g. reference vs. GLM parameterization.
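
To illustrate the difference between those two parameterizations, here is a minimal Python sketch. It is not SAS EM's internal code: the function `dummy_code` and its arguments are hypothetical names made up for this example. GLM-style coding keeps one indicator column per level, while reference coding drops one level (assumed here to be the last level in sorted order) and represents it as all zeros.

```python
def dummy_code(values, name, param="glm", reference=None):
    """Return (column_names, rows) of 0/1 indicators for a categorical variable.

    param="glm": one indicator column per level (k columns).
    param="ref": one column per non-reference level (k - 1 columns).
    """
    levels = sorted(set(values))
    if param == "ref":
        # Assumption: the last sorted level is the reference unless one is given.
        ref = levels[-1] if reference is None else reference
        levels = [lv for lv in levels if lv != ref]
    elif param != "glm":
        raise ValueError(f"unknown parameterization: {param!r}")
    columns = [f"{name}_{lv}" for lv in levels]
    rows = [[1 if v == lv else 0 for lv in levels] for v in values]
    return columns, rows


country = ["Germany", "France", "Spain", "France"]

glm_cols, glm_rows = dummy_code(country, "COUNTRY", param="glm")
ref_cols, ref_rows = dummy_code(country, "COUNTRY", param="ref")

print(glm_cols)  # three indicator columns, one per country
print(ref_cols)  # two indicator columns; Spain is the all-zero baseline
```

With COUNTRY, the GLM-style call yields three columns and the reference call yields two, which is the "2 columns, not 3" layout from the original question. In pandas, `pandas.get_dummies(..., drop_first=True)` gives a similar k - 1 layout, although it drops the first level rather than the last.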

 

 

NicolasC
Fluorite | Level 6

Hi Reeza

Thanks for your reply. If I understand correctly, SAS EM will automatically create dummy indicators to handle categorical variables.

If so, why use the Transform Variables node to create dummy indicators? Isn't it redundant? Thanks

Nicolas

Reeza
Super User

@NicolasC wrote:

Isn't it redundant? Thanks

Nicolas


Different procedures likely require different structures. Some may want the dummy variables as separate columns in the data, and you can then regroup the levels into different categories if desired.

There are many ways to do the same thing in SAS... many, many. So yes, it may be redundant, but that's common in programming languages and data analysis tools 🙂

 

rnibbe0
Calcite | Level 5

Reeza, you indicated, "SAS will create dummy variables behind the scene but it won't be in your dataset."

 

Is there a reference to a SAS EM manual that confirms this and the 'behind the scene' method?

 

I have looked through a number of documents and can't find this information. I would like to be able to evaluate when I need to create Dummy Variables and when I can just let the Default SAS EM method do it.

 

We are working on HP SVM nodes currently.

 

Rod.

 

 
