BearerofSAS
Calcite | Level 5

As far as I understand, we transform variables to make their distributions more symmetrical or their relationship with the target variable more linear, and this can only be done on interval variables. But why might SAS choose not to transform one variable while transforming the other explanatory variables (bearing in mind that they are all interval variables)? Could it be because that variable is already approximately symmetrical, or because transforming it could cause overfitting? Is that why it isn't transformed?

Cheers
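To make "more symmetrical" concrete, here is a minimal sketch that measures symmetry with the moment-based sample skewness (0 for a perfectly symmetric sample, positive for a long right tail) before and after a log transform. The data values and the `skewness` helper are hypothetical illustrations, not output from Enterprise Miner.

```python
import math


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed interval input (e.g. something like an account balance).
raw = [1, 2, 4, 8, 16, 32, 64, 128]
# log2 of powers of two gives 0, 1, ..., 7: evenly spaced, hence symmetric.
logged = [math.log2(v) for v in raw]

print(f"skewness before: {skewness(raw):.3f}")  # clearly positive (long right tail)
print(f"skewness after:  {skewness(logged):.3f}")  # 0.000
```

A variable whose skewness is already near 0 gains little from such a transform, which is one plausible reason a symmetry-driven rule would leave it alone.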

PaigeMiller
Diamond | Level 26

You haven't provided any context. There are probably dozens of situations where transforming variables might make sense, but you haven't stated what situation you are in that might require a transformation. Furthermore, when you say "why might SAS choose...", the important point to remember is that SAS doesn't do the choosing; you, the programmer, do the choosing (or, more precisely, you might choose an algorithm that does the transformation).

--
Paige Miller
mkeintz
PROC Star

@PaigeMiller I agree.

But this is data "mining". Of course, the tool does the choosing; justifying a particular transform is a thing of the past.

art297
Opal | Level 21

If you applied a Transform Variables node in Enterprise Miner, its default setting is to apply no transformation. You may want to set it to 'Best' or to whichever specific transformation you want.

Art, CEO, AnalystFinder.com

BearerofSAS
Calcite | Level 5

I want to understand the profiles of customers who have a credit card and see whether different customers have similar profiles.

I connected the Transform Variables node to my data set and ran it. When I viewed the results, 13 of the 14 interval variables had been transformed; the Age variable was not. I think this might be because Age was already approximately symmetrical, or because transforming it might make the model of the target variable less generalisable. But I am not sure; this is just my guess as to why Age was not transformed.
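The two explanations in this thread (nothing was selected, or the chosen method simply kept the raw variable) can be illustrated with a minimal sketch. This is NOT Enterprise Miner's algorithm: its own methods, such as 'Best', use their own selection criteria, which are not reproduced here. The names `skewness`, `CANDIDATES`, `choose_transform`, and the methods `"none"` and `"max_symmetry"` are hypothetical, and the rule assumes strictly positive inputs so that log and square root are defined.

```python
import math


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical candidate transforms for a strictly positive interval input.
# "none" is listed first, so it wins any tie in choose_transform below.
CANDIDATES = {
    "none": lambda v: v,
    "log": math.log,
    "sqrt": math.sqrt,
    "square": lambda v: v * v,
}


def choose_transform(values, method="none"):
    """Return the name of the transform to apply to one variable (hypothetical rule).

    method="none": leave every variable as-is.
    method="max_symmetry": pick the candidate whose output has the smallest
    absolute skewness, so "none" is kept when the raw data are already the most symmetric.
    """
    if method == "none":
        return "none"
    if method != "max_symmetry":
        raise ValueError(f"unknown method: {method}")
    scores = {name: abs(skewness([f(v) for v in values])) for name, f in CANDIDATES.items()}
    return min(scores, key=scores.get)  # first minimum in insertion order


age = [25, 30, 35, 40, 45, 50, 55]  # hypothetical, symmetric
balance = [1, 2, 4, 8, 16, 32, 64, 128]  # hypothetical, right-skewed
print(choose_transform(age))  # none (no method selected)
print(choose_transform(age, "max_symmetry"))  # none (already symmetric)
print(choose_transform(balance, "max_symmetry"))  # log
```

Under a rule like this, an Age-like variable is left untransformed without any overfitting argument being involved; whether that is what happened in your flow depends on the method you actually selected in the node.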

 

Any further clarification is greatly appreciated. 

 

And I'm pretty sure the software has chosen the transformations for me, but I could be wrong. 

 

Hopefully, this makes sense. 

PaigeMiller
Diamond | Level 26

@BearerofSAS wrote:

I want to understand the profiles of customers who have a credit card and see whether different customers have similar profiles.

I connected the Transform Variables node to my data set and ran it. When I viewed the results, 13 of the 14 interval variables had been transformed; the Age variable was not. I think this might be because Age was already approximately symmetrical, or because transforming it might make the model of the target variable less generalisable. But I am not sure; this is just my guess as to why Age was not transformed.


In my understanding, symmetry or lack of symmetry is not usually a reason to transform anything. However, @art297 points out that you may not have selected a transformation in the Transform Variables node.

And I'm pretty sure the software has chosen the transformations for me, but I could be wrong.

You tell SAS which transformation or algorithm to use, and SAS then applies it. For some data, the algorithm you told SAS to use may leave a variable untransformed. It is the user who instructs SAS what to do. Again, context is everything: we don't have your data, and we don't know what settings or algorithms you have chosen.


