sdhilip
Quartz | Level 8

Hi,

I have started using SAS Miner extensively. I am building different models and would like to know how Miner handles missing values in categorical (class) variables.

 

In Python, I need to create dummy variables for all the categorical variables using an encoder. How does this work in Miner? My dataset has 7 categorical variables and 10 numerical variables. All seven categorical variables contain placeholder values such as '?' and 'unknown'. How does SAS create dummy variables in the background? Do I need to declare '?' and 'unknown' as missing values in the Replacement node and then impute with Tree Surrogate or Count?

 

What would happen if I don't declare missing values for categorical variables? In that case, how does Miner work and analyze the data?

 

I have gone through a lot of documentation without success. Could you please advise?

 

Cheers,

Dhilip

1 ACCEPTED SOLUTION

Reeza
Super User

In Python, I need to create dummy variables for all the categorical variables using an encoder. How does this work in Miner?

It depends a little on which proc or task is being used. You can often specify the parameterization method in the task. Several are standard, such as Reference or GLM, and they are described in the documentation.
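As a rough illustration of how the two parameterizations named above differ, the Python sketch below builds both codings by hand. It is not how SAS implements them internally: the function names, the sample data, and the choice of the last sorted level as the reference are assumptions made for this example.

```python
def glm_coding(values):
    """GLM-style coding: one 0/1 indicator column for every level."""
    levels = sorted(set(values))
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return levels, rows


def reference_coding(values, ref=None):
    """Reference coding: one column per non-reference level.

    Rows that hold the reference level are coded as all zeros.
    """
    levels = sorted(set(values))
    ref = levels[-1] if ref is None else ref  # assumption: last sorted level
    kept = [lvl for lvl in levels if lvl != ref]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


# Hypothetical categorical column with three levels.
colors = ["red", "blue", "green", "blue"]

glm_levels, glm_rows = glm_coding(colors)
ref_levels, ref_rows = reference_coding(colors)

print(glm_levels, glm_rows)   # three columns, one per level
print(ref_levels, ref_rows)   # two columns, 'red' rows are all zeros
```

With k levels, the GLM-style coding yields k columns while reference coding yields k - 1, which is the same distinction as keeping or dropping one category in a Python one-hot encoder.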

 


Do I need to declare '?' and 'unknown' as missing values in the Replacement node and then impute with Tree Surrogate or Count?

SAS will not know that '?' or 'unknown' are missing values; it will treat them as ordinary character values. You'll need to clean the data ahead of time or use the RECODE task to recode them. A missing value for a character variable in SAS is a blank, "".

If the value is coded as missing, SAS will be able to handle it as missing, which usually means the value is excluded from the analysis.
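The recoding step described above can be sketched in Python, since that is the language the question compares against. This is only an illustration of the idea, not SAS code: the placeholder set comes from the question, while the sample column, the helper name `recode_missing`, and the case-insensitive matching are assumptions.

```python
# Placeholder strings that should count as missing (taken from the question).
PLACEHOLDERS = {"?", "unknown"}


def recode_missing(value):
    """Return '' (a blank, as SAS uses for character missing) for placeholders."""
    if value is None:
        return ""
    stripped = value.strip()
    # Assumption: match case-insensitively and ignore surrounding spaces.
    if stripped == "" or stripped.lower() in PLACEHOLDERS:
        return ""
    return value


# Hypothetical categorical column.
raw = ["married", "?", "single", "unknown", "Unknown ", "single"]

clean = [recode_missing(v) for v in raw]
n_missing = sum(1 for v in clean if v == "")

print(clean)
print(n_missing)
```

Once the placeholders are blank, an imputation step can treat them like any other missing value.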

 


What would happen if I don't declare missing values for categorical variables? In that case, how does Miner work and analyze the data?



Each such value would be treated as a separate level and included in the analysis. Depending on why a value is 'missing', that may or may not be appropriate.
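A small Python sketch can make the effect concrete by counting the levels of a column with and without the placeholders. The column name `job` and its values are hypothetical, and dropping the placeholder rows stands in for whatever missing-value handling the chosen procedure applies.

```python
from collections import Counter

# Hypothetical categorical column containing two placeholder values.
job = ["admin", "?", "technician", "unknown", "admin", "?"]

# Left as-is: '?' and 'unknown' are counted as two ordinary levels.
levels_as_is = Counter(job)

# Treated as missing and excluded: only the genuine levels remain.
placeholders = {"?", "unknown"}
levels_recoded = Counter(v for v in job if v not in placeholders)

print(len(levels_as_is), dict(levels_as_is))
print(len(levels_recoded), dict(levels_recoded))
```

Here the untouched column has four levels, so a model would estimate separate effects for '?' and 'unknown'; after recoding, only two levels and three usable rows remain.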

 

 
