Daniel121181
Fluorite | Level 6

Hello,

Apologies if this is a basic question, but I am struggling to fully understand the Advanced Advisor Options when creating a data source using the Data Source Wizard in SAS Enterprise Miner. I have looked at the manual here:

https://documentation.sas.com/doc/en/emref/14.3/p0ldp4l9cnob3gn1dytmimkunaa6.htm#p1ei4x1wp5rxknn0zgx...

but would very much appreciate some clarification. 

Detect Class Levels — specifies whether the number of class levels is determined for each variable.
Does this mean that because the software has not yet decided whether a given variable is numerical or categorical etc. it entertains the possibility of it being the latter? For example, even if it were provided with numerical values, the software would count the frequency of unique numbers; if the number 5 appeared 10 times in the variable, then the count for "category" 5 would be 10?
 
Class Levels Count Threshold — specifies the maximum number of class levels for each variable. When the Detect Class Levels property is set to Yes, if there are more class levels than the value specified here, the variable is considered an interval variable. Valid values are positive integers greater than or equal to 2.
I'm assuming this is the part where the software decides whether the variable is categorical or numeric. So, if Detect Class Levels is "Yes" and the Class Levels Count Threshold is 20, then a "numeric" variable with no more than 20 distinct "classes" would be considered a factor, whilst one with more than 20 distinct classes would be considered numeric?
 

Also I cannot really see the distinction between the above settings and others which are available, namely:

Reject Vars with Excessive Class Values 

and

Reject Levels Count Threshold 

 

Thank you for your help!

Daniel

 

1 ACCEPTED SOLUTION
gcjfernandez
SAS Employee

Q: Detect Class Levels — specifies whether the number of class levels is determined for each variable.
Does this mean that because the software has not yet decided whether a given variable is numerical or categorical etc. it entertains the possibility of it being the latter? For example, even if it were provided with numerical values, the software would count the frequency of unique numbers; if the number 5 appeared 10 times in the variable, then the count for "category" 5 would be 10?
Answer:
In Basic mode, whether a variable is treated as nominal or interval is determined by the variable type and format only.
In Advanced mode, automatic initial roles and level values are determined based on the variable type, the variable format, and the number of distinct values contained in the variable.
Therefore, a variable initially declared as interval can be reclassified as nominal if its number of unique data values does not exceed the threshold (20 by default, which can be modified).

Q:Class Levels Count Threshold — specifies the maximum number of class levels for each variable. When the Detect Class Levels property is set to Yes, if there are more class levels than the value specified here, the variable is considered an interval variable. Valid values are positive integers greater than or equal to 2.
I'm assuming this is the part where the software decides whether the variable is categorical or numeric. So, if Detect Class Levels is "Yes" and the Class Levels Count Threshold is 20, then a "numeric" variable with no more than 20 distinct "classes" would be considered a factor, whilst one with more than 20 distinct classes would be considered numeric?
Answer:
Yes, your observation is correct.
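
To make the rule concrete, here is a rough sketch in Python (not SAS code, and not how Enterprise Miner is actually implemented). The function names `level_counts` and `infer_numeric_level` and their parameters are hypothetical; they only mirror the Detect Class Levels and Class Levels Count Threshold properties as described above, for a numeric variable.

```python
from collections import Counter
from typing import Hashable, Iterable


def level_counts(values: Iterable[Hashable]) -> Counter:
    """Frequency of each distinct value, i.e. each candidate class level."""
    return Counter(values)


def infer_numeric_level(
    values: Iterable[Hashable],
    detect_class_levels: bool = True,        # mirrors "Detect Class Levels"
    class_levels_count_threshold: int = 20,  # mirrors "Class Levels Count Threshold"
) -> str:
    """Illustrative rule for a numeric variable (an assumption, not SAS internals).

    If class-level detection is off, the variable stays interval.
    Otherwise it is nominal when the number of distinct values does not
    exceed the threshold, and interval when it does.
    """
    if not detect_class_levels:
        return "interval"
    n_levels = len(level_counts(values))
    return "nominal" if n_levels <= class_levels_count_threshold else "interval"
```

With the example from the question, a variable in which the number 5 appears 10 times gives `level_counts(...)[5] == 10`, and a numeric variable with only a handful of distinct values comes out as nominal under the default threshold of 20.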

Q: Also I cannot really see the distinction between the above settings and others which are available, namely:

Q: Reject Vars with Excessive Class Values
A: This rejects a nominal variable if its number of class levels exceeds 20 (the default).

Q: Reject Levels Count Threshold
A: This relates to an interval variable property: whether to treat the variable as interval or nominal, based on its number of unique numeric values.
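
As a rough illustration of the rejection rule described for Reject Vars with Excessive Class Values, here is a minimal Python sketch (again, not SAS code). The function `should_reject` and its parameter names are hypothetical, and the default of 20 is taken from the answer above.

```python
from typing import Hashable, Iterable


def should_reject(
    values: Iterable[Hashable],
    reject_excessive_class_values: bool = True,  # mirrors "Reject Vars with Excessive Class Values"
    reject_levels_count_threshold: int = 20,     # default taken from the answer above
) -> bool:
    """Illustrative rule (an assumption, not SAS internals).

    A nominal variable is rejected when rejection is enabled and its
    number of distinct class levels exceeds the threshold.
    """
    if not reject_excessive_class_values:
        return False
    return len(set(values)) > reject_levels_count_threshold
```

For example, a nominal variable such as a customer ID with thousands of distinct levels would be rejected under this rule, while a variable with 20 or fewer levels would be kept.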

Daniel121181
Fluorite | Level 6

Thank you very much, sir. I very much appreciate your help. 

 
