Cloud
Calcite | Level 5

I have been trying to find online documentation on the variable roles available in SAS EM 14.1, but I have not managed to turn anything up. In particular, I am looking for an explanation of when we should use "Key", "Time ID" and "Frequency".

 

Does anyone know where I can find this documentation? 

1 ACCEPTED SOLUTION

DougWielenga
SAS Employee

I am looking for an explanation as to when we should use "Key", "Time ID" and "Frequency"

 

These three roles are needed only in certain specific situations:

 

Frequency --  This role is exactly what it sounds like: it identifies the number of observations that the row represents.  In practice, this is typically an integer value giving the number of times that exact observation occurs, for data that does not contain a separate row for every instance of the observation.  Please note that it is NOT a 'weight' variable and should not be used as such, so specifying partial frequencies will not generate a meaningful 'weighted analysis'.  In fact, specifying values between 0 and 1 can cause numerical issues in some algorithms, since the value is interpreted as the number of observations with the same variable values.  In certain analyses, such as Credit Scoring, some analysts specify partial frequencies (non-integer positive values), but you should always make sure these non-integer values are greater than 1 to avoid problems.  The Predictive Modeling documentation in SAS Enterprise Miner says:

Over-weighting can be done in SAS Enterprise Miner by using a frequency variable. However, the current version of SAS Enterprise Miner does not provide full support for sampling weights or other types of weighted analyses, so this method should be approached with care in any analysis where standard errors or significance tests are used, such as stepwise regression. When using a frequency variable for weighting in SAS Enterprise Miner, it is recommended that you also specify appropriate prior probabilities and decision consequences.
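To make the "number of observations that the row represents" idea concrete, here is a minimal sketch in plain Python (not SAS code, and not how Enterprise Miner is implemented). The data, the column meaning, and the helper names `collapse`, `expand`, `default_rate` and `risky_frequencies` are all made up for illustration. It shows that a statistic computed from frequency-collapsed rows matches the same statistic computed on the fully expanded data, and it flags frequency values between 0 and 1, which are the ones described above as problematic.

```python
from collections import Counter

# Hypothetical raw data: each tuple is one observation (age_band, defaulted).
raw_rows = [
    ("young", 1), ("young", 1), ("young", 0),
    ("old", 0), ("old", 0), ("old", 0), ("old", 1),
]

def collapse(rows):
    """Collapse identical rows into (row, freq) pairs; freq is a whole count."""
    return list(Counter(rows).items())

def expand(freq_rows):
    """Reverse of collapse: repeat each row freq times (integer freq only)."""
    return [row for row, freq in freq_rows for _ in range(freq)]

def default_rate(freq_rows):
    """Event rate where each row counts as freq observations."""
    total = sum(freq for _, freq in freq_rows)
    events = sum(row[1] * freq for row, freq in freq_rows)
    return events / total

def risky_frequencies(freq_rows):
    """Return the (row, freq) pairs whose frequency lies strictly between 0 and 1."""
    return [(row, freq) for row, freq in freq_rows if 0 < freq < 1]

freq_rows = collapse(raw_rows)
raw_rate = sum(flag for _, flag in raw_rows) / len(raw_rows)
```

Here `default_rate(freq_rows)` and `raw_rate` agree because the frequency column is only a compact way of storing repeated rows. A fractional value such as 0.5 has no such interpretation, which is why it should not be treated as a weight.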

 

 

Key -- This role is used in High Performance processing, and a Key variable is required when accessing certain types of data.  The HP Data Partition node documentation says:

The input data set must also contain a variable with the role Key. The key variable contains a unique identifier for each observation in the input data set. Note that the key variable is required for Teradata and Greenplum data sets, but not for Hadoop data sets.
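Since the key has to be a unique identifier for each observation, it can be worth checking a candidate column before assigning it the Key role. The following is a rough sketch in plain Python rather than SAS; the column name `cust_id`, the sample rows, and the helper `duplicate_keys` are hypothetical.

```python
from collections import Counter

def duplicate_keys(rows, key):
    """Return the sorted key values that occur on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical input rows; cust_id is the candidate Key variable.
rows = [
    {"cust_id": 101, "balance": 250.0},
    {"cust_id": 102, "balance": 90.5},
    {"cust_id": 101, "balance": 13.0},  # repeats 101, so cust_id is not unique
]

dups = duplicate_keys(rows, "cust_id")
is_valid_key = not dups
```

An empty result from `duplicate_keys` means every row has its own identifier, which is the property the documentation asks for.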

 

Time ID -- This role identifies a variable containing a timestamp, which SAS Enterprise Miner uses for certain types of analyses involving transactional data.  The Time Series Data Preparation node documentation says:

The time series ID must be a numeric variable that uniquely identifies observations in the input and output data sets. The best variable candidates for time series ID variable values are SAS DATE or DATETIME values.
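SAS DATE and DATETIME values satisfy the "numeric" requirement because SAS stores a date as the number of days since January 1, 1960, and a datetime as the number of seconds since midnight on that day. Below is a small Python sketch of that encoding, for illustration only. The helper names are made up, and it assumes naive timestamps with no time zone handling.

```python
from datetime import date, datetime

SAS_EPOCH_DATE = date(1960, 1, 1)
SAS_EPOCH_DATETIME = datetime(1960, 1, 1)

def to_sas_date(d):
    """Days since 1960-01-01, which is how SAS stores a DATE value."""
    return (d - SAS_EPOCH_DATE).days

def to_sas_datetime(dt):
    """Whole seconds since 1960-01-01 00:00:00, as in a SAS DATETIME value."""
    return int((dt - SAS_EPOCH_DATETIME).total_seconds())

def is_unique(values):
    """True when no time ID value is repeated."""
    values = list(values)
    return len(values) == len(set(values))

# Hypothetical monthly series: one observation per month start.
month_starts = [date(2015, m, 1) for m in range(1, 13)]
time_ids = [to_sas_date(d) for d in month_starts]
```

In this made-up series each month start maps to a distinct number, so `is_unique(time_ids)` holds. Two rows sharing a timestamp would break the uniqueness that the documentation describes.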

 

Hope this helps!

Doug

