02-25-2023
gcjfernandez
SAS Employee
Member since
09-18-2013
- 151 Posts
- 0 Likes Given
- 40 Solutions
- 45 Likes Received
About
George Fernandez, former professor of applied statistics and Director of the University of Nevada, Reno Center for Research Design and Analysis, currently serves as Senior Analytical Consultant, SAS Education. He has more than 23 years of experience teaching courses such as design and analysis of experiments, linear and non-linear regression, multivariate statistical methods, and SAS programming. He has over 25 years of experience with many statistical and graphical SAS modules. He has won best paper and poster presentation awards at regional and international conferences. He has presented several invited full-day workshops on "Applications of user-friendly statistical methods in data mining": American Statistical Association Joint meeting in Atlanta (2001), Western SAS users Conference in Arizona (2000), in San Diego (2002), and San Jose (2005), 56th Deming's conference, Atlantic City (2003), and as keynote speaker and workshop presenter, 16th Conference on Applied Statistics, Kansas State University. He also organized the 7th Western Users of SAS conference (WUSS) in Los Angeles in 1999 and served as section chair for SUGI31 and SGF2007-2009. His book "Data mining using SAS applications" (CRC Press / Chapman Hall) contains many user-friendly SAS macro applications.
Specialties: Training Consultant in SAS Forecast Server/Studio, SAS Enterprise Miner, SAS survey probability design course, SAS/STAT and SAS/ETS, programming, data mining, SAS Visual Text Analytics, and Visual Forecasting.
Activity Feed for gcjfernandez
- Got a Like for Re: Intuitive Way to Interpret Intercept Value of Shapley Values Output. 09-03-2023 06:36 PM
- Posted Getting Started with SAS® Visual Text Analytics Q&A, Slides, and On-Demand Recording on Ask the Expert. 08-04-2022 12:11 PM
- Got a Like for Webinar on July 28th, 11 AM – Noon ET entitled Getting Started with SAS® Visual Text Analytics. 07-22-2022 03:32 PM
- Posted Webinar on July 28th, 11 AM – Noon ET entitled Getting Started with SAS® Visual Text Analytics on SAS Visual Analytics. 07-22-2022 03:30 PM
- Posted Webinar on July 28th, 11 AM – Noon ET entitled Getting Started with SAS® Visual Text Analytics on SAS Data Science. 07-22-2022 03:17 PM
- Posted Re: Intuitive Way to Interpret Intercept Value of Shapley Values Output on SAS Data Science. 04-25-2022 01:47 PM
- Posted Re: New mean of variable after adjusting for covariate for total population on Statistical Procedures. 03-03-2022 01:56 AM
- Posted Re: New mean of variable after adjusting for covariate for total population on Statistical Procedures. 03-02-2022 03:04 PM
- Posted Re: Survey Select cascading stratified random sampling on Statistical Procedures. 03-02-2022 01:59 AM
- Got a Like for Re: NLP LITI Rules. 01-31-2022 05:20 PM
- Posted Re: SAS EM: decision tree on SAS Data Science. 10-18-2021 06:29 AM
- Posted Re: SAS EM: decision tree on SAS Data Science. 10-18-2021 02:44 AM
- Posted Re: SAS EM: decision tree on SAS Data Science. 10-18-2021 01:35 AM
- Posted Re: Modify a variable in SAS Miner on Statistical Procedures. 10-18-2021 01:17 AM
- Posted Improved ways to classify over-weight and obesity: Welcome Body Fat Index (BFI) on SAS Communities Library. 10-17-2021 02:04 AM
- Posted Re: Decision tree splitting rule in SAS EM on SAS Data Science. 10-14-2021 02:18 AM
- Posted Re: Decision tree splitting rule in SAS EM on SAS Data Science. 10-13-2021 06:36 PM
- Posted Re: In SAS EM, how can I know which one is the base level for nominal variable? on SAS Data Science. 10-07-2021 01:47 AM
- Posted Re: In SAS EM, how can I know which one is the base level for nominal variable? on SAS Data Science. 10-06-2021 05:15 PM
- Posted Re: In SAS EM, how can I know which one is the base level for nominal variable? on SAS Data Science. 10-06-2021 12:16 AM
05-13-2020
01:50 AM
Re: Applied Analytics Using SAS Enterprise Miner
Would it be possible to clarify how ASE (Average Square Error) is calculated (its definition is given at page 3.72 of the course notes)?
I am asking because, looking at the output from any modelling node, the denominator appears to be based on the total number of cases in the whole sample (Training + Validation), not just the Training or Validation data set (see the image at page 3.89 for the Decision Tree output; the same applies to the Regression node, see the example at page 4-43).
My Answer:
When the target variable is interval, the denominator for ASE is N (the Training or Validation sample size). Please see the course PDF, page 3-72.
When the target variable is binary, the denominator for ASE is N x 2 (2 levels: event and non-event). Please see the course PDF, page 3-72.
In the demo data, because we make a 50:50 split for Training and Validation, the denominator appears to be (Training + Validation).
But actually, for Training, ASE = SSE/2N, where N is the Training sample size.
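The two denominators can be checked with a small sketch. It assumes that, for a binary target, the squared errors are summed over both target levels (event and non-event), which is why the denominator becomes 2N. The function names and the toy data are hypothetical and are not taken from the course notes.

```python
def ase_interval(actual, predicted):
    """ASE for an interval target: SSE divided by N."""
    n = len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return sse / n


def ase_binary(events, p_event):
    """ASE for a binary target: SSE over both levels divided by 2N.

    Assumption: each case contributes two squared errors, one for the
    event level and one for the non-event level.
    """
    n = len(events)
    sse = 0.0
    for y, p in zip(events, p_event):
        sse += (y - p) ** 2              # event level
        sse += ((1 - y) - (1 - p)) ** 2  # non-event level
    return sse / (2 * n)


# Toy data: 1 = event, 0 = non-event, with predicted event probabilities.
events = [1, 0, 0, 1]
p_event = [0.8, 0.3, 0.1, 0.6]
print(ase_binary(events, p_event))
```

Under this convention the two squared errors for a case are equal, so SSE/2N gives the same value as the mean squared error of the event probability alone.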
Moreover, in the output from a Regression node, the Mean Square Error (MSE) should be calculated as the Sum of Squared Errors (SSE) divided by the Degrees of Freedom for Error (DFE); however, that does not seem to be the case. Here is a screenshot based on the model fitted at page 4-42 of the course notes:
My Answer:
In computing MSE for the training and validation data, SAS EM does not use DFE; it uses N as the denominator. Decision Tree and Neural Network models have no model degrees of freedom, and therefore no error degrees of freedom. Similarly, no model is fitted to the validation data. Therefore, to have a comparable error estimate across Decision Tree, Regression, and Neural Network models, SAS EM uses N as the denominator in MSE.
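As a rough illustration of the difference between the two denominators, the sketch below computes the same SSE divided by N and by DFE = N - p. The function name, the parameter count, and the toy values are hypothetical.

```python
def error_summaries(actual, predicted, n_params):
    """Return SSE, SSE/N, and SSE/DFE for one set of predictions.

    n_params is the number of estimated model parameters (hypothetical
    input; it only exists for models such as regression).
    """
    n = len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    dfe = n - n_params
    return {"sse": sse, "sse_over_n": sse / n, "sse_over_dfe": sse / dfe}


actual = [3.0, 5.0, 7.0, 9.0, 11.0]
predicted = [2.0, 5.0, 8.0, 9.0, 10.0]
summary = error_summaries(actual, predicted, n_params=2)
print(summary)
```

Only SSE/N can be computed the same way for every model type and for the validation data, which is the comparability argument made in the answer.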
05-12-2020
04:09 PM
With regard to chapter 7 of the course notes:
1. What is the difference between "SAS Code" and "Optimized SAS Code" (see page 7-8)?
My Answer:
SAS Code: the Base SAS code associated with the process flow diagram used in building the predictive model.
Optimized SAS Code: the Base SAS code needed to score the new scoring data only. All redundant SAS code associated with variables not included in the final model is excluded.
2. Is it correct to say that when scoring a Score table, Enterprise Miner assumes the table is a true representation of the population in terms of proportions of events/non-events?
My Answer:
Yes, this is a correct assumption.
As such, no adjustments based on prior probabilities are applied; however, Decisions, in terms of Expected Profit and Decision classification, are based on the Decision Weights and Prior Probabilities specified for the data source used for training.
My Answer:
This is incorrect. If oversampling or separate sampling was used during model development, the SAS code for future scoring contains code for the proper posterior probability adjustments and for computing expected and average profits.
3. With regard to Decisions, is it possible to use property Decision of the Input Data node related to the Score table (see page 7.7) to amend the decision weights to be applied during scoring? If so, would that require a target variable present on the Score dataset?
My Answer:
Yes, this step is automatically performed and this adjustment is included in the optimized Base SAS code.
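For readers who want to see what a posterior probability adjustment for oversampling looks like, here is a minimal sketch using the standard prior-correction formula for a binary target. Whether the generated score code uses exactly this form is an assumption; the function and argument names are hypothetical.

```python
def adjust_posterior(p_sample, prior_event, sample_event):
    """Rescale a posterior probability from the oversampled training
    proportions back to the population priors.

    p_sample     : event probability predicted by the model
    prior_event  : event proportion in the population (prior)
    sample_event : event proportion in the training sample
    """
    w_event = prior_event / sample_event
    w_nonevent = (1 - prior_event) / (1 - sample_event)
    numerator = p_sample * w_event
    return numerator / (numerator + (1 - p_sample) * w_nonevent)


# A model trained on a 50:50 sample when the true event rate is 5%.
print(adjust_posterior(0.5, prior_event=0.05, sample_event=0.5))
```

When the sample proportion equals the prior, both weights are 1 and the probability is returned unchanged.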
05-12-2020
11:47 AM
I have a couple of questions on Cluster Analysis (chapter 8 of course notes):
1. In what scenarios should categorical variables, via dummy indicators, be used for Clustering? Or would it just be better to use interval variables as suggested by the course notes at page 8-9? ("An interval measurement level is recommended for k-means to produce non-trivial clusters")
My Answers:
For k-means and hierarchical clustering, interval variables are recommended. The SAS HP Cluster node can also perform ABC clustering based on Manhattan distance. With this option you can also include dummy variables created from a categorical variable.
2. In what instances would a Range Standardisation (with reference to property "Internal Standardization") be recommended in place of the usual standardisation (i.e. subtracting the mean and dividing by the standard deviation)?
My answer:
For k-means clustering and PCA, z-standardization is preferred. For some special neural network machine learning algorithms, range normalization may be preferred.
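The two standardizations being compared can be sketched as follows. This is a generic illustration and not the Cluster node's internal code; it assumes the sample standard deviation (n - 1 denominator) for the z-scores, and the function names are hypothetical.

```python
from statistics import mean, stdev


def z_standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def range_standardize(values):
    """Rescale to the 0-1 interval using the minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


incomes = [10.0, 20.0, 30.0]
print(z_standardize(incomes))
print(range_standardize(incomes))
```

Range standardization bounds every input to the same interval, while z-standardization gives every input the same variance, which matters for distance-based methods such as k-means.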
05-11-2020
02:29 AM
To impute missing values in survey analysis, use PROC SURVEYIMPUTE and create imputation-adjusted jackknife (JK) replicate weights.
Then use these imputed JK replicate weights with PROC SURVEYLOGISTIC to fit the generalized survey logistic model.
Please refer to this paper: https://support.sas.com/resources/papers/proceedings16/SAS3520-2016.pdf
05-09-2020
04:08 PM
1. Gain: this metric is reported in the Output window (under "Statistics Table") of the "Model Comparison" node (see page 6-7 of course notes); what is its formula/definition? Is this based on the definition given at page 256 of "Enterprise Miner 15.1: Reference Help": ((% of events in decile / random % of events in decile)-1). If so, what is its interpretation?
MY ANSWER:
Both the Lift and Gain statistics are computed at the depth of the 10th decile (by default), and Gain = Lift - 1. The formula given above for Gain is correct.
2. Gain Chart: this is the chart displayed as part of the "Score Ranking Overlay" output when selecting option "Gain"; below is a screenshot taken from the example/demonstration in "Lesson 7: Model Assessment Using SAS Enterprise Miner" (see also page 6-19 of the course notes); again, how are the values on the Y-axis calculated?
My answer:
The values on the Y-axis are Lift - 1.
3. Cumulative Gain: page 6-17 of the course notes states that "cumulative percent response" chart is more widely known as "cumulative gain" in the predictive modeling literature. It also adds that "[...] Plotting cumulative gain for all selection fractions yields a gains chart"; at page 6-20, it says "It is instructive to view the actual proportion of cases with the primary outcome (called gain or cumulative percent response) at each decile": (a) from other sources on internet (see this as an example), it seems that "cumulative gain" is related to the "percentage of the total possible positive responses (i.e. primary outcome events) at a given depth" (in the "Score Ranking Overlay" window, that is given by "Cumulative % Capture Response"); is this just an example of inconsistency in the use of the same term? (b) how does the "cumulative gain" differ from the Gain Chart in point (2) above?
My answer:
Cumulative gain is equal to Cumulative % Response; therefore, SAS EM shows only Cumulative % Response.
Please note that Cumulative % Capture Response = (cumulative number of events up to a decile / total number of events) is different from Cumulative % Response = (cumulative number of events up to a decile / cumulative number of cases up to that decile).
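The relationship between these statistics can be sketched on a toy ranking. The sketch assumes the bins are already sorted by descending predicted probability and computes cumulative versions of each statistic; the function name, the dictionary keys, and the counts are hypothetical, not SAS EM output.

```python
def cumulative_stats(events_per_bin, cases_per_bin):
    """Cumulative response, captured response, lift, and gain per bin.

    Bins are assumed to be ordered from highest to lowest predicted
    probability.
    """
    total_events = sum(events_per_bin)
    total_cases = sum(cases_per_bin)
    base_rate = total_events / total_cases
    rows = []
    cum_events = 0
    cum_cases = 0
    for events, cases in zip(events_per_bin, cases_per_bin):
        cum_events += events
        cum_cases += cases
        cum_response = cum_events / cum_cases      # Cumulative % Response
        cum_captured = cum_events / total_events   # Cumulative % Captured Response
        cum_lift = cum_response / base_rate
        rows.append({
            "cum_response": cum_response,
            "cum_captured": cum_captured,
            "cum_lift": cum_lift,
            "gain": cum_lift - 1,                  # Gain = Lift - 1
        })
    return rows


# Four equal-sized bins with 40, 30, 20, and 10 events.
rows = cumulative_stats([40, 30, 20, 10], [100, 100, 100, 100])
for row in rows:
    print(row)
```

At full depth the cumulative response equals the overall event rate, so lift is 1 and gain is 0, while the captured response reaches 100%.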
Please let me know if you have any further questions.
05-09-2020
03:27 PM
Meaning of "Computed": This is the default setting for the replacement value. By default, the Replacement node identifies outliers (beyond mean ± 3 SD) for interval inputs, computes the truncation values (top ceiling), and applies the replacement globally to all inputs. If you want to apply this option only to a selected input variable, open the Replacement Editor and apply the required setting there.
Instead of using this computed value (mean ± 3 SD), you could also use a user-specified constant value, either in the global setting or in the Replacement Editor individually for the selected inputs. Similarly, you can also choose 'Missing'.
In the course demo, we want to replace all zeros only in the DemMedIncome variable. Therefore, using the Replacement Editor, we selected DemMedIncome and specified 1 as the lower replacement value, so that all records with 0 income are identified and replaced with missing.
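The two behaviours described above can be imitated outside Enterprise Miner with a short sketch: truncating at mean ± 3 SD, and setting values below a lower limit to missing. The function names and income values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def truncation_limits(values, n_sd=3.0):
    """Lower and upper limits at mean -/+ n_sd standard deviations."""
    m, s = mean(values), stdev(values)
    return m - n_sd * s, m + n_sd * s


def truncate(values, lower, upper):
    """Replace values outside the limits with the nearest limit."""
    return [min(max(v, lower), upper) for v in values]


def below_limit_to_missing(values, lower_limit):
    """Replace values below the lower limit with None (missing)."""
    return [None if v is not None and v < lower_limit else v for v in values]


# Zero incomes are treated as missing by using 1 as the lower limit.
dem_med_income = [0, 52000, 0, 38750]
print(below_limit_to_missing(dem_med_income, lower_limit=1))
```

Replacing with missing rather than a truncated value leaves the decision about how to fill the gap to a later imputation step.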
I hope this explanation is clear. If you need further clarification, please let me know. I am also adding a screenshot of the Replacement Editor window.
05-09-2020
02:58 PM
Thank you for asking this question and for the further clarification. When computing average profit, SAS EM always adjusts the frequencies of primary and secondary outcome events for prior probabilities when the prior probability adjustment is enabled in the decision setting (please see page 6-24 of the course PDF). To illustrate this point I have modified the corresponding slide below:
I hope this explanation is adequate. If not, please let me know.
05-06-2020
03:26 PM
The surrogate model example provides a way to assess the variable importance of the neural network (a black-box model). If you want to make decisions or predictions, use the posterior probability values derived from the NN model directly, because the posterior probability reported in the surrogate decision tree model is not adjusted for oversampling or priors.
Therefore, in case you need to use the posterior probabilities from the surrogate model, the following solutions are available:
1) Use scored data where the event distribution reflects what is available in the reference population.
2) Use the SAS Code editor in EM and adjust for priors and decision weights (not in the course notes).
3) In the Transform node there is a rudimentary SAS code option where you create a weight variable (based on the prior probability values) and assign it the role of frequency. That way you can adjust the posterior probability for priors.
Please note that this weight option is different from survey design weights, and SAS EM is not meant for use with survey data. It is recommended for building predictive models.
05-06-2020
03:06 PM
If you choose validation error as the criterion but have no validation data, model selection is based on the selection method (forward, backward, or stepwise settings). If the selection method is specified as none, the full model is selected.
05-06-2020
03:00 PM
Many thanks for providing this page number and additional details. Currently no default setting is available to modify the default number of bins (10). Therefore at this point we have to manually adjust this number.
05-06-2020
02:44 PM
I agree with your comments, because adjusting for prior probabilities basically only shifts the intercept value. Therefore this should not affect the model selection. However, if you want the prior values to affect your model decision, you should consider the decision option and provide decision weights (please refer to Chapter 6 in the AAEM course notes).
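The intercept-shift argument can be illustrated with a short sketch. It assumes a logistic model for a binary target, for which the standard prior correction is a constant offset added to the logit; the function names and the probabilities are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def prior_offset(prior_event, sample_event):
    """Constant added to the logit (that is, to the intercept)."""
    return math.log(
        (prior_event * (1 - sample_event))
        / ((1 - prior_event) * sample_event)
    )


def adjust_with_offset(p_sample, prior_event, sample_event):
    """Shift the logit by the prior offset and map back to a probability."""
    return inv_logit(logit(p_sample) + prior_offset(prior_event, sample_event))


scores = [0.2, 0.5, 0.9]
adjusted = [adjust_with_offset(p, 0.05, 0.5) for p in scores]
print(adjusted)
```

Because the same constant is added to every case, the rank order of the cases is unchanged; only the probability scale moves.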
05-06-2020
02:33 PM
I hate to provide a rule of thumb when the optimal solutions for most options are data specific.
However, in this case the SAS Enterprise Miner advanced metadata advisor uses 20 as the class-level threshold for rejecting a nominal variable. You could consider this default setting the best rule of thumb.
05-06-2020
02:26 PM
Please include the Page numbers of the course pdf or sections where the default display option of the explore window is mentioned.
Thanks
George
05-06-2020
02:22 PM
The Enterprise Miner diagrams for the 4 case studies described in the appendix are included with the course data files, which students can download from the extended learning page.
09-06-2017
02:22 PM
My email : George.fernandez@sas.com