I don't understand why this is happening, but the lowest prediction error rate I'm getting is by including ALL of the variables.
I think you need to be open to what the data is telling you. This is what the data is telling you.
HOWEVER
There is such a thing as overfitting. When a model is overfit, you have added at least one variable (possibly more than one) that is essentially being fit to the random noise in the data rather than to the signal. If you have overfitting, you ought to remove terms from the model; this will give you WORSE fit statistics but a more "stable" model (that is, one whose predictions vary less on new data). So avoiding overfitting gives you WORSE fit on the data at hand, but a better model by other measures.
How do you avoid overfitting in PROC DISCRIM? You can use the CROSSVALIDATE option, which shows you the classifications using leave-one-out cross-validation. If those classification results are poor, remove terms from the model until the cross-validation statistics get closer to perfect classification (realizing that perfect classification isn't really achievable).
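As a minimal sketch of what that looks like (the data set name, class variable, and predictor names here are placeholders, not from your data):

```sas
/* Sketch only: MYDATA, REGION, and X1-X3 are placeholder names.
   CROSSVALIDATE requests leave-one-out cross-validated
   classification results alongside the resubstitution results. */
proc discrim data=mydata crossvalidate;
   class region;
   var x1 x2 x3;
run;
```

Compare the cross-validation error rates in the output with the resubstitution error rates; a large gap between the two is the symptom of overfitting described below.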
There is an example in the PROC DISCRIM documentation where the cross-validation error rates are much higher than the resubstitution error rates of the model, and this indicates the model has been overfit.
I have never been a fan of stepwise methods, and I avoid them like the plague. Google "problems with stepwise". What would I use? I would use PLS Discriminant Analysis (PLS-DA) which is PROC PLS with dummy variables for Y to indicate which region the observation is.
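A rough sketch of the PLS-DA setup (again, all data set, variable, and region names are placeholders I made up for illustration): create one 0/1 dummy response per region, then fit PROC PLS on those dummies.

```sas
/* Sketch only: names are placeholders. One dummy Y per region. */
data mydata2;
   set mydata;
   y_north = (region = 'North');
   y_south = (region = 'South');
   y_east  = (region = 'East');
run;

/* CV=ONE requests leave-one-out cross-validation to choose
   the number of PLS factors, which guards against overfitting. */
proc pls data=mydata2 cv=one;
   model y_north y_south y_east = x1 x2 x3;
run;
```

Each observation is then assigned to the region whose predicted dummy value is largest, and the built-in cross-validation picks how many factors to keep rather than you picking variables by hand.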