Hello everyone,

I'm relatively new to classification trees, and I was hoping to ask some questions about using PROC HPSPLIT (STAT 13.2) to run exhaustive CHAID. For background, I'm working with a large dataset to model the risk of the dichotomous outcome "ipvcc" based on three to six nominal demographic variables (p4, region, lgbtq, etc.). The most recent iteration of my code is below:

ods graphics on;
proc hpsplit data=sv.mssgrade3 event="y" alpha=0.05 bonferroni
             maxdepth=30 mincatsize=200 leafsize=50 nodes;
   criterion CHAID;
   input p4 / level=nom;
   input region / level=nom;
   input lgbtq / level=nom;
   input ses_proxy / level=nom;
   input race / level=nom;
   prune none;
   target ipvcc;
   score out=work.mss_score;
run;

I'm running this analysis alongside a colleague who uses SPSS, and we get different outputs whenever we run the model with five or more independent variables. Our test data are identical, and we've ruled out differences in how missing values are treated as the cause of the discrepancy, so we think it has something to do with the CHAID options. Her SPSS syntax looks like this:

TREE IPSEXV [n] BY p4 [n] region [n] lgbtq [n] ses_proxy [n] race [n]
  /TREE DISPLAY=TOPDOWN NODES=BOTH BRANCHSTATISTICS=YES NODEDEFS=YES SCALE=AUTO
  /DEPCATEGORIES USEVALUES=[VALID]
  /PRINT MODELSUMMARY CLASSIFICATION RISK TREETABLE
  /METHOD TYPE=CHAID
  /GROWTHLIMIT MAXDEPTH=30 MINPARENTSIZE=200 MINCHILDSIZE=50
  /VALIDATION TYPE=NONE OUTPUT=BOTHSAMPLES
  /CHAID ALPHASPLIT=0.05 ALPHAMERGE=0.05 SPLITMERGED=NO CHISQUARE=PEARSON CONVERGE=0.001 MAXITERATIONS=100 ADJUST=BONFERRONI
  /COSTS EQUAL
  /MISSING NOMINALMISSING=MISSING.

She seems to have control over more aspects of her procedure than I do. In particular, her syntax specifies rules for merging (ALPHAMERGE, SPLITMERGED) and convergence (CONVERGE, MAXITERATIONS) that aren't explicitly specified in mine.

My first question is: how can I adjust my syntax so that my settings match my colleague's as closely as possible?

My second question is: I don't think either of us is currently running exhaustive CHAID, as opposed to regular CHAID. How can I specify exhaustive CHAID in PROC HPSPLIT?

Any advice helps! Thank you for your time.