This is from the free online paper "Using New SAS 9.4 Features for Cumulative Logit Models with Partial Proportional Odds".
On page 7, a subset of the data is created: The dataset “MB” is comprised of 408 of the 508 observations in the dataset. Dataset “XV” contains 100 observations and will be used for cross-validation purposes for the model.
Question: Does MB still contain 508 observations or not? As I understand it, MB contains only 408, so how were those 408 separated from the 508? As far as I know, PROC SURVEYSELECT would only create the 100-observation sample (XV), and the source dataset would still contain all 508. How do I end up with 408 observations rather than 508 using PROC SURVEYSELECT?
My guess is that PROC SURVEYSELECT was run with the OUTALL option and the output was then split on the newly created variable Selected. Example:

proc surveyselect data=sashelp.class out=classSamples outall sampsize=15 seed=75868;
run;

NOTE: The data set WORK.CLASSSAMPLES has 19 observations and 6 variables.
NOTE: PROCEDURE SURVEYSELECT used (Total process time):
      real time 0.02 seconds
      cpu time 0.03 seconds

data mb xv;
set classSamples;
if Selected then output mb;
else output xv;
run;

NOTE: There were 19 observations read from the data set WORK.CLASSSAMPLES.
NOTE: The data set WORK.MB has 15 observations and 6 variables.
NOTE: The data set WORK.XV has 4 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time 0.00 seconds
      cpu time 0.01 seconds
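To see what OUTALL changes, here is a small sketch on sashelp.class. The output dataset names sampled_only and all_rows are made up for illustration. Without OUTALL, the OUT= dataset holds only the sampled rows; with OUTALL, every input row is kept and flagged by Selected.

```sas
/* Default behaviour: OUT= holds only the 15 sampled rows */
proc surveyselect data=sashelp.class out=work.sampled_only
                  method=srs sampsize=15 seed=75868 noprint;
run;

/* With OUTALL: OUT= holds all 19 input rows plus the 0/1 variable Selected */
proc surveyselect data=sashelp.class out=work.all_rows
                  method=srs outall sampsize=15 seed=75868 noprint;
run;
```

The log notes for the two steps should report 15 and 19 observations respectively, which is why a DATA step reading the OUTALL output can write both the sample and the remainder.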
Thanks, this code works, but I want to understand how it works. My code is below.
In the DATA step, cleanp_n has only 1000 obs and stcp.cleanedp3 has 1000 obs. How does stcp.cleanedp4 get the remaining 3000 obs, since I did not reference stcp.cleanedp2 (which has all 4000 obs) in the DATA step?
/* simple random sampling without replacement (method=srs) - proc surveyselect */
proc surveyselect data=stcp.cleanedp2 method = srs outall sampsize = 1000
seed=535113001 out=cleanp_n ;
run;
data stcp.cleanedp3 stcp.cleanedp4;
set cleanp_n;
if selected then output stcp.cleanedp3;
else output stcp.cleanedp4;
run;
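Because of OUTALL, the OUT= dataset cleanp_n is not a 1000-row sample: it contains every row of the input plus the flag Selected (1 for sampled rows, 0 otherwise). That is where the remaining 3000 rows come from. A minimal sketch to check this, using a generated 4000-row table as a stand-in for stcp.cleanedp2 (work.demo and work.demo_n are hypothetical names):

```sas
/* Hypothetical stand-in for stcp.cleanedp2: 4000 rows */
data work.demo;
  do id = 1 to 4000;
    output;
  end;
run;

/* OUTALL keeps every input row and adds the 0/1 flag Selected */
proc surveyselect data=work.demo method=srs outall
                  sampsize=1000 seed=535113001 out=work.demo_n noprint;
run;

/* Count rows by selection status: 1000 with Selected=1, 3000 with Selected=0 */
proc freq data=work.demo_n;
  tables Selected;
run;
```

Dropping OUTALL from the PROC SURVEYSELECT statement would leave only the 1000 sampled rows in the output, and the second dataset in the split would then be empty.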