CathyVI
Pyrite | Level 9

Hi,

This question is about the free online paper "Using New SAS 9.4 Features for Cumulative Logit Models with Partial Proportional Odds". On page 7, a subset of the data is created: The dataset “MB” is comprised of 408 of the 508 observations in the dataset.  Dataset “XV” contains 100 observations and will be used for cross-validation purposes for the model.

Question: Does MB still contain all 508 observations or not? As I understand it, MB contains only 408, so how were those 408 separated from the 508? As far as I know, proc surveyselect only creates the 100-observation sample (XV), and the source dataset still contains all 508. How do I end up with 408 observations rather than 508 when using proc surveyselect?

Thanks

 

3 REPLIES 3
PGStats
Opal | Level 21

My guess is that proc surveyselect was used with the OUTALL option and the output was then split on the newly created variable Selected. Example:

 

 proc surveyselect data=sashelp.class out=classSamples outall sampsize=15 seed=75868;
 run;
 
 NOTE: The data set WORK.CLASSSAMPLES has 19 observations and 6 variables.
 NOTE: PROCEDURE SURVEYSELECT used (Total process time):
       real time           0.02 seconds
       cpu time            0.03 seconds
 
 data mb xv;
     set classSamples;
     if Selected then output mb;
     else output xv;
 run;
 
 NOTE: There were 19 observations read from the data set WORK.CLASSSAMPLES.
 NOTE: The data set WORK.MB has 15 observations and 6 variables.
 NOTE: The data set WORK.XV has 4 observations and 6 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       cpu time            0.01 seconds
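
Applied to the split described in the question (508 observations, 100 held out as XV, 408 kept as MB), the same pattern might look like the sketch below. This is only an illustration, not necessarily how the paper's authors did it: the input name work.full, the intermediate name work.flagged, and the seed are hypothetical placeholders. Note that here the selected rows go to XV, since XV is the 100-observation sample.

```sas
/* Hypothetical input: work.full stands in for the 508-observation dataset. */
/* OUTALL keeps every input row and adds the 0/1 variable Selected.         */
proc surveyselect data=work.full out=work.flagged outall
                  method=srs sampsize=100 seed=12345;
run;

/* Route each row by its flag: sampled rows to XV, all others to MB. */
data mb xv;
    set work.flagged;
    if Selected then output xv;   /* the 100 sampled rows, for cross-validation */
    else output mb;               /* the remaining rows, for model fitting      */
run;
```

With a 508-row input, the log notes for this data step should report 100 observations in XV and 408 in MB.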
PG
CathyVI
Pyrite | Level 9

@PGStats 

Thanks, this code works, but I would like to understand how it works. My code is below.

In the data step, I thought cleanp_n held only 1000 obs, and stcp.cleanedp3 has 1000 obs. How does stcp.cleanedp4 end up with the remaining 3000 obs when the data step never references stcp.cleanedp2, which has all 4000 obs?

 

/* simple random sampling without replacement (method=srs) - proc surveyselect */
proc surveyselect data=stcp.cleanedp2 method = srs outall sampsize = 1000
seed=535113001 out=cleanp_n ;
run;

 

data stcp.cleanedp3 stcp.cleanedp4;
set cleanp_n;
if selected then output stcp.cleanedp3;
else output stcp.cleanedp4;
run;

PGStats
Opal | Level 21

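Please check the log after you run your code. It should show that dataset cleanp_n has the same number of obs as stcp.cleanedp2: with OUTALL, the output dataset keeps every input observation and adds the variable Selected (1 for sampled rows, 0 otherwise), so the data step sees all 4000 rows.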
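
Besides reading the log, one quick way to confirm this is to tabulate the Selected flag in cleanp_n before splitting it. A minimal sketch, assuming cleanp_n was created by the proc surveyselect step with OUTALL shown in the question:

```sas
/* Count the rows in cleanp_n by the 0/1 flag that OUTALL adds. */
proc freq data=cleanp_n;
    tables Selected;
run;
```

If stcp.cleanedp2 has 4000 obs, the table should show 1000 rows with Selected = 1 and 3000 rows with Selected = 0, which is where the contents of stcp.cleanedp4 come from.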

PG
