Proportional Odds Model

Posted 11-28-2022 03:54 PM

Hi,

This is from the free online paper "Using New SAS 9.4 Features for Cumulative Logit Models with Partial Proportional Odds". On page 7, a subset of the data is created: The dataset “MB” is comprised of 408 of the 508 observations in the dataset. Dataset “XV” contains 100 observations and will be used for cross-validation purposes for the model.

Question: Does MB still contain all 508 observations or not? From my understanding, MB contains only 408, so how were those 408 separated from the 508? As far as I know from SAS, proc surveyselect will only create the 100 observations (XV) from MB, and MB will still contain 508. How do I end up with 408 using proc surveyselect, and not 508?

Thanks

3 REPLIES


I guess proc surveyselect was used with the OUTALL option, and the output was then split according to the newly created variable *Selected*. Example:

```
/* OUTALL writes every input row and adds Selected (1 = sampled, 0 = not sampled) */
proc surveyselect data=sashelp.class out=classSamples outall sampsize=15 seed=75868;
run;
NOTE: The data set WORK.CLASSSAMPLES has 19 observations and 6 variables.
NOTE: PROCEDURE SURVEYSELECT used (Total process time):
      real time           0.02 seconds
      cpu time            0.03 seconds

/* Split the flagged data: sampled rows go to MB, the rest to XV */
data mb xv;
    set classSamples;
    if Selected then output mb;
    else output xv;
run;
NOTE: There were 19 observations read from the data set WORK.CLASSSAMPLES.
NOTE: The data set WORK.MB has 15 observations and 6 variables.
NOTE: The data set WORK.XV has 4 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
```

PG
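Applied to the numbers in the original question, the same pattern might look like the sketch below. The dataset `full` and its `id` variable are placeholders generated here so the example runs on its own (they are not the paper's data), and the seed is arbitrary. In this version the 100 sampled rows go to XV and the 408 unsampled rows go to MB.

```sas
/* Placeholder data: 508 rows standing in for the paper's dataset (assumption) */
data full;
    do id = 1 to 508;
        output;
    end;
run;

/* OUTALL keeps all 508 rows and adds Selected (1 = sampled, 0 = not sampled) */
proc surveyselect data=full out=flagged outall method=srs sampsize=100 seed=12345;
run;

/* Route the sampled rows to XV and everything else to MB */
data mb xv;
    set flagged;
    if Selected then output xv;
    else output mb;
run;
```

The log for the DATA step should then report 100 observations for WORK.XV and 408 for WORK.MB.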


Thanks, this code works, but I have a question: I want to understand how it works. My code is below.

As I understand it, in the data step cleanp_n has only 1000 obs, and stcp.cleanedp3 has 1000 obs. How did stcp.cleanedp4 get the remaining 3000 obs, since the data step never references stcp.cleanedp2, which has all 4000 obs?

```
/* simple random sampling without replacement (method=srs) - proc surveyselect */
proc surveyselect data=stcp.cleanedp2 method=srs outall sampsize=1000
                  seed=535113001 out=cleanp_n;
run;

data stcp.cleanedp3 stcp.cleanedp4;
    set cleanp_n;
    if selected then output stcp.cleanedp3;
    else output stcp.cleanedp4;
run;
```


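Please check the log after you run your code. It should show that dataset cleanp_n has the same number of obs as stcp.cleanedp2: with OUTALL, proc surveyselect writes every input observation to the output dataset and only flags the sampled ones with Selected = 1.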

PG
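One way to see the effect of OUTALL directly is to run the procedure with and without it and compare the outputs. The sketch below uses a generated 4000-row placeholder dataset named `demo` (an assumption, standing in for stcp.cleanedp2), and the output names `sample_only` and `flagged` are made up for illustration.

```sas
/* Placeholder data: 4000 rows standing in for stcp.cleanedp2 (assumption) */
data demo;
    do id = 1 to 4000;
        output;
    end;
run;

/* Without OUTALL: the output holds only the 1000 sampled rows */
proc surveyselect data=demo out=sample_only method=srs sampsize=1000 seed=535113001;
run;

/* With OUTALL: the output holds all 4000 rows, flagged by Selected */
proc surveyselect data=demo out=flagged outall method=srs sampsize=1000 seed=535113001;
run;

/* Count the rows in each Selected group */
proc freq data=flagged;
    tables Selected;
run;
```

The log should report 1000 observations for `sample_only` and 4000 for `flagged`, and the frequency table should show 3000 rows with Selected = 0 and 1000 with Selected = 1. That is why the "else" branch of the data step can fill stcp.cleanedp4 without reading the original dataset again.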
