How to select a random percent of data to test?

07-02-2012 09:45 AM

I'm new to SAS and I was wondering whether it is possible to test a random percentage of the data. For example, I have a large data set and I want to test between 8% and 20% of the data at random. How would I go about doing something like this? So far I have been trying to use a DO loop with the ranuni function, but I can't get it to meet my specifications. Any help or advice would be much appreciated. Thanks!

Accepted Solutions

Posted in reply to Torri92

07-02-2012 11:06 AM

It sounds like you need two random number streams. One determines the percentage of observations to select, and the second actually selects the observations. For example, in this formula 0.06 represents the size of the interval from 0.08 to 0.14 (to cover the full 8% to 20% range you asked about, use 0.12 instead):

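data subset;
   /* draw the sampling rate once, on the first iteration */
   if _n_=1 then percent_to_select = 0.08 + 0.06 * ranuni(12345);
   retain percent_to_select;
   set population;
   /* subsetting IF: keep the observation when its draw falls below the rate */
   if ranuni(13579) < percent_to_select;
run;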

Here, percent_to_select is continuous. You could apply a function to it, if you want discrete values.

Good luck.
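If a whole-number percentage is preferred, one option is the ROUND function. The following is a minimal sketch rather than a definitive version: it assumes the 8% to 20% range from the original question (hence 0.12 instead of 0.06), and the population data set with its id variable is a made-up stand-in so the step can run on its own.

```sas
/* Hypothetical stand-in for the real data: 1000 rows with an id variable. */
data population;
    do id = 1 to 1000;
        output;
    end;
run;

data subset;
    /* Draw the rate once and round it to the nearest whole percent. */
    if _n_ = 1 then
        percent_to_select = round(0.08 + 0.12 * ranuni(12345), 0.01);
    retain percent_to_select;
    set population;
    /* Keep the row when its own uniform draw falls below the rate. */
    if ranuni(13579) < percent_to_select;
run;
```

Two caveats: rounding makes the endpoints 0.08 and 0.20 half as likely as the values in between, and because each row is kept independently, the share of rows selected is only approximately equal to percent_to_select.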

All Replies

Posted in reply to Torri92

07-02-2012 01:34 PM

proc surveyselect data=have out=want samprate=0.2 outall;
run;

The variable Selected equals 1 for an observation selected for the sample, and equals 0 for an observation not selected.
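As an illustration of how the Selected flag might be used afterwards, here is a small sketch that tags a 20% simple random sample and then splits the output into two data sets. The names tagged, sample, and rest are made up for this example, sashelp.class stands in for the real data, and the seed value is arbitrary.

```sas
/* Tag a 20% simple random sample; OUTALL keeps every row and adds Selected. */
proc surveyselect data=sashelp.class out=tagged
        method=srs samprate=0.2 seed=12345 outall;
run;

/* Split on the flag: selected rows go to sample, the remainder to rest. */
data sample rest;
    set tagged;
    if Selected = 1 then output sample;
    else output rest;
run;
```

Without OUTALL, the output data set would contain only the selected rows, so the second step is needed only when the unselected rows matter as well.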

Posted in reply to Torri92

07-02-2012 02:00 PM

To combine the two previous examples:

%let min=0.08;
%let max=0.20;
%let rsr=%sysfunc(round(%sysevalf( &min + ( &max - &min ) * %sysfunc(ranuni(0))),.01));

proc surveyselect
   data=sashelp.retail
   out=want
   samprate=&rsr;
run;

Posted in reply to FriedEgg

07-02-2012 02:17 PM

One more note, a second thought about generating the percentage to select with ranuni(12345).

If you run the program multiple times, you will get the same percentage each time, because the first random number generated by ranuni(12345) is the same every time. There are two basic ways around this. One is to switch to ranuni(0) instead of ranuni(12345). The drawback is that ranuni(0) takes its seed from the system clock, so it produces a different random number on each run. If you are ever asked to prove that your results are correct, you will not be able to replicate them, let alone prove them correct. The second approach involves writing a more complex program (a few variations are possible). So if you add some details (for example, whether you care that the results can be replicated), you're sure to get some feedback.

Good luck.
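One possible version of the "more complex program" is to derive a seed from the clock, write it to the log, and pass it to ranuni explicitly. This is only a sketch under assumptions: the macro variable name seed, the toy population data set, and the 8% to 20% range are choices made for this example, not something specified in the thread.

```sas
/* Hypothetical stand-in for the real data. */
data population;
    do id = 1 to 1000;
        output;
    end;
run;

/* Derive a positive seed from the current datetime and record it. */
data _null_;
    seed = mod(int(datetime()), 2147483646) + 1;
    call symputx('seed', seed);
run;
%put Sampling seed used for this run: &seed;

data subset;
    /* The rate differs between runs because the seed does. */
    if _n_ = 1 then percent_to_select = 0.08 + 0.12 * ranuni(&seed);
    retain percent_to_select;
    set population;
    if ranuni(&seed) < percent_to_select;
run;
```

To replicate a particular run, replace the DATA _NULL_ step with a %let seed= statement that supplies the value recorded in the log.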

Posted in reply to Astounding

07-02-2012 02:37 PM

That is a good thought, Astounding. Using proc surveyselect goes some way toward solving this issue. When no seed is specified, one is chosen for you and reported in the procedure output. If you want to repeat a sample, you can reuse the seed from the previous run via the SEED= option.
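To illustrate reusing a seed, the sketch below runs proc surveyselect twice with the same SEED= value and compares the two samples. The seed 20120702, the 15% rate, and the data set names run1 and run2 are arbitrary choices for this example.

```sas
/* Two runs with identical options and the same seed. */
proc surveyselect data=sashelp.retail out=run1
        samprate=0.15 seed=20120702 noprint;
run;

proc surveyselect data=sashelp.retail out=run2
        samprate=0.15 seed=20120702 noprint;
run;

/* PROC COMPARE should report no differences between the two samples. */
proc compare base=run1 compare=run2;
run;
```

In practice the seed would be copied from the output of the run being reproduced rather than chosen up front.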