How to select a random percent of data to test?

Occasional Contributor
Posts: 5

I'm new to SAS and I was wondering whether it is possible to test a random percentage of a data set. For example, I have a large data set and I want to test between 8% and 20% of the observations, chosen at random. How would I go about doing something like this? So far I have been trying to use a DO loop with the ranuni function, but I can't get it to meet these specifications. Any help or advice would be much appreciated. Thanks!


Accepted Solutions
Solution
‎07-02-2012 11:06 AM
Super User
Posts: 5,495

Re: How to select a random percent of data to test?

It sounds like you need two random number streams. One determines the percentage of observations, then the second actually selects the observations. For example, in this formula 0.06 represents the size of the interval from 0.08 to 0.14 (to cover the full 8% to 20% range in your question, use 0.12 instead):

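data subset;
  /* Draw the sampling rate once, on the first iteration, and keep it. */
  if _n_=1 then percent_to_select = 0.08 + 0.06 * ranuni(12345);
  retain percent_to_select;
  set population;
  /* Subsetting IF: keep each observation with probability percent_to_select. */
  if ranuni(13579) < percent_to_select;
run;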

Here, percent_to_select is continuous. If you want discrete values (whole percentages, say), you could apply a function such as ROUND to it.
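As a minimal sketch of the discrete variant, the step below rounds the drawn rate to the nearest whole percent with ROUND. It assumes the 8% to 20% range from the question (hence the width of 0.12 rather than 0.06) and reuses the data set name population and the seeds from the example above; none of these values is special.

```sas
data subset;
  /* Assumption: 0.12 is the width of the 0.08-0.20 interval from the question. */
  /* ROUND(..., 0.01) snaps the rate to a whole percent: 0.08, 0.09, ..., 0.20. */
  if _n_ = 1 then
    percent_to_select = round(0.08 + 0.12 * ranuni(12345), 0.01);
  retain percent_to_select;
  set population;
  /* Keep each observation with probability percent_to_select. */
  if ranuni(13579) < percent_to_select;
run;
```

Because each observation is kept independently, the fraction actually selected will be close to percent_to_select but usually not exactly equal to it.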

Good luck.


All Replies
New Contributor
Posts: 3

Re: How to select a random percent of data to test?

proc surveyselect data=have out=want samprate=0.2 outall;
run;

With the OUTALL option, the output data set contains every observation from the input. The variable Selected equals 1 for an observation selected for the sample, and equals 0 for an observation not selected.
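If it helps, here is a short sketch of how the Selected flag might be used afterwards to split the OUTALL output into two data sets. The names test and holdout are made up for illustration; want is the output data set from the step above.

```sas
/* Split the OUTALL output on the Selected flag. */
/* test and holdout are hypothetical data set names. */
data test holdout;
  set want;
  if Selected = 1 then output test;   /* sampled observations */
  else output holdout;                /* everything else      */
run;
```

If only the sampled rows are needed, dropping OUTALL from the PROC SURVEYSELECT statement writes just those rows to the output data set.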

Trusted Advisor
Posts: 1,301

Re: How to select a random percent of data to test?

To combine the two previous examples:

%let min=0.08;
%let max=0.20;

/* Random sampling rate between &min and &max, rounded to the nearest 0.01 */
%let rsr=%sysfunc(round(%sysevalf( &min + ( &max - &min ) * %sysfunc(ranuni(0))),.01));

proc surveyselect
  data=sashelp.retail
  out=want
  samprate=&rsr;
run;

Super User
Posts: 5,495

Re: How to select a random percent of data to test?

One more note, a second thought about generating the percentage to select with ranuni(12345):

If you run the program multiple times, you will get the same percentage each time, because the first random number generated by ranuni(12345) is the same every time. There are two basic approaches to get around this. One is to switch to ranuni(0) instead of ranuni(12345). The drawback is that ranuni(0) takes its seed from the system clock, so it produces a different random number on every run. If you are ever asked to prove that your results are correct, you will not be able to replicate them, let alone prove that they are correct. The second approach involves writing a more complex program (a few variations are possible). So if you add some details (for instance, do you care whether the results can be replicated?), you're sure to get some feedback.
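One possible reading of the "more complex program" idea, sketched under assumptions: draw a fresh seed once per run, write it to the log, and use that recorded seed in the DATA step. The macro variable name seed and the message text are invented here; the DATA step is the one from the accepted solution.

```sas
/* Draw a new seed each run; the CEIL conversion of %SYSEVALF gives an integer */
/* between 1 and 2147483646, which is a valid positive seed for RANUNI.        */
%let seed = %sysevalf(2147483646 * %sysfunc(ranuni(0)), ceil);

/* Record the seed so the run can be replayed later. */
%put NOTE: sampling seed for this run = &seed;

data subset;
  if _n_ = 1 then percent_to_select = 0.08 + 0.06 * ranuni(&seed);
  retain percent_to_select;
  set population;
  if ranuni(&seed) < percent_to_select;
run;
```

To reproduce an earlier run, replace the first %LET with the value written to the log, for example %let seed = 123456; (a placeholder, not a real logged value).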

Good luck.

Trusted Advisor
Posts: 1,301

Re: How to select a random percent of data to test?

Posted in reply to Astounding

That is a good thought, Astounding. Using proc surveyselect goes some way toward solving this issue. When you do not specify a seed, the procedure chooses one for you and reports it in its output. If you want to repeat a sample, you can reuse the seed from the previous run by passing it with the SEED= option.
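A rough sketch of that workflow follows. The rate 0.15, the output name want_again, and the seed 123456 are placeholders; in practice the seed would be the value reported by the first run.

```sas
/* First run: no SEED= option, so the procedure chooses a seed and reports it. */
proc surveyselect data=sashelp.retail out=want samprate=0.15;
run;

/* Later run: supply the reported seed to draw the same sample again. */
/* 123456 is a placeholder for the value shown by the first run.      */
proc surveyselect data=sashelp.retail out=want_again samprate=0.15 seed=123456;
run;
```

The input data set and the sampling rate also have to be unchanged for the second run to return the same observations.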
