🔒 This topic is solved and locked.
Torri92
Calcite | Level 5

I'm new to SAS and I was wondering if it is possible to test a random percentage of the data. For example, I have a large data set and I want to test between 8% and 20% of the data, selected at random. How would I go about doing something like this? Basically, I have been trying to use a DO loop with the RANUNI function, but I can't get my desired specifications. Any help or advice would be much appreciated. Thanks!


Accepted Solutions
Astounding
PROC Star

It sounds like you need two sets of random numbers. The first determines the percentage of observations to select; the second actually selects the observations. For example, in the formula below, 0.08 is the lower bound and 0.06 is the width of the interval from 0.08 to 0.14 (for the 8% to 20% range in the question, the width would be 0.12):

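data subset;
  /* on the first iteration only, pick a proportion between 0.08 and 0.14 */
  if _n_=1 then percent_to_select = 0.08 + 0.06 * ranuni(12345);
  retain percent_to_select;   /* keep that value for every observation */
  set population;
  /* subsetting IF: keep each observation with probability percent_to_select */
  if ranuni(13579) < percent_to_select;
run;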

Here, percent_to_select is a continuous proportion, not a whole percentage. If you want discrete values, apply a function to it, such as ROUND(percent_to_select, 0.01).

Good luck.
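For the 8% to 20% range from the question, here is a minimal sketch of the same two-step idea using the RAND function with CALL STREAMINIT, the newer alternative to RANUNI. The demo data set POPULATION (10,000 rows), the seed 12345, and the macro variables MIN and MAX are illustrative assumptions, not part of the original post. Because each observation is kept independently, the number of rows selected will be close to, but usually not exactly, the drawn proportion.

```sas
/* Hypothetical demo data: 10,000 rows standing in for POPULATION */
data population;
  do id = 1 to 10000;
    output;
  end;
run;

%let min = 0.08;  /* smallest proportion to select (assumed) */
%let max = 0.20;  /* largest proportion to select (assumed)  */

data subset;
  retain prop_to_select;
  if _n_ = 1 then do;
    call streaminit(12345);  /* fixed seed: same results on every run */
    /* draw the proportion once and round it to a whole percent */
    prop_to_select = round(&min + (&max - &min) * rand('uniform'), 0.01);
    putlog 'NOTE: proportion to select = ' prop_to_select;
  end;
  set population;
  /* subsetting IF: keep each row with probability prop_to_select */
  if rand('uniform') < prop_to_select;
run;
```

Rerunning with the same seed reproduces both the drawn proportion and the selected rows; changing the seed changes both.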


5 Replies

verdantsphinx
Fluorite | Level 6

proc surveyselect data=have out=want samprate=0.2 outall;
run;

The variable Selected equals 1 for an observation selected for the sample, and equals 0 for an observation not selected.
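As a sketch of how the OUTALL output might be used, the steps below split it into the sampled rows and the rest. The demo data set HAVE, the output names TEST and REST, and the seed value are hypothetical. METHOD=SRS (simple random sampling without replacement) is spelled out, and SEED= makes the selection repeatable. Unlike the DATA step approach, SURVEYSELECT selects a fixed number of rows determined by the rate, here 200 of 1,000.

```sas
/* Hypothetical demo data standing in for HAVE */
data have;
  do id = 1 to 1000;
    output;
  end;
run;

/* Select 20% by simple random sampling; OUTALL keeps every row */
proc surveyselect data=have out=want samprate=0.2 outall
                  method=srs seed=12345;
run;

/* Split on the Selected flag that OUTALL adds */
data test rest;
  set want;
  if selected = 1 then output test;
  else output rest;
run;
```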

FriedEgg
SAS Employee

To combine the two previous examples:

%let min=0.08;
%let max=0.20;
%let rsr=%sysfunc(round(%sysevalf(&min + (&max - &min) * %sysfunc(ranuni(0))), .01));

proc surveyselect data=sashelp.retail out=want samprate=&rsr;
run;
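The %LET statement for RSR maps a uniform draw U from RANUNI(0), which lies strictly between 0 and 1, onto the requested range and rounds it to two decimals, so the rate always falls between 0.08 and 0.20. Written out, with one worked value:

```latex
r = \operatorname{round}\bigl(\text{min} + (\text{max} - \text{min})\,U,\ 0.01\bigr),
\qquad 0 < U < 1

\text{e.g. } U = 0.5:\quad
r = \operatorname{round}(0.08 + 0.12 \times 0.5,\ 0.01) = 0.14
```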

Astounding
PROC Star

One more note: a second thought about generating the percentage to select with ranuni(12345).

If you run the program multiple times, you will get the same percentage selected each time, because the first random number generated by ranuni(12345) will be the same every time. There are two basic approaches to get around this. One is to switch to ranuni(0) instead of ranuni(12345), which seeds the generator from the system clock. The drawback is that ranuni(0) produces a different random number on every run, so if you are ever asked to prove that your results are correct, you will not be able to replicate them, let alone prove that they are correct. The second approach involves writing a more complex program (a few approaches are possible). If you add some details (for example, whether you need to be able to replicate the results), you're sure to get some feedback.

Good luck.
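One possible shape for that "more complex program", offered as an assumption rather than what Astounding had in mind: draw the rate in a DATA _NULL_ step seeded with CALL STREAMINIT, store it in a macro variable with CALL SYMPUTX, and give PROC SURVEYSELECT its own fixed SEED=. The seed values 2718 and 3141 are arbitrary placeholders; keeping them (and the input data) unchanged should reproduce both the rate and the sample.

```sas
%let min = 0.08;  /* smallest sampling rate */
%let max = 0.20;  /* largest sampling rate  */

/* Draw the sampling rate once, with a fixed (placeholder) seed */
data _null_;
  call streaminit(2718);
  rate = round(&min + (&max - &min) * rand('uniform'), 0.01);
  call symputx('rsr', rate);  /* store the rate as macro variable RSR */
run;

%put NOTE: sampling rate = &rsr;

/* Same data, same rate, and same SEED= give the same sample */
proc surveyselect data=sashelp.retail out=want samprate=&rsr seed=3141;
run;
```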

FriedEgg
SAS Employee

That is a good point, Astounding. Using PROC SURVEYSELECT helps somewhat with this issue: when the SEED= option is not specified, the procedure generates a seed from the system clock and prints it in the output. To repeat a sample, rerun the procedure on the same data with the same sampling rate and with SEED= set to the seed from the earlier run.
