I'm new to SAS and I was wondering if it is possible to test a random percentage of a data set. For example, I have a large data set and I want to test between 8% and 20% of the data at random. How would I go about doing something like this? I have been trying to use a DO loop with the ranuni function, but I can't get it to meet my specifications. Any help or advice would be much appreciated. Thanks!
It sounds like you need two random number streams. One determines the percentage of observations to select, and the second actually selects the observations. For example, in this formula 0.06 represents the size of the interval from 0.08 to 0.14 (for the 8% to 20% range in your question, it would be 0.12):
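data subset;
   /* On the first iteration only, draw the sampling rate uniformly from 0.08 to 0.14 */
   if _n_=1 then percent_to_select = 0.08 + 0.06 * ranuni(12345);
   retain percent_to_select;
   set population;
   /* Keep each observation with probability percent_to_select */
   if ranuni(13579) < percent_to_select;
run;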
Here, percent_to_select is continuous. You could apply a function to it, if you want discrete values.
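As a minimal sketch of that last point, assuming whole-percent steps are what you want, the ROUND function can snap the rate to the nearest 0.01. The 0.01 step is an assumption; the data set names are the same as in the step above.

```sas
data subset;
   /* Draw the rate once and round it to a whole percent: 0.08, 0.09, ..., 0.14 */
   if _n_ = 1 then percent_to_select = round(0.08 + 0.06 * ranuni(12345), 0.01);
   retain percent_to_select;
   set population;
   /* Keep each observation with probability percent_to_select */
   if ranuni(13579) < percent_to_select;
run;
```

Note that with plain rounding the two endpoint values (0.08 and 0.14) come up about half as often as the values in between, because each endpoint only collects half a rounding interval.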
Good luck.
proc surveyselect data=have out=want samprate=0.2 outall;
run;
The variable Selected equals 1 for an observation selected for the sample, and equals 0 for an observation not selected.
To combine the two previous examples:
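/* Lower and upper bounds of the sampling rate */
%let min=0.08;
%let max=0.20;
/* Random rate between min and max, rounded to the nearest 0.01 */
%let rsr=%sysfunc(round(%sysevalf( &min + ( &max - &min ) * %sysfunc(ranuni(0))),.01));
proc surveyselect
   data=sashelp.retail
   out=want
   samprate=&rsr;
run;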
One more note, a second thought about generating the percentage to select with ranuni(12345):
If you run the program multiple times, you will get the same percentage each time, because the first random number generated by ranuni(12345) will be the same every time. There are two basic approaches to get around this. One is to switch to ranuni(0) instead of ranuni(12345). The drawback is that ranuni(0) generates a different random number on every run, so if you are ever asked to prove that your results are correct, you will not be able to replicate them, let alone prove that they are correct. The second approach involves writing a more complex program (a few variations are possible). So if you add some details (for example, whether you care if the results can be replicated), you're sure to get some feedback.
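One possible version of the second approach, offered as a sketch and not the only way to do it: build a seed from the clock, write it to the log, and pass it to ranuni explicitly. The macro variable name seed is made up for this illustration, population is the data set from the earlier step, and the 0.12 width is used here so the rate spans the 8% to 20% range from the question.

```sas
/* Build a positive seed from the current datetime and record it in the log */
data _null_;
   seed = mod(floor(datetime()), 2147483646) + 1;  /* stays within 1 .. 2**31-2 */
   call symputx('seed', seed);
   put 'NOTE: Seed used for this run: ' seed;
run;

data subset;
   /* Same selection logic as before, but driven by the recorded seed */
   if _n_ = 1 then percent_to_select = 0.08 + 0.12 * ranuni(&seed);
   retain percent_to_select;
   set population;
   if ranuni(&seed) < percent_to_select;
run;
```

To replicate an earlier run, skip the first step and set the macro variable by hand with a %let statement, using the seed value that was written to the log.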
Good luck.
That is a good thought, Astounding. Using proc surveyselect goes some way toward solving this issue. When no seed is specified, the procedure picks one at random and reports it in its output. If you want to repeat a sample, you can reuse the seed from the previous run.
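For instance, a repeatable run might look like the sketch below. The value 12345 is only a placeholder for whatever seed the earlier run reported; have and want are the data set names from the example above.

```sas
/* Reuse a known seed so the same sample is drawn again */
proc surveyselect data=have out=want samprate=0.2 seed=12345;
run;
```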