Hi,
I have a dataset of size 1909. From this dataset I would like to create 50 random samples (SRS is fine), but each of the 50 samples also needs to be of a random size between 30% and 80% of the original dataset size, so anything between 573 and 1527 observations per sample (since the dataset has 1909 obs).
I have attempted to use proc surveyselect, but this only creates 1 sample. The N= option only seems to take a numeric value rather than a function; I was trying to put a random number generator there and then wrap it in a loop run 50 times, but I couldn't get anywhere.
Anyone any ideas?
Thanks!
Accepted Solutions
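Try something like this:
%let n=1909;
%let nSamples=50;
/* Example data standing in for the real 1909-observation dataset */
data test;
  call streaminit(89868);
  do i = 1 to &n;
    x = rand("normal");
    output;
  end;
  drop i;
run;
/* One random sample size per sample, between 30% and 80% of &n */
data sampsize;
  do sampId = 1 to &nSamples;
    alpha = rand("uniform");
    sampleSize = round(&n * (alpha * 0.3 + (1-alpha) * 0.8));
    output;
  end;
  drop alpha;
run;
/* Replicate the data once per sample so that sampId can serve as a stratum */
data temp;
  set test;
  do sampId = 1 to &nSamples;
    output;
  end;
run;
proc sort data=temp; by sampId; run;
/* Draw an SRS within each stratum, taking the stratum sizes from the sampsize dataset */
proc surveyselect data=temp sampsize=sampsize out=mySamples;
  strata sampId;
run;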
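A quick way to check the result is to count the observations per sample. This is only a sketch, assuming the output dataset is named mySamples with a sampId variable as in the code above; sampleCounts is a hypothetical dataset name used for illustration.

```sas
/* Count observations in each of the samples (sampleCounts is a hypothetical name) */
proc freq data=mySamples noprint;
  tables sampId / out=sampleCounts(keep=sampId count);
run;

/* Summarize the sample sizes: number of samples, smallest and largest size */
proc means data=sampleCounts n min max;
  var count;
run;
```

With n=1909, the minimum should be no smaller than 573 and the maximum no larger than 1527.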
It might help to provide a more concrete example of what you are attempting to do with a smaller number of starting records and what 2 or 3 groups might look like.
It sounds like you might want to use replicate sampling.
The following selects 3 replicates (which I think is what your 50 samples represent), each containing 40% of the records (SAMPRATE=0.4).
proc surveyselect data=sashelp.class reps=3 samprate= 0.4 out=selected ; run;
The output data set has a variable named Replicate that indicates which of the 3 reps the record(s) belong to.
If you mean to randomly change the sample rate for each replicate then that's going to be a different kettle of fish.
Assuming you're using the latest version of SAS and SAS/STAT:
%macro generate_random(dsn=, n_samples=, n_min=, n_max=);
  %do i=1 %to &n_samples;
    proc surveyselect data=&dsn reps=1
      sampsize=%sysfunc(rand(integer, &n_min, &n_max)) out=selected_&i.;
    run;
  %end;
%mend;

%generate_random(dsn=sashelp.heart, n_samples=20, n_min=5, n_max=100);

*combine all results into one data set;
data all_selections;
  set selected_: indsname=source;
  sample = source;
run;
Thanks @Reeza this is definitely along the lines of what I want to do. But there must be a version issue as unfortunately this doesn't work for me.
@Nate93 wrote:
Thanks @Reeza this is definitely along the lines of what I want to do. But there must be a version issue as unfortunately this doesn't work for me.
It's likely the random number generator portion:
%sysfunc(rand(integer, &n_min, &n_max))
Try a different way to generate the numbers: use rand uniform with some math to get the interval instead. rand integer is relatively new to SAS, but it's a great option IMO.
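A minimal sketch of that rand uniform approach, for versions where rand integer is not available. The macro name generate_random_uniform and the macro variables u and size are hypothetical, introduced only for illustration; the rest follows the generate_random macro posted earlier in the thread.

```sas
/* Hypothetical variant of generate_random that avoids rand("integer") */
%macro generate_random_uniform(dsn=, n_samples=, n_min=, n_max=);
  %local i u size;
  %do i=1 %to &n_samples;
    /* u is a uniform random number between 0 and 1 */
    %let u = %sysfunc(rand(uniform));
    /* Scale u to an integer between n_min and n_max; the FLOOR conversion rounds down */
    %let size = %sysevalf(&n_min + (&n_max - &n_min + 1) * &u, floor);
    proc surveyselect data=&dsn reps=1 sampsize=&size out=selected_&i.;
    run;
  %end;
%mend;

%generate_random_uniform(dsn=sashelp.heart, n_samples=20, n_min=5, n_max=100);
```

For the original question, n_samples=50, n_min=573 and n_max=1527 would give the sizes asked for.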
Hello,
For my dataset, I have created groups. However, I want to pull a random sample from these groups and then run proc glm on the random sample. I don't think I am getting correct results. Below is the code I am attempting.
data auditdata; set auditdata;
if lid=1 and restatement_new=1 then group=1;
else if lid=1 and restatement_new=0 then group=2;
else if lid=0 and restatement_new=1 then group=3;
else if lid=0 and restatement_new=0 then group=4;
run;
data auditdata; set auditdata;
ranum=ranuni(1005);
if ranum<.20 then rsample=1; else rsample=0;
run;
proc glm data=auditdata;
class rsample;
model da da2=rsample;
means rsample/hovtest tukey scheffe bon cldiff clm lines alpha=.05;
run;
You would have a greater audience if you posted your question as a new topic.
If I understand what you are trying to do, you need:
proc glm data=auditdata;
where rsample = 1;
class group;
model da da2=group;
means group/hovtest tukey scheffe bon cldiff clm lines alpha=.05;
run;