Nate93
Calcite | Level 5

Hi,

 

I have a dataset with 1909 observations. From this dataset I would like to create 50 random samples (SRS is fine), but each sample also needs to be of a random size between 30% and 80% of the original dataset size, so anything between 573 and 1527 observations per sample.

I have attempted to use proc surveyselect, but I could only get it to create one sample. The N= option only seems to take a numeric value rather than a function: I was trying to put a random number generator there and then wrap it in a loop to run 50 times, but I couldn't get anywhere.

 

Does anyone have any ideas?

 

Thanks!

Accepted Solutions
PGStats
Opal | Level 21

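Try something like this:

%let n=1909;        /* number of observations in the original dataset */
%let nSamples=50;   /* number of samples to draw */

/* Simulated stand-in for the real dataset */
data test;
call streaminit(89868);
do i = 1 to &n;
    x = rand("normal");
    output;
    end;
drop i;
run;

/* One row per sample, with a size drawn uniformly between 30% and 80% of &n */
data sampsize;
do sampId = 1 to &nSamples;
    alpha = rand("uniform");
    sampleSize = round(&n * (alpha * 0.3 + (1-alpha) * 0.8));
    output;
    end;
drop alpha;
run;

/* Replicate the full dataset once per sample so that sampId can serve as a stratum */
data temp;
set test;
do sampId = 1 to &nSamples;
    output;
    end;
run;

proc sort data=temp; by sampId; run;

/* Draw from each stratum the number of observations listed in sampsize */
proc surveyselect data=temp sampsize=sampsize out=mySamples;
strata sampId;
run;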
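
One way to check the result is to count the observations per sample. This is only a sketch: it assumes the mySamples dataset and the sampId variable created by the code above, and the column names minSize, maxSize, and nSamples are made up for illustration. With n=1909, the smallest and largest sample sizes should fall between 573 and 1527, and there should be one group per sample.

```sas
/* Summarize the per-sample counts in mySamples (output of PROC SURVEYSELECT above) */
proc sql;
    select min(n)   as minSize,    /* smallest sample drawn */
           max(n)   as maxSize,    /* largest sample drawn */
           count(*) as nSamples    /* number of distinct samples */
    from (select sampId, count(*) as n
          from mySamples
          group by sampId);
quit;
```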
 
PG

7 Replies
ballardw
Super User

It might help to provide a more concrete example of what you are attempting to do with a smaller number of starting records and what 2 or 3 groups might look like.

 

It sounds like you might want to use replicate sampling.

The following selects 3 replicates (REPS=3), which I think is what your 50 samples represent, and selects 40% of the records (SAMPRATE=0.4) for each replicate.

proc surveyselect data=sashelp.class reps=3
   samprate= 0.4 out=selected ;
run;

The output data set has a variable named Replicate that indicates which of the 3 reps the record(s) belong to.

 

 

If you mean to randomly change the sample rate for each replicate then that's going to be a different kettle of fish.

Reeza
Super User

This assumes you're using the latest version of SAS and SAS/STAT:

 

%macro generate_random(dsn=, n_samples=, n_min=, n_max=);
    %local i;

    /* One PROC SURVEYSELECT call per sample, each with its own random size */
    %do i=1 %to &n_samples;
        proc surveyselect data=&dsn reps=1
            sampsize=%sysfunc(rand(integer, &n_min, &n_max)) out=selected_&i.;
        run;
    %end;

%mend;

%generate_random(dsn=sashelp.heart, n_samples=20, n_min=5, n_max=100);

/* Combine all results into one data set; INDSNAME= records which data set each row came from */
data all_selections;
    set selected_: indsname=source;
    sample = source;
run;

 



Nate93
Calcite | Level 5

Thanks @Reeza this is definitely along the lines of what I want to do. But there must be a version issue as unfortunately this doesn't work for me.

Reeza
Super User


It's likely the random number generator portion:

 

 %sysfunc(rand(integer, &n_min, &n_max))

 

If so, you'll need a different way to generate the numbers: try rand uniform with some math to get the interval instead. rand integer is new to SAS, but it's a great option IMO.
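
A minimal sketch of the rand uniform approach, in case it helps. The macro variables u and size are hypothetical names, the bounds are the 573 and 1527 from the original question, and the random stream is not seeded here. The idea is to scale a uniform draw on (0,1) to the width of the interval and truncate it to an integer.

```sas
/* Bounds taken from the question: 30% and 80% of 1909 observations */
%let n_min = 573;
%let n_max = 1527;

/* Uniform draw on (0,1); the stream is not seeded in this sketch */
%let u = %sysfunc(rand(uniform));

/* Scale to [n_min, n_max + 1) and truncate to an integer in n_min..n_max */
%let size = %sysevalf(&n_min + (&n_max - &n_min + 1) * &u, floor);

/* Guard in case the text form of u rounds up to 1 */
%let size = %sysfunc(min(&size, &n_max));

%put NOTE: sampsize = &size;
```

Inside the macro loop, sampsize=&size would then take the place of the rand integer call.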

Dm95
Calcite | Level 5

Hello,

 

I have created groups in my dataset, and I want to pull a random sample from these groups and then run PROC GLM on that sample. I don't think I am getting correct results. Below is the code I am attempting.

 

 

data auditdata; set auditdata;
if lid=1 and restatement_new=1 then group=1;
else if lid=1 and restatement_new=0 then group=2;
else if lid=0 and restatement_new=1 then group=3;
else if lid=0 and restatement_new=0 then group=4;
run;



data auditdata; set auditdata;
ranum=ranuni(1005);
if ranum<.20 then rsample=1; else rsample=0;
run;



proc glm data=auditdata;
class rsample;
model da da2=rsample;
means rsample/hovtest tukey scheffe bon cldiff clm lines alpha=.05;
run;
PGStats
Opal | Level 21

You would have a greater audience if you posted your question as a new topic.

 

If I understand what you are trying to do, you need:

 

proc glm data=auditdata;
where rsample = 1;
class group;
model da da2=group;
means group/hovtest tukey scheffe bon cldiff clm lines alpha=.05;
run;

 

PG
