SAS Data Science

Nate93
Calcite | Level 5

Hi,

 

I have a dataset of 1909 observations. From this dataset I would like to create 50 random samples (SRS is fine), but each of the 50 samples also needs to be of a random size between 30% and 80% of the original dataset size, so anything between 573 and 1527 observations (since the dataset has 1909 obs).

 

I have attempted to use proc surveyselect, but I could only get it to create 1 sample. The N= option only seems to take a numeric value rather than a function; I was trying to put a random number generator there and then wrap this in a loop to run 50 times, but I couldn't get anywhere.

 

Anyone any ideas?

 

Thanks!

ACCEPTED SOLUTION
PGStats
Opal | Level 21

Try something like this:

 

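%let n=1909;         /* number of observations in the original data set */
%let nSamples=50;    /* number of samples to draw */

/* Simulated stand-in for the original data set */
data test;
call streaminit(89868);
do i = 1 to &n;
    x = rand("normal");
    output;
    end;
drop i;
run;

/* One random sample size per sample, between 30% and 80% of &n */
data sampsize;
do sampId = 1 to &nSamples;
    alpha = rand("uniform");
    sampleSize = round(&n * (alpha * 0.3 + (1-alpha) * 0.8));
    output;
    end;
drop alpha;
run;

/* Replicate the data once per sample so that each sampId is a stratum */
data temp;
set test;
do sampId = 1 to &nSamples;
    output;
    end;
run;

proc sort data=temp; by sampId; run;

/* Draw within each stratum, reading the sizes from the sampsize data set */
proc surveyselect data=temp sampsize=sampsize out=mySamples;
strata sampId;
run;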
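
One way to confirm that each sample came out at its requested size is to compare the row counts in mySamples with the sizes stored in sampsize. This is a minimal sketch, assuming the data sets created above exist; the table name size_check and the column nSelected are hypothetical names used only for illustration.

```sas
/* Compare requested and selected sizes per sample.
   size_check and nSelected are illustrative names, not part of the original code. */
proc sql;
    create table size_check as
    select s.sampId,
           s.sampleSize,
           count(m.sampId) as nSelected
    from sampsize as s
         left join mySamples as m
         on s.sampId = m.sampId
    group by s.sampId, s.sampleSize;
quit;

/* List any sample whose size differs from what was requested */
proc print data=size_check;
    where sampleSize ne nSelected;
run;
```

If the selection worked as intended, the PROC PRINT step should have no rows to list.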
 
PG


7 REPLIES 7
ballardw
Super User

It might help to provide a more concrete example of what you are attempting to do with a smaller number of starting records and what 2 or 3 groups might look like.

 

It sounds like you might want to use replicate sampling.

The following selects 3 replicates (REPS=3), which I think is what your 50 samples represent, and draws 40% of the records (SAMPRATE=0.4) for each replicate.

proc surveyselect data=sashelp.class reps=3
   samprate= 0.4 out=selected ;
run;

The output data set has a variable named Replicate that indicates which of the 3 reps the record(s) belong to.
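
As a quick check of that Replicate variable, the rows per replicate can be tabulated. This is a sketch only: the SEED= value is an arbitrary assumption added to make the draw repeatable and is not part of the example above.

```sas
/* Same selection as above, with an arbitrary seed (assumption) for repeatability */
proc surveyselect data=sashelp.class reps=3
   samprate=0.4 seed=12345 out=selected;
run;

/* Count how many records were selected in each replicate */
proc freq data=selected;
    tables Replicate;
run;
```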

 

 

If you mean to randomly change the sample rate for each replicate then that's going to be a different kettle of fish.

Reeza
Super User

This assumes you're using the latest version of SAS and SAS/STAT.

 

%macro generate_random(dsn=, n_samples=, n_min=, n_max=);

%do i=1 %to &n_samples;
proc surveyselect data=&dsn reps=1
   sampsize= %sysfunc(rand(integer, &n_min, &n_max)) out=selected_&i. ;
run;

%end;

%mend;

%generate_random(dsn=sashelp.heart, n_samples=20, n_min=5, n_max=100);

*combine all results into one data set;
data all_selections;
set selected_: indsname=source;
sample = source;
run;

 





 

Nate93
Calcite | Level 5

Thanks @Reeza this is definitely along the lines of what I want to do. But there must be a version issue as unfortunately this doesn't work for me.

Reeza
Super User



It's likely the random number generator portion:

 

 %sysfunc(rand(integer, &n_min, &n_max))

 

If so, find a different way to generate the numbers: try rand uniform with some arithmetic to get the interval instead. rand integer is new to SAS but is a great new option IMO.
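
The rand uniform workaround could look something like the sketch below. The macro name rand_int is hypothetical. It scales a uniform draw to the interval and uses the FLOOR conversion of %SYSEVALF to get an integer between n_min and n_max inclusive.

```sas
/* Hypothetical helper: random integer in [n_min, n_max] without rand(integer) */
%macro rand_int(n_min, n_max);
    %sysevalf(&n_min + %sysfunc(rand(uniform)) * (&n_max - &n_min + 1), floor)
%mend rand_int;

/* Example: a sample size between 573 and 1527 */
%put Sample size: %rand_int(573, 1527);
```

In the macro above, the sampsize= value would then become %rand_int(&n_min, &n_max) in place of the rand(integer, ...) call.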

Dm95
Calcite | Level 5

Hello,

 

I have created groups in my dataset. I want to pull a random sample from these groups and then run "proc glm" on that sample, but I don't think I am getting correct results. Below is the code I am attempting.

 

 

data auditdata; set auditdata;
if lid=1 and restatement_new=1 then group=1;
else if lid=1 and restatement_new=0 then group=2;
else if lid=0 and restatement_new=1 then group=3;
else if lid=0 and restatement_new=0 then group=4;
run;



data auditdata; set auditdata;
ranum=ranuni(1005);
if ranum<.20 then rsample=1; else rsample=0;
run;



proc glm data=auditdata;
class rsample;
model da da2=rsample;
means rsample/hovtest tukey scheffe bon cldiff clm lines alpha=.05;
run;
PGStats
Opal | Level 21

You would have a greater audience if you posted your question as a new topic.

 

If I understand what you are trying to do, you need:

 

proc glm data=auditdata;
where rsample = 1;
class group;
model da da2=group;
means group/hovtest tukey scheffe bon cldiff clm lines alpha=.05;
run;
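
If the aim is a 20% draw within each group, rather than a 20% random flag over the whole data set, a stratified selection is one alternative. This is a sketch under assumptions: the data set names audit_sorted and audit_sample are hypothetical, the seed simply reuses the value from the question, and the 0.2 rate mirrors the ranum<.20 cutoff.

```sas
/* STRATA requires the input to be sorted by the stratum variable */
proc sort data=auditdata out=audit_sorted;
    by group;
run;

/* Draw a 20% simple random sample within each group */
proc surveyselect data=audit_sorted method=srs samprate=0.2
                  seed=1005 out=audit_sample;
    strata group;
run;

/* Compare the groups using only the sampled records */
proc glm data=audit_sample;
    class group;
    model da da2 = group;
    means group / hovtest tukey cldiff alpha=.05;
run;
quit;
```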

 

PG
