## How to random sample with desired aggregate statistics

Solved
Occasional Contributor
Posts: 9

# How to random sample with desired aggregate statistics

[ Edited ]

Hi, there.

I have large data containing 20,000 people's total test score.

ex)             Total Score

member1     727

member2     566

member3     661

member4     525

member5     609

.......

I want to extract exactly 2,818 people's information.

And I also want to 2,818 people's average total score is between 560~565.

Any help and tips will be much appreciated !

Thanks, Jamie.

Accepted Solutions
Solution
‎03-24-2017 03:59 AM
Posts: 5,523

## Re: How to random sample with desired aggregate statistics

[ Edited ]

You could balance your random sampling like this:

``````/* Generate some fake data */
data examplePop;
do member = 1 to 20000;
score = round(1000*rand("uniform"));
output;
end;
run;

%let mid=562.5;
%let size=2818;

/* Split the data into two subsets, lower and higher than the target value.
Add random value for next step */
data lower higher;
set examplePop;
rnd = rand("uniform");
if score < &mid then output lower;
else output higher;
run;

/* Put both subsets into a random order */
proc sort data=lower out=lower(drop=rnd); by rnd; run;
proc sort data=higher out=higher(drop=rnd); by rnd; run;

/* Pick values from the subset that is on the opposite side from the target. If
the current mean is lower than the target, pick a value at random from the
the higher subset, and vice-versa. */
data want;
if sum < &mid * n then set higher;
else set lower;
output;
n + 1;
sum + score;
if n = &size then stop;
drop n sum;
run;

/* Check the mean and std dev from the initial population and the selected sample */
proc sql;
select "examplePop" , count(*) as n, mean(score) as mean, std(score) as std
from examplePop
union all
select "want", count(*) as n, mean(score) as mean, std(score) as std
from want;
quit;
``````

PG

All Replies
Super User
Posts: 23,683

## Re: How to random sample with desired aggregate statistics

[ Edited ]

That's not the definition of a random sample

I think one of the quickest ways is to randomly sample, calculate the average, ensure it meets the criteria and then stop or repeat.

Or limit the range of scores allowed. Is it ok for one person to have a score of 825 and another 300 as long as the average is between 560-565? That's a tight range as well....

EDIT: PROC SURVEYSELECT is typically the method to implement a random sample in SAS.

Occasional Contributor
Posts: 9

## Re: How to random sample with desired aggregate statistics

I think score between 560-565 sampling will be much better.

Then, how can I limit the range of score of sampling...?

Your help will be much appreciated!

Thanks, Jamie.

Solution
‎03-24-2017 03:59 AM
Posts: 5,523

## Re: How to random sample with desired aggregate statistics

[ Edited ]

You could balance your random sampling like this:

``````/* Generate some fake data */
data examplePop;
do member = 1 to 20000;
score = round(1000*rand("uniform"));
output;
end;
run;

%let mid=562.5;
%let size=2818;

/* Split the data into two subsets, lower and higher than the target value.
Add random value for next step */
data lower higher;
set examplePop;
rnd = rand("uniform");
if score < &mid then output lower;
else output higher;
run;

/* Put both subsets into a random order */
proc sort data=lower out=lower(drop=rnd); by rnd; run;
proc sort data=higher out=higher(drop=rnd); by rnd; run;

/* Pick values from the subset that is on the opposite side from the target. If
the current mean is lower than the target, pick a value at random from the
the higher subset, and vice-versa. */
data want;
if sum < &mid * n then set higher;
else set lower;
output;
n + 1;
sum + score;
if n = &size then stop;
drop n sum;
run;

/* Check the mean and std dev from the initial population and the selected sample */
proc sql;
select "examplePop" , count(*) as n, mean(score) as mean, std(score) as std
from examplePop
union all
select "want", count(*) as n, mean(score) as mean, std(score) as std
from want;
quit;
``````

PG
SAS Super FREQ
Posts: 4,240

## Re: How to random sample with desired aggregate statistics

PGStats presents an interesting algorithm, and I give him credit for originality. But let's be clear: the sampling scheme is a heuristic method that will often give good results, but is not guaranteed. At the very last step the algorithm might select an extreme outlier that renders the average value out of range.  Still, it probably works for many real-life measurements, especially if the values are distributed symmetrically about the mean.

Super User
Posts: 10,770

## Re: How to random sample with desired aggregate statistics

Better post it at IML forum, it is about data simulation. @Rick_SAS is there .

Super User
Posts: 10,770

## Re: How to random sample with desired aggregate statistics

[ Edited ]

Mine looks like PG's code.

But neither mine nor PG's code can guarantee you to get what you want.

It is highly depended on your data.

``````data examplePop;
do member = 1 to 20000;
score = round(1000*rand("uniform"));
output;
end;
run;

proc sort data=examplepop;
by score;
run;

data want;
set examplepop;
array m{0:2817} _temporary_;
array s{0:2817} _temporary_;
i=mod(_n_,2818);
m{i}=member;
s{i}=score;
if 560<=mean(of s{*})<=565 and _n_ gt 2817 then do;
do j=0 to 2817;
member=m{j};
score=s{j};
output;
end;
stop;
end;
drop i j ;
run;

/* Check the mean and std dev from the initial population and the selected sample */
proc sql;
select "examplePop" , count(*) as n, mean(score) as mean, std(score) as std
from examplePop
union all
select "want", count(*) as n, mean(score) as mean, std(score) as std
from want;
quit;``````

☑ This topic is solved.