BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jamie0111
Calcite | Level 5

Hi, there.

 

I have large data containing 20,000 people's total test score.

 

ex)             Total Score

member1     727

member2     566

member3     661

member4     525

member5     609

                    .......

 

I want to extract exactly 2,818 people's information.

And I also want to 2,818 people's average total score is between 560~565.

 

Any help and tips will be much appreciated !

Thanks, Jamie.

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

You could balance your random sampling like this:

 

/* Generate some fake data */
data examplePop;
do member = 1 to 20000;
    score = round(1000*rand("uniform"));
    output;
    end;
run;

%let mid=562.5;
%let size=2818;

/* Split the data into two subsets, lower and higher than the target value.
 Add random value for next step */
data lower higher;
set examplePop;
rnd = rand("uniform");
if score < &mid then output lower;
else output higher;
run;

/* Put both subsets into a random order */
proc sort data=lower out=lower(drop=rnd); by rnd; run;
proc sort data=higher out=higher(drop=rnd); by rnd; run;

/* Pick values from the subset that is on the opposite side from the target. If
 the current mean is lower than the target, pick a value at random from the 
 the higher subset, and vice-versa. */
data want;
if sum < &mid * n then set higher;
else set lower;
output;
n + 1;
sum + score;
if n = &size then stop;
drop n sum;
run;

/* Check the mean and std dev from the initial population and the selected sample */
proc sql;
select "examplePop" , count(*) as n, mean(score) as mean, std(score) as std 
    from examplePop
union all
select "want", count(*) as n, mean(score) as mean, std(score) as std 
    from want;
quit;

EDIT: Added comments. 

PG

View solution in original post

7 REPLIES 7
Reeza
Super User

That's not the definition of a random sample 🙂

 

I think one of the quickest ways is to randomly sample, calculate the average, ensure it meets the criteria and then stop or repeat.

Or limit the range of scores allowed. Is it ok for one person to have a score of 825 and another 300 as long as the average is between 560-565? That's a tight range as well....

 

EDIT: PROC SURVEYSELECT is typically the method to implement a random sample in SAS. 

 

jamie0111
Calcite | Level 5

I think score between 560-565 sampling will be much better.

 

Then, how can I limit the range of score of sampling...?

 

Your help will be much appreciated!

 

Thanks, Jamie.

abc_xyz
Calcite | Level 5

I was also facing similar kind of issue(even though not exactly same) . Do you have anything in mind as such if is is okay for me to have one person with a score of 825 and another 300 as long as the average is between 560-565. Since same thing only will happen with me in most of the cases.

PGStats
Opal | Level 21

You could balance your random sampling like this:

 

/* Generate some fake data */
data examplePop;
do member = 1 to 20000;
    score = round(1000*rand("uniform"));
    output;
    end;
run;

%let mid=562.5;
%let size=2818;

/* Split the data into two subsets, lower and higher than the target value.
 Add random value for next step */
data lower higher;
set examplePop;
rnd = rand("uniform");
if score < &mid then output lower;
else output higher;
run;

/* Put both subsets into a random order */
proc sort data=lower out=lower(drop=rnd); by rnd; run;
proc sort data=higher out=higher(drop=rnd); by rnd; run;

/* Pick values from the subset that is on the opposite side from the target. If
 the current mean is lower than the target, pick a value at random from the 
 the higher subset, and vice-versa. */
data want;
if sum < &mid * n then set higher;
else set lower;
output;
n + 1;
sum + score;
if n = &size then stop;
drop n sum;
run;

/* Check the mean and std dev from the initial population and the selected sample */
proc sql;
select "examplePop" , count(*) as n, mean(score) as mean, std(score) as std 
    from examplePop
union all
select "want", count(*) as n, mean(score) as mean, std(score) as std 
    from want;
quit;

EDIT: Added comments. 

PG
Rick_SAS
SAS Super FREQ

PGStats presents an interesting algorithm, and I give him credit for originality. But let's be clear: the sampling scheme is a heuristic method that will often give good results, but is not guaranteed. At the very last step the algorithm might select an extreme outlier that renders the average value out of range.  Still, it probably works for many real-life measurements, especially if the values are distributed symmetrically about the mean.

Ksharp
Super User

Better post it at IML forum, it is about data simulation. @Rick_SAS is there .

Ksharp
Super User

Mine looks like PG's code.

But neither mine nor PG's code can guarantee you to get what you want.

It is highly depended on your data.

 

data examplePop;
do member = 1 to 20000;
    score = round(1000*rand("uniform"));
    output;
    end;
run;

proc sort data=examplepop;
 by score;
run;

data want;
 set examplepop;
 array m{0:2817} _temporary_;
 array s{0:2817} _temporary_;
 i=mod(_n_,2818);
 m{i}=member;
 s{i}=score;
 if 560<=mean(of s{*})<=565 and _n_ gt 2817 then do;
  do j=0 to 2817;
   member=m{j};
   score=s{j};
   output;
  end;
  stop;
 end;
 drop i j ;
run;



/* Check the mean and std dev from the initial population and the selected sample */
proc sql;
select "examplePop" , count(*) as n, mean(score) as mean, std(score) as std 
    from examplePop
union all
select "want", count(*) as n, mean(score) as mean, std(score) as std 
    from want;
quit;

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2160 views
  • 3 likes
  • 6 in conversation