
How can I extract sample with desired statistics

Occasional Contributor
Posts: 9

I have scores for 1 million people.

Each person's score is between 400 and 650.

I want to extract a sample of exactly 2,858 people's scores, and I also want the sample's average score to be 564.

How can I extract this sample?

Any help and tips will be much appreciated.

Thanks, Jamie.

Super User
Posts: 10,538

Re: How can I extract sample with desired statistics

How close to 564 must it be? If you require exactly 564, you may spend some time on this. Do you have a desired range for the values, or a target standard deviation?

And is this supposed to be anything resembling a random sample?

If not, how many records in your data have a score of exactly 564? If there are at least 2,858, just take 2,858 of them. That is likely not useful for your purpose, but it would meet the bare bones of your request.

Or take 1,429 records each with scores of 563 and 565, since (563 + 565) / 2 = 564.

Many other selections would also have the desired mean.
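For the 563/565 option, a minimal sketch follows. It assumes the input data set is named `have` with a numeric variable `score`, as in the code later in this reply; the first DATA step only builds hypothetical demo data (1 million integer scores from 400 to 650) so the block runs on its own, and both seeds are arbitrary. A STRATA statement makes PROC SURVEYSELECT apply N= to each stratum, so n=1429 draws 1,429 rows for each of the two score values.

```sas
/* Hypothetical demo data: 1 million integer scores from 400 to 650. */
/* Replace this step with your real data set.                        */
data have;
  call streaminit(42);
  do id = 1 to 1000000;
    score = 400 + floor(251 * rand('uniform'));
    output;
  end;
run;

/* Keep only the two target values; STRATA requires sorted input. */
proc sort data=have(where=(score in (563, 565))) out=two_scores;
  by score;
run;

/* N= is per stratum: 1,429 rows with score 563 and 1,429 with 565. */
proc surveyselect data=two_scores method=srs n=1429 seed=1234
                  out=want noprint;
  strata score;
run;

/* Check: 2,858 rows, mean exactly 564. */
proc means data=want n mean;
  var score;
run;
```

Each score value must appear at least 1,429 times in your data for this to work; otherwise PROC SURVEYSELECT cannot draw that many rows without replacement from the stratum.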

 

I would probably start with:

Proc surveyselect data=have out=want sampsize=2858;
run;

Proc means data=want;
   var score;
run;

and see if the mean is "close enough".

This is cheap enough in run time that you could even rerun the code above until you get something close. Because no SEED= is given, each run uses a seed from the system clock and draws a different sample.
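Instead of rerunning by hand, one PROC SURVEYSELECT call can draw many samples at once with REPS=, which adds a Replicate variable to the output. The sketch below assumes `have`/`score` as above, 200 replicates, and the 560 to 570 range; the demo DATA step is hypothetical data centered near the target (normal, mean 565, SD 50, clipped to 400..650) so the block runs standalone.

```sas
/* Hypothetical demo data near the target; replace with your own. */
data have;
  call streaminit(7);
  do id = 1 to 1000000;
    score = min(650, max(400, round(rand('normal', 565, 50))));
    output;
  end;
run;

/* 200 independent samples of 2,858 in one call (Replicate = 1..200). */
proc surveyselect data=have method=srs n=2858 reps=200 seed=1234
                  out=reps noprint;
run;

/* Mean score of each replicate; NWAY output is ordered by Replicate. */
proc means data=reps noprint nway;
  class replicate;
  var score;
  output out=rep_means mean=mean_score;
run;

/* Store the first replicate whose mean is in 560-570 (. if none). */
%let pick = .;
data _null_;
  set rep_means;
  where 560 <= mean_score <= 570;
  call symputx('pick', replicate);
  stop;
run;

/* Keep that sample; WANT is empty if no replicate qualified. */
data want;
  set reps;
  where replicate = &pick;
run;
```

This trades the macro loop for materializing every sample at once (about 572,000 rows here). Either way, plain random draws only land in the range if the full data's mean is already near 564, because the mean of 2,858 scores changes little from one sample to the next.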

Occasional Contributor
Posts: 9

Re: How can I extract sample with desired statistics

Hi!

The average score of the 2,858-person sample does not have to be exactly 564.

I will draw samples repeatedly until the average is between 560 and 570.

Anyway, thanks a lot for your help!
Contributor
Posts: 56

Re: How can I extract sample with desired statistics

I suggest stratified sampling; the sampled observations can then be read according to their sampling weights.

Hopefully this code works for you.

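%macro do_sampling(lo=560, hi=570, maxiter=1000);
  %local iter avg_score;
  %let iter = 0;
  /* Redraw until the SAMPLE mean is in [lo, hi] or maxiter draws are done. */
  /* %sysevalf is needed because avg_score has decimals.                    */
  %do %until ((%sysevalf(&avg_score >= &lo) and %sysevalf(&avg_score <= &hi))
              or &iter >= &maxiter);
    %let iter = %eval(&iter + 1);
    /* New seed each pass (1234, 1235, ...) so every draw differs.          */
    /* STRATA score removed: N= is per stratum, so it asked for 2858 rows   */
    /* for every distinct score value.                                      */
    proc surveyselect data=sort_sample method=srs n=2858
                      seed=%eval(1233 + &iter) out=sample_customer noprint;
    run;
    /* Average of the sample, not of the full input data set. */
    proc sql noprint;
      select avg(score) into :avg_score trimmed from sample_customer;
    quit;
    %put NOTE: draw &iter, sample mean = &avg_score;
  %end;
%mend do_sampling;

%do_sampling(lo=560, hi=570)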
