Santelle

My sample data is not representative of my population, so I'm trying to draw a random sample according to predefined proportions. As a result, some groups will have to be oversampled with replacement to reach their required proportion, while other groups have enough observations to sample from.

Here is a simple, equivalent example:

 

data stress;
input ID 1-4 Name $ 6-25 RestHR 27-29 MaxHR 31-33
RecHR 35-37 TimeMin 39-40 TimeSec 42-43
Tolerance $ 45;
datalines;
2458 Murray, W             72 185 128 12 38 D
2462 Almers, C             68 171 133 10  5 I
2501 Bonaventure, T        78 177 139 11 13 I
2523 Johnson, R            69 162 114  9 42 S
2539 LaMance, K            75 168 141 11 46 D
2544 Jones, M              79 187 136 12 26 N
2552 Reberson, P           69 158 139 15 41 D
2555 King, E               70 167 122 13 13 I
2563 Pitts, D              71 159 116 10 22 S
2568 Eberhardt, S          72 182 122 16 49 N
2571 Nunnelly, A           65 181 141 15  2 I
2572 Oberon, M             74 177 138 12 11 D
2574 Peterson, V           80 164 137 14  9 D
2575 Quigley, M            74 152 113 11 26 I
2578 Cameron, L            75 158 108 14 27 I
2579 Underwood, K          72 165 127 13 19 S
2584 Takahashi, Y          76 163 135 16  7 D
2586 Derber, B             68 176 119 17 35 N
2588 Ivan, H               70 182 126 15 41 N
2589 Wilcox, E             78 189 138 14 57 I
;
run;

 

In the end, I'd like 10% of the sample to have Tolerance = 'D', 20% to have Tolerance = 'I', 30% to have Tolerance = 'S', and 40% to have Tolerance = 'N'.

 

/* Target share of the final sample for each Tolerance group */
data stress;
set stress;
if Tolerance = 'D' then Proportion = 0.1;
else if Tolerance = 'I' then Proportion = 0.2;
else if Tolerance = 'S' then Proportion = 0.3;
else if Tolerance = 'N' then Proportion = 0.4;
run;
/* Current distribution of Tolerance in the input data */
proc freq data = stress;
tables Tolerance;
run;
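The target shares translate into per-group sample sizes by simple arithmetic. Below, $p_h$ is the target proportion for group $h$, $n$ the total sample size, and $N_h$ the number of observations available in group $h$. The total $n = 1000$ is assumed here only because it is the value used in the attempts that follow.

```latex
\begin{aligned}
n_h &= p_h \, n, \qquad \textstyle\sum_h p_h = 1,\\
n = 1000:\quad n_D &= 100,\quad n_I = 200,\quad n_S = 300,\quad n_N = 400,\\
N_D &= 6,\quad N_I = 7,\quad N_S = 3,\quad N_N = 4 \quad (30\%,\ 35\%,\ 15\%,\ 20\%).
\end{aligned}
```

Because $n_h > N_h$ in every group of this 20-row example, every group has to be drawn with replacement (for example, METHOD=URS).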

 

 

Here's what I've tried:

 


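proc sort data = stress; by Tolerance; run;

/* Attempt 1: with STRATA, SAMPSIZE=1000 requests 1000 draws from EACH
   stratum, so every Tolerance group gets 1000 of the 4000 hits (25%). */
proc surveyselect data=stress
method = urs
seed=12345
sampsize = 1000
out=StressSample1
outhits
;
id _all_;
strata Tolerance;
run;

/* Attempt 2: Proportion is constant within each stratum, so PPS selection
   inside a stratum is equal-probability, and each stratum still gets 1000 draws. */
proc surveyselect data=stress
method=PPS_WR
seed=12345
sampsize=1000
out=StressSample2
outhits
;
id _all_;
strata Tolerance;
size Proportion;
run;
proc freq data = StressSample1;
tables Tolerance;
run;
proc freq data = StressSample2;
tables Tolerance;
run;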
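Both attempts give every stratum the same 1000 draws, because with a STRATA statement a single SAMPSIZE= number applies to each stratum. What is needed is a different size per stratum. One way to do that, sketched below under stated assumptions, is PROC SURVEYSELECT's SAMPSIZE=SAS-data-set form: a secondary data set with the strata variable and a `_NSIZE_` column, sorted the same way as the input. The total `TotalN = 1000` is an assumed value. The data set names `stress_ids`, `target_sizes` and `stress_prop_sample` are hypothetical; `stress_ids` is a two-column copy of the example data that keeps the sketch self-contained.

```sas
/* Hypothetical self-contained copy of the example's ID and Tolerance
   columns; with real data, use your own STRESS data set instead. */
data stress_ids;
  input ID Tolerance $ @@;
  datalines;
2458 D 2462 I 2501 I 2523 S 2539 D 2544 N 2552 D 2555 I 2563 S 2568 N
2571 I 2572 D 2574 D 2575 I 2578 I 2579 S 2584 D 2586 N 2588 N 2589 I
;

/* Assumed total number of draws across all strata */
%let TotalN = 1000;

/* One row per stratum: target share and the resulting stratum size.
   SURVEYSELECT reads the per-stratum sizes from the _NSIZE_ variable. */
data target_sizes;
  input Tolerance $ Proportion;
  _NSIZE_ = round(Proportion * &TotalN.);
  datalines;
D 0.1
I 0.2
S 0.3
N 0.4
;

/* Input data and size data set must both be sorted by the STRATA variable */
proc sort data=stress_ids;   by Tolerance; run;
proc sort data=target_sizes; by Tolerance; run;

/* URS samples with replacement, so a stratum can get more hits than it has
   rows; OUTHITS writes one output row per hit. */
proc surveyselect data=stress_ids out=stress_prop_sample
                  method=urs sampsize=target_sizes outhits seed=12345;
  strata Tolerance;
run;

/* Expect 100 D, 200 I, 400 N and 300 S rows */
proc freq data=stress_prop_sample;
  tables Tolerance;
run;
```

Because the sizes are matched to strata by the Tolerance value, not by position in a list, the order in which the proportions are typed does not matter. With proportions that do not multiply out to whole numbers, the rounded `_NSIZE_` values may not add up exactly to `TotalN`.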

 

Thanks in advance.

Santelle

I ended up doing it rather 'manually'. For now, it will suffice.

 

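/* Per-stratum sample sizes: 10% D, 20% I, 30% S, 40% N of 100 draws */
data stress;
set stress;
if Tolerance = 'D' then SampSize= 10;
else if Tolerance = 'I' then SampSize= 20;
else if Tolerance = 'S' then SampSize= 30;
else if Tolerance = 'N' then SampSize= 40;
run;
/* STRATA needs the input sorted by Tolerance, i.e. D, I, N, S */
proc sort data = stress; by Tolerance; run;
/* Build the N= list in the same order as the sorted strata. SELECT DISTINCT
   SampSize alone is not tied to Tolerance order (it typically returns
   10 20 30 40, giving N 30 draws and S 40), and it would merge groups
   that share a size. */
proc sql noprint;
select SampSize into :SampSize separated by ' '
from (select distinct Tolerance, SampSize from stress)
order by Tolerance;
quit;
proc surveyselect data=stress method=urs n=(&SampSize.) out=StressSample outhits;
strata Tolerance;
run;
proc freq data = StressSample;
tables Tolerance;
run;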
